Separator problem when importing a textfile

When importing a textfile, the custom separator works fine for all but the first column.
I don’t see any difference in the data.
Testfile (just the headline) and project enclosed.

Please have a look at it.

Regards,
Michael20171101_Dlg_PROV TEST.TXT (233 Bytes)
HLP_CSV_Dialog_Öffnen.morph (2.6 KB)

Hi Michael,

I can’t see the enclosed files in your post. Can you please try to upload them one more time
and also send them to andrew.rybka@easymorph.com?

Michel,

That happens because the double quotes around the first values are treated as an opening and closing escapers.
So the first two values are parsed as 1) correctly quoted value Vermittler; 2) comma; 3) incorrectly placed double quote (which is not treated as an escaper); 4) incorrectly placed value Buchdatum. Since double quote after Vermittler is parsed as a closing escaper, it’s not recognized as a part of a separator and all the four mentioned items are imported as a single value.

It seems that you are forced to use custom separator because EasyMorph doesn’t allow you to choose comma as text separator and decimal separator in the same transformation (please correct me if I’m wrong). It’s possible that this behavior should be changed somehow. We have to consider our options.

Unfortunately, I can’t think of any simple and universal solution for this problem. I have two different workarounds for you. But I depend on your actual data if any of them can be used.

I assumed that all the values in the source file are wrapped in double quotes and numeric values have comma as decimal separator.

Here is an example project for all the three workarounds: example.zip (9.5 KB)

Workaround #1:

  • Import with a custom separator ," (comma and doublequote).
  • Remove all the trailing double quotes. You can use Table-wide replace transformation (Workaround #1.1) or, if your source data contains some double quotes which should be imported as part of text values, you can use Modify column transformation (Workaround #1.2) with removeend() function. I also added if(endswith([Col2"] …) expression just in case.
  • Convert values in each numeric column from text to numbers with Split delimited text into columns transformation (Separator: No separator, Decimal separator: Comma).

This workaround can’t be used if source file contains text values with “embedded” double quotes, line breaks or trailing commas.

Workaround #2:

  • Make copy of your source file;
  • In any text editor replace “,” (double quote + comma + double quote) with “;” (double quote + semicolon + double quote);
  • Import as usual with Separator: Semicolon and Decimal: Comma.

This workaround can’t be used when text values in the source file contain “;” which should be imported as is.

Workaround #3:

Similar to the #2, but the replacement is executed inside EasyMorph.

  • Import original source file as fixed width text with a single column of 32767 width (maximum supported text value width). Not that “First line contains headers” checkbox should be turned off;
  • With Table-wide replace transformation replace “,” (double quote + comma + double quote) with “;” (double quote + semicolon + double quote);
  • Use Split delimited text into columns transformation with Separator: Semicolon and Decimal separator: Comma. You’ll have to manually add as many columns as your source data has. But there is no need to declare names of the added columns;
  • Use label columns transformation in order to rename columns according to actual headers in the source file.

This workaround can’t be used when:

  • Text values in the source file contain “;” which should be imported as is;
  • Some of the lines in the source file are longer than 32767 characters;
  • Some of the text values in the source file contains line breaks.

Hello Andrew,
thanks for your quick reply.

Your assumption is correct: we would need comma as text separator AND decimal separator (but that would make it hard to separate, I’m not sure if you should follow that way …).
All values are placed between double quotes, separated by comma - therefore it’s unique.

My solution was:

  1. Import with custom separator double quotes + comma + double quotes
  2. Split the column that isn’t recognized correctly using split delimited text into columns with custom separator comma + double quotes
  3. replace double quotes in the last column.

Regards,
Michael