Remove Line Breaks from Data

How can I read in a CSV file and ignore an embedded LF in the data? It is already inside double quotes, so it should not be treated as a line break. The CRLF at the end of each line is fine.

RFC 4180 states that CRLF can appear inside double quotes.

http://www.rfc-editor.org/rfc/rfc4180.txt
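
For illustration, here is a minimal Python sketch of how a parser that follows RFC 4180 handles this case. The sample data is hypothetical, and Python's standard `csv` module is only used as a stand-in to show the expected behavior; it says nothing about how EasyMorph parses files.

```python
import csv
import io

# Hypothetical sample: the comment field contains a bare LF inside double quotes,
# while each record ends with CRLF.
sample = 'id,comment\r\n1,"first line\nsecond line"\r\n'

# newline="" leaves line endings untranslated so the csv module can interpret them.
rows = list(csv.reader(io.StringIO(sample, newline="")))

# The quoted LF stays inside the field, so there are two records, not three.
print(len(rows))
print(rows[1])
```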

Red

Hi Red,

At this point it’s not possible. It’s a known issue that will be fixed eventually, but for now CRLF in double quotes is not supported.

Is it possible to do a table-wide replace and search for Char(20) and replace with Char(32)?

Yes, it is possible. I just tried it using “Table-wide replace” transformation and it worked.

Great,

Follow-up: How do I specify the “Search for” input?
How do I search for the character Char(20) and not the literal text “Char(20)”?

Specifically, what is the escape sequence to use?

BTW, I am a Programmer by trade, but new to EasyMorph.

I generated a 1-cell table using the “Sequence” transformation (toolbar Main -> Insert table -> category Generate). Then I used an expression to generate char(20), as in the screenshot below, and copy-pasted the result into “Search for” in “Table-wide replace” (right-click the cell, then “Copy”).

Alternatively, you can use Notepad++ to create ASCII symbols (menu Edit->Character panel).
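
As a side note on the character codes, here is a small Python sketch listing the decimal ASCII values involved. It assumes the codes in this thread are meant as decimal values, which has not been verified against EasyMorph's char() function. In decimal ASCII, LF is 10 and CR is 13, while 20 is a different control character (DC4), so it may be worth double-checking which code the data actually contains.

```python
# Decimal ASCII codes for the characters discussed in this thread.
codes = {"LF": 10, "CR": 13, "space": 32}

for name, code in codes.items():
    # Show the decimal code, its hexadecimal form, and the character itself.
    print(name, code, hex(code), repr(chr(code)))

# Decimal 20 is the control character DC4 (hex 0x14), not a line break.
# Hexadecimal 0x20 is decimal 32, the space character.
dc4 = chr(20)
print(repr(dc4))
```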

So here is what I have, a CSV with a LF embedded in a comments header (From Notepad++):

In Excel it produces (if read in):

But in EasyMorph I get this:

I could write a Python script to clean it and then send the result to EasyMorph, but it would be better if EasyMorph could just read it in.
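
A minimal sketch of such a cleaning script, assuming the file is UTF-8 and uses the default comma delimiter and double-quote character. The function name and the file names are hypothetical. It replaces every line break inside a field with a space and writes a new CSV that has one record per physical line.

```python
import csv
import re

# Matches CRLF, a lone CR, or a lone LF inside a field.
LINE_BREAK = re.compile(r"\r\n|\r|\n")


def flatten_line_breaks(src_path, dst_path, replacement=" ", encoding="utf-8"):
    """Copy a CSV file, replacing line breaks inside fields with `replacement`."""
    # newline="" lets the csv module handle line endings itself on both sides.
    with open(src_path, newline="", encoding=encoding) as src, \
            open(dst_path, "w", newline="", encoding=encoding) as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)  # the default dialect ends each record with CRLF
        for row in reader:
            writer.writerow([LINE_BREAK.sub(replacement, field) for field in row])


# Example call with hypothetical file names:
# flatten_line_breaks("input.csv", "cleaned.csv")
```

The cleaned file no longer contains line breaks inside quoted fields, so a reader that splits records on every line break should see the correct number of rows.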

Red

At this point EasyMorph can’t read text files with LF, CR or CRLF inside strings. Any of these symbols breaks the line. It’s a known issue.

As a workaround, you can load this CSV in Excel, then save it as a workbook (.xlsx). Then read the .xlsx file in EasyMorph using the “Import from Excel” transformation. In this case LFs will be preserved.

Later you can remove LFs from this column using the removetext() or removechars() function.

Thanks. That will work.

Red

You’re welcome, Red.

Any update on a fix for this bug? Unfortunately, the save-as-XLSX workaround introduces a whole host of other issues with dates, times, etc.

Thanks

@tech4him

Support for quoted line breaks in CSV files is planned for version 3.8, which is scheduled for release in September. Until then, the workaround with XLSX is recommended. Sorry for the inconvenience.

A colleague has just pointed out that you can use the Sanitize() function to remove line breaks. Hope this helps someone!