Same Excel value interpreted as distinct?

Greetings,
Really enjoying EasyMorph but did find what appears to be a bug when importing from Excel.
Namely, the same value appears to be coded as two separate values.
I have tried the “Clean” step to remove all non-text values.
Must be something to do with Excel because when I manually convert Excel to CSV the issue goes away.
Have attached my project file and the simplified Excel workbook (and a screen cap showing the issue).

easymorph_bug_screencap

DuplicateTextValue.morph (3.0 KB)
debug_dup_merchandiser_v4.xlsx (9.0 KB)

Hi Neil and welcome to the Community!

All the values in the file have a trailing space except for the last one. You can see it in the Cell Profiler that is invoked by right-clicking a cell and selecting “Profiler”.

image

The profiler shows that almost all values have a trailing space:

BTW, you can click and profile different cells while keeping the Profiler dialog open.

Also, if you select the entire column by clicking its header, EasyMorph detects that some cells have trailing spaces. It’s shown as a suggestion (see below).

Finally, you can calculate the length of each value in a separate column using the len() function:

image

Thanks for this solution!
Yes that was indeed the issue. Problem solved. :grinning:

I suppose I got tripped up because other tools (e.g. Qlik, Power BI) will automatically trim those trailing spaces. Something to think about.

But since you have pointed out that “Filter/Profile” feature with the Suggestions, it was very easy for me to correct this and I will keep an eye

Overall though I am finding this tool to have many advantages over Qlik and Power Query (e.g. the “Enumerate Rows” action perfectly solved my problem - doing the same in Qlik and Power Query is a big pain).
Thanks again Dmitry!

You're welcome :slight_smile:

EasyMorph tries to be as predictable and transparent as possible. That's why EasyMorph usually doesn't modify data without explicit instruction from the user.

1 Like

Agree that transparency is generally a good rule to follow.
I have since taken a second look at the “Sanitize” action and can see it’s actually quite simple to sanitize “All Columns” in a single step.
Going ahead I may make this a habit.
Thanks again!