About EasyMorph Tutorials & Examples Web-help

Excel Import missing AlphaNumeric/Text Data columns


#1

Hello,

Let me preface the query with the fact that I am unable to send you the original source file as it contains sensitive and confidential data from a trading partner. Plus, if I save the file and sanitise it the issue disappears which leads me to believe it is a data formatting workbook formatting issue in the source file.

The scenario is
- receive an XLS file from a trading partner via email containing two sheets. The first is just a header/summary sheet, the second sheet contains rows and columns of data with a mixture of text, dates and numerics

  • If we drag and drop the message to a folder and then drag and drop the file to EM as a new Import Table only the numeric values and the occasional text value are brought across. There columns containing text are shown as place holders but they appear 'visually empty'. EM also is not able to identify the first row as containing headers. It is like all Text elements are ignored.

  • If we Open the source file in excel and save it in the same format (.XLS) and make no changes whatsoever, the file increases in size from 17K to 27K and will successfully import into EM.

I am unable to simulate the issue as creating and saving any excel file seems to always work. My guess is that this XLS file is generated at the Partner end by some type of report generation tool that is possibly pushing some type of formatting codes or even perhaps encoding the data some how that prevents EM from interpreting it correctly and that when we save the file from within Excel this is corrected and the file can then be interpreted.


#2

Hello Bruce,

it is near to impossible to understand what's broken in that file without seeing it. The XLS format is rather complex and can be broken in a lot of ways.

Can the partner prepare similar file but with data sanitized on their end? If they can, and such file demonstrates the same behavior -- please send it to our support email.

Can the partner generate .XLSX (not .XLS)? if yes, does EM still fail to import it? Can they generate CSV?

If they can't prepare a sanitized file -- can they at least tell what report generation tool / library they use?

Thanks

PS. Looping @andrew.rybka


#3

Thanks Dmitry

All valid questions and I agree - near impossible. But was taking a stab in the dark. I am asking similar questions of the partner and we will see where it leads. Appreciate your feedback - just thought you may have come across something when you were doing your own translations.

I have already found one difference in that Excel thinks that the file that fails has a "blank" footer on each sheet rather than (none). When you save the file it magically loses the footer and sets it back to "(none)" as well as adding 20K to the overall size of the file.

I will find out what is generating it.

cheers


#4

Had no luck with the Partner. The file is generated from a in-house customised platform and I doubt our contact would be in a position to dig any deeper. They cannot generate a sample sanitised file and we can't because as soon as I open and save the file the issue us corrected.

I suspect it is an character set encoding issue. To your knowledge is there anyway I could dump internal details out and compare it to a file that does work?

Are you doing anything within EM that may be reliant on character set encoding?

cheers


#5

Hello Bruce,

Can you please ask the Partner if it's possible to generate XLS file with only headers or with one or two rows of data which can be shared with us?

Thanks


#6

Bruce,

Character set encoding can be the source of the problem. But if this is the case, we can't do anything about it without knowing some internal details of this file.

I searched for a tool which can dump internal details of XLS file in a useful format but I wasn't been able to find a good one.

Some tools can dump binary representation of different workbook parts but I don't think that comparing binary data will be helpful because of the difference in the size of original and resaved files.

There is one tool which shows internal structure of a workbook in a human-readable way - POIFS Browser.

But it doesn't seems to work quite well - some elements are displayed in the list but you can't view their content. And there is no way to dump elements list.

Another problem is that it's hard to describe which data to look for (since XLS file is created with third-party software, it can have a slightly different structure and format). But if you would like we can look at this file together through Skype or WebEx.