Excel: Delay / limit updating of live preview?

We have a huge 250 MB XLS file with about 100,000 rows that I need to transform. EM spends most of its time in the update step, generating the table preview.

Is there a way to stop EM from doing live updates until I've set up my transformation pipeline, to speed things up? Or can I limit the live-update & preview processing to the first 200 rows or so?

I don’t need a live-update & preview for the whole file.

Ok, I found the “Autorun OFF” option.

However, EM now consumes 32 GB of RAM when working with a 250 MB XLS… that doesn't look like a very efficient way to handle the data. Any way to reduce the memory footprint?

Hi Robert and welcome to the Community!

Yes, it is recommended to switch off Auto-run for heavy spreadsheets.

Make sure that super-compression is turned on. You can find it in the About menu.

Oh… didn’t expect any functions in the About menu :wink:

Going to test it with my big XLS file :slight_smile:

Hi Robert,

Let me clarify, are you importing an XLS or XLSX file?

How many rows and columns does the file have?
And what kind of data does it contain?

I just tested import with a 110 MB XLSX file and it consumed 6 GB of memory.

That looks reasonable to me because of several factors:

  • An XLSX file is just a zip archive that contains several XML files. An unzipped version of the 110 MB XLSX file occupies ~700 MB on disk (see the sketch after this list)
  • When an XML file with worksheet data is loaded into memory, an object model of that XML is created, and that object model consumes a lot of memory: up to 8-9 times more than the same XML file occupies on disk
  • After the import is completed, the object model of the XML file is discarded, but the process may still hold onto the allocated memory. It releases that memory gradually when some other application starts to consume memory (and the amount of free memory drops below 2-4%). This behavior is due to the way memory allocation (garbage collection) works in .NET Framework-based applications on Windows
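To see the zip-archive point for yourself, here is a minimal Python sketch that compares the compressed and uncompressed sizes of the XML parts inside an XLSX (the file name big.xlsx is just a placeholder):

```python
import zipfile

# An .xlsx file is an ordinary zip archive of XML parts. Comparing
# compressed vs. uncompressed sizes shows why a ~110 MB file can
# expand to ~700 MB of XML once unzipped.
with zipfile.ZipFile("big.xlsx") as xlsx:  # "big.xlsx" is a placeholder
    for info in xlsx.infolist():
        print(f"{info.filename}: {info.compress_size:,} -> {info.file_size:,} bytes")
    packed = sum(i.compress_size for i in xlsx.infolist())
    raw = sum(i.file_size for i in xlsx.infolist())
    print(f"total: {packed:,} compressed -> {raw:,} uncompressed")
```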

It’s an XLSX file.

Thanks for the insight about how XML structures are handled.

I thought that EM stream-loads the XML into an internal data structure, and hence doesn’t need to load the whole XML DOM, which of course is very verbose.
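For illustration, a minimal sketch of the kind of streaming parse I mean, using Python’s xml.etree.ElementTree.iterparse (sheet1.xml stands for a worksheet part extracted from the archive; cell values may be shared-string indices, which this sketch ignores):

```python
import xml.etree.ElementTree as ET

# SpreadsheetML namespace used in worksheet XML parts
NS = "{http://schemas.openxmlformats.org/spreadsheetml/2006/main}"

# Streaming parse: handle each <row> as soon as it is fully read,
# then discard it, so memory stays flat instead of growing with
# the whole DOM.
for event, elem in ET.iterparse("sheet1.xml", events=("end",)):
    if elem.tag == NS + "row":
        values = [c.findtext(NS + "v") for c in elem.iter(NS + "c")]
        # ... feed `values` into the internal data structure ...
        elem.clear()  # release the subtree we no longer need
```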

IMO, handling huge data sets with a tool like EM should use a stream-reading approach, so that only a fraction of the data is in memory at any time. Also, I see that EM seems to use SQLite in the backend; SQLite should help a lot with memory consumption.
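As a rough sketch of what stream-reading an XLSX looks like in practice, here is the openpyxl library in read-only mode; the file name and the 200-row cap are just examples, echoing the preview limit from my first post:

```python
from itertools import islice
from openpyxl import load_workbook

# read_only mode streams rows from the worksheet XML instead of
# building the whole workbook object model in memory.
wb = load_workbook("big.xlsx", read_only=True)
ws = wb.active
for row in islice(ws.iter_rows(values_only=True), 200):  # preview cap
    print(row)
wb.close()  # read-only workbooks keep the file handle open until closed
```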

We have plans to switch XLSX import to a stream-reading approach.

As for SQLite: for now it’s only used to store the shared connector repository. Datasets are stored in memory in a format similar to our DSET files.

Great! Streaming really is the way to deal with big files.

And great that you publish your internal format. Very good!

BTW: I like the responsiveness of the community too.

Good point! It's a temporary setting. The super-compression is experimental and will become permanent in the next version. Then we will remove the switch altogether.