EasyMorph needs a lot of memory to load: correct?

Hi,

I tried to load a table with 140 columns and about 800,000 records. I could not do this with EasyMorph because my computer had only 4 GB of RAM left, and that did not seem to be enough. Strange, isn’t it?

I tried the same with Excel 2016 (PowerPivot) and had no problem.

Is it surprising?

For 140 columns and 800K rows you would need at least 8GB of RAM to load the table.
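For a rough sense of where that figure comes from, here is a back-of-envelope estimate. The bytes-per-cell value is an illustrative assumption, not EasyMorph's actual internal cost, which depends on data types and compression:

```python
# Back-of-envelope only: the bytes-per-cell figure below is an assumption
# for illustration, not EasyMorph's actual in-memory cost.
columns = 140
rows = 800_000
bytes_per_cell = 70          # assumed average in-memory cost per cell

cells = columns * rows       # 112,000,000 cells
estimate_gb = cells * bytes_per_cell / 1024**3
print(f"{cells:,} cells -> roughly {estimate_gb:.1f} GB")   # roughly 7.3 GB
```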

It’s not strange or surprising. We just don’t have the kind of R&D budget that Microsoft does :slight_smile: Version 4.0 of EasyMorph will have improved compression that will make it possible to load 2-3 times more data, or use 2-3 times less RAM for the same data volume. That will put it on par with in-memory engines such as PowerPivot or Qlik.

Meanwhile, if you need to work with large or wide tables – install more RAM. We recommend having at least 8-16GB of RAM. On the positive side, RAM is cheap nowadays.


I have an Excel file of 80 MB with ~18 columns, and I am trying to aggregate it by 4 columns. EasyMorph shows 100% RAM utilization while processing, and then it hangs.
My machine has 8 GB of RAM and an i7 CPU… Will upgrading it to 16 GB of RAM solve this?

It's hard to say. An 80MB .xlsx file contains 1-2GB of uncompressed XML data, which needs to be parsed, loaded, and compressed, and that also requires memory. Chances are that upgrading to 16GB would solve it, but I can't guarantee it.
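If you want to check that ratio for your particular file, a .xlsx is just a ZIP archive of XML parts, so you can sum the uncompressed sizes of its contents. A small Python sketch (the file name is an example):

```python
# A .xlsx file is a ZIP archive of XML parts, so you can check how much
# uncompressed XML your workbook really contains. The file name is an example.
import zipfile

with zipfile.ZipFile("big_workbook.xlsx") as z:
    compressed = sum(i.compress_size for i in z.infolist())
    uncompressed = sum(i.file_size for i in z.infolist())

print(f"On disk: {compressed / 1024**2:.0f} MB, "
      f"unpacked XML: {uncompressed / 1024**2:.0f} MB")
```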

I would suggest renting a virtual machine with 32GB of RAM on Amazon or Azure for a couple of hours. 32GB should definitely be enough. You can use the VM to measure exactly how much RAM is needed to load such a huge Excel file.
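On the VM you could, for example, watch the EasyMorph process's memory while the project runs and record the peak. A rough Python sketch using psutil; the process-name filter ("morph") is an assumption you may need to adjust, and the script should be started after EasyMorph has begun loading the file:

```python
# Rough sketch: poll the EasyMorph process and record its peak memory use
# while the project runs. Assumes psutil is installed (pip install psutil)
# and that the process name contains "morph"; adjust the filter if needed.
# Start this after EasyMorph has started loading, or it will exit right away.
import time
import psutil

peak = 0
while True:
    procs = [p for p in psutil.process_iter(attrs=["name"])
             if "morph" in (p.info["name"] or "").lower()]
    if not procs:
        break                      # process is not running (any more)
    try:
        rss = sum(p.memory_info().rss for p in procs)
    except psutil.NoSuchProcess:
        continue                   # process exited between the two calls
    peak = max(peak, rss)
    time.sleep(1)

print(f"Peak working set: {peak / 1024**3:.1f} GB")
```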

Hi Dmitry,

Will memory usage be optimized in version 4.0 so that we can load CSV files faster and so that they take up less RAM? We don't consider transforming each large dataset into .dset or another format really practical, because it is an extra step.

We have had the problem for some time that our users' laptops only have 8 GB of RAM. So for datasets that go beyond 1 mln rows and, let's say, more than 75 columns, performance is really affected. RAM cannot be extended on those machines, and using workspaces on Amazon is not really encouraged by our IT department (the necessary network ports are closed for users to access workspaces, and the browser app does not work really well). So we will have to opt for installing EasyMorph on the users' laptops instead of a cloud workspace.

One problem is that for large datasets, one wants to view the whole dataset at once, because working on a sample could hide data quality issues. Also, sampling is not implemented in EasyMorph at this point, so we cannot load a random sample of records from a CSV, for instance.

Any ideas about how to proceed, and whether performance will be improved in 4.0?

Thanks !

I know EasyMorph support does not recommend installation on servers, but if you have Citrix technology, you might be interested in deploying to a farm of Citrix servers loaded with plenty of RAM for your users?

A couple of suggestions:

You can use the Server (which presumably has more RAM than users' desktops) to convert large CSVs into .dset files. This can be done as follows:

  1. Create a Server task that converts a CSV file (specified by a parameter) into a .dset file.
  2. Create and give every user a project that does the following:
  • Uploads the specified (by parameter) CSV file from the user's desktop to the Server using the "EasyMorph Server Command" action
  • Triggers the task created in step 1 using the "EasyMorph Server Command" action
  • Downloads the resulting .dset file to the user's desktop using the "EasyMorph Server Command" action

Now your users can convert big CSV files into compact .dset files without using any RAM on their computers just by running a small project. Hint: make it a Launcher task.
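Purely as an illustration of the sequence, the user-side project boils down to something like the sketch below. The function names are placeholders, not a real API; in EasyMorph each step is an "EasyMorph Server Command" action inside the user's project, and the file and task names are examples.

```python
# Illustration of the sequence only. These function names are placeholders,
# not a real API; in EasyMorph each step is an "EasyMorph Server Command"
# action inside the user's project.

def upload_csv_to_server(local_csv: str) -> None:
    print(f"1) upload {local_csv} to the Server")

def trigger_conversion_task(task_name: str) -> None:
    print(f"2) run Server task '{task_name}' (CSV -> .dset)")

def download_dset(remote_dset: str) -> None:
    print(f"3) download {remote_dset} back to the desktop")

csv_file = "sales.csv"                       # value of the project parameter
upload_csv_to_server(csv_file)
trigger_conversion_task("csv-to-dset")       # the task created in step 1
download_dset(csv_file.replace(".csv", ".dset"))
```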

Alternatively, use the "Split delimited files" action to split one large CSV file into many smaller CSV files. Then you can work with a subset of the smaller files.
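If it helps to see what the splitting amounts to, here is a plain-Python sketch that does the same kind of thing outside of EasyMorph: it cuts a large CSV into chunks of a fixed number of rows and repeats the header in each chunk. File names and chunk size are examples.

```python
# Plain-Python illustration of what splitting achieves: cut one big CSV into
# chunks of CHUNK_ROWS rows, repeating the header in every chunk.
# File names and chunk size are examples.
import csv

CHUNK_ROWS = 1_000_000

with open("big_file.csv", newline="", encoding="utf-8") as src:
    reader = csv.reader(src)
    header = next(reader)
    out, writer, part = None, None, 0
    for i, row in enumerate(reader):
        if i % CHUNK_ROWS == 0:              # start a new chunk
            if out:
                out.close()
            part += 1
            out = open(f"big_file_part{part}.csv", "w",
                       newline="", encoding="utf-8")
            writer = csv.writer(out)
            writer.writerow(header)
        writer.writerow(row)
    if out:
        out.close()
```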

No changes in CSV loading performance are expected in version 4.0.