How load a lot of rows faster in a table

isma · July 19, 2017, 7:44pm

Hi Community,
Currently, i use easy morph to load at 2hours a lot of rows in diffirent tables. I load approximately between 1 and 20 million rows. But, i had some problem for loading beacuse it takes a long time (845 min). My question, i wanna know if it’s possible to load faster then that and how we can do it?
Thanks and have a good day!

dgudkov · July 19, 2017, 8:11pm

Hi @isma and welcome to the Community.

Just to make sure I understand your question correctly – are you experiencing performance issues loading TO EasyMorph, or loading FROM EasyMorph to a database?

isma · July 19, 2017, 8:27pm

I Had a performance experienc to load from a database to another database on easymorph. That’s clear

dgudkov · July 19, 2017, 9:17pm

Loading into EasyMorph from a database can’t be faster it’s as fast as it can be. But exporting from EasyMorph can be faster using bulk loading statements. See this tutorial article on bulk loading http://easymorph.com/learn/export-database.html

Bulk loading should be used (if possible) for tables with more than 1 mln rows.

isma · July 20, 2017, 11:50am

OK. I understand to use bulk loading but me i will transfer data from database to another database and i haven’t no idea how to write this bulk into the command database. Have you any idea?

dgudkov · July 20, 2017, 2:29pm

The help article I mentioned above has links to the bulk loading syntax of major databases. Note that if the target database is on another server than EasyMorph, you will need to transfer the CSV dump to that server first.

reynsnivea · May 31, 2020, 2:14pm

Hi Dmitry,

We also face these kinds of problems with bulk loads. We want to load data to an Amzon RDS instance but there the postgresql COPY FROM is not allowed as it requires superuser privileges and as you point it out, the file has to be stored on the same machine as the database instance.
I guess that the CSV to be generated for import also has to have the same column order as the target table in the database.

Are there plans to incorporate bulk loading actions into EasyMorph so that non-technical users can more easily work with larger datasets? For me the this would be of great added value.

Thanks !

andrew.rybka · May 31, 2020, 6:42pm

Hi Nikolaas,

have you tried to use the psql program with the \copy command to bulk load data? It works with local files and documentation states that it doesn’t require superuser privileges.

reynsnivea · June 2, 2020, 11:50am

Hi Andrew,

Yes I have but it’s a bit cumbersome. Postgres always prompts the user for a password so first you need to create a pgpass.conf file so that exports can be automated using the run a program action. This should be done for EM-desktop and server (if there are running on different machines).
Then you need an ugly psql command to do the import.

question: Can we split the ugly long command over multiple command in EasyMorph? Something like:

psql -U -p -h
\copy …

instead of psql -U -p -h -c “\copy …”

Also I believe it does not do auto column mapping + I get problems with free text fields that sometimes contain the delimiter to be used in the CSV. I have tried several delimiters but I always get into troubles.
Sometimes I got errors regarding the encoding used by the cmd prompt versus the postgres database.

So in a nutshell: relatively complicated to just import a CSV into the database.

Any plans to include bulk loading out of the box in EasyMorph. For me it would be a big added value as we deal with larger datasets frequently ?
I suppose that implementing this in EasyMorph or not depends on what’s allowed by the RDBMS ?
I can imagine that other ETL-tools that have bulk loaders just use the COPY FROM under the hood ?

Thanks !

andrew.rybka · June 2, 2020, 2:47pm

Nikolaas, bulk load for PostgreSQL and several other databases is on our long-term roadmap.

reynsnivea · June 3, 2020, 11:53am

Hi Andrew,

Our solution for the moment is as follows:

Export to CSV
Upload the CSV to an S3-bucket
Bulk load the csv from within the bucket to postgres. AWS allows bulk upload of a CSV to an RDS-instance from S3.

I am creating a module in EasyMorph that can be called with some parameters such as the database connector, target location of the CSV locally, target location of the S3 location, target schema and target table.

reynsnivea · June 7, 2020, 12:45pm

Hi Andrew,

Our approach seems to work. I just have a final question. Now that we can bulk load the data, it seems that the process that takes most of the time now is writing the CSV.

Does the writing speed depend on the speed of the disk used or is it EasyMorph that cannot write faster ?

Will there be efforts to improve the write speed in the future ?

When I find some time, I shall share the module I use for bulk upload from S3 to postgres.

Thanks !

andrew.rybka · June 9, 2020, 1:24pm

Hi Nikolaas,

It seems that the disk just has to be fast enough. On my SSD writing to disk takes less than 10% of the action execution time.

You can try to switch the “Wrap in quotes” option to the “Nothing” mode - it’s 2-3 times faster than the “Auto” mode. But you have to be sure that your data don’t contain line breaks, double quotes, and separator characters. The “Everything” mode is also faster than the “Auto” mode. Use it if you can’t use the “Nothing” mode and PostgreSQL will correctly read wrapped numbers.

Also, we are planning to improve the performance of the “Export to delimited text” action in the next release.

reynsnivea · June 11, 2020, 8:35am

Great !