Does EM connect and disconnect between each SQL call or does it create the connection when the project is running and then disconnect when finished? I have an extremely long running job that is failing because the connection keeps getting killed. I am wondering if you are establishing the connection once and then it gets to the connection timeout limit and disconnects automatically. I am also confused about the performance when “chunking” a large set of data from a database. I can unload 50 tables totaling 2.1GB of data in 9 minutes while 7 tables totaling 17.3GB of data is taking over 3 hours (these 7 tables are chunked out to 115 files). Any thoughts?
A new connection is established for each SQL call. Database connectors typically have timeout settings. You can increase timeouts in connector settings.
It’s hard to say, there are several moving parts here. Database, network, driver, EasyMorph - anything can be a bottleneck under certain conditions.
Does your source database have an SQL command to dump a table into a CSV file? It might be better to do the export on the database server, and then only transfer the output files to your local machine over SFTP or in a similar way.
Also, when the source is slow it may help to parallel execution. Instead of having one iteration, split batches into two derived tables (odd in one, even in the other) and run two iterations in parallel. Independent derived tables are executed in parallel. See here: Derived tables synchronisation - #2 by dgudkov