Bulk insert options

Hello, is it possible to limit the bulk insert operation to a maximum number of rows at a time?

Use the normal Export to database.



Thanks a lot!!!
I didn't realize that the normal export mode could export in batches.
However, it would be nice to be able to manage retries in case of failure.


If you schedule the job, there is a retry on failure.

At least on the Server; I don't know about the Launcher.

I know about this option, but we need a way to upload a large amount of data in a short time without saturating the resources on the Azure cloud, while still guaranteeing that the data gets inserted. I can't create a scheduled job for each insert action.

"Bulk export" in EasyMorph doesn't use SQL INSERT. It uses the special proprietary APIs provided by the target database.

Typically, databases treat bulk inserts as atomic operations. This means that if a row fails during a bulk insert, the whole operation is rolled back. However, some databases don't - check your database's documentation on that.

You can already do batch loading with "Bulk export" with the help of the "Iterate table" action and loops (iterations). Partition your dataset and run "Bulk export" one partition at a time. See the example below:

bulk-export-by-partition.morph (5.5 KB)

Often in these cases we have to delete a large set of data from the table in order to reload it with updated values. The best approach would be to work out which rows need to be updated, deleted, or inserted, but that is a considerable effort, given that we have to keep the data sources updated unilaterally. So we delete a set of data based on certain dates and then reload it. However, if this operation fails, we end up with incomplete data sets, which is a big problem.

I also thought about creating a module and executing the bulk insert internally, but some problems remain:

  1. I would have to create a module for each bulk export action, which is neither comfortable nor convenient when you are performing numerous exports.
  2. How do you recommend managing retries?
    Since a retry can be set in task scheduling, I thought I would have to create specific tasks (with considerable effort) and then use the "EasyMorph Server Command" action with "Run task", but the problem is that this action doesn't have a "retry" option.
  3. How can I be sure that the command was successful and, if not, roll back both the delete and the insert operations?
    If I use the "Database command" action to open a transaction before executing the delete and another one after the "bulk insert" operation to commit it, could that be a solution?

Bulk insert is not intended for atomic updates.

Why can't you just update them in place, without deleting? Produce a list of primary keys that need to be updated and then, for those records, use, for instance, the "Update database table" action. Insert the remaining records using "Bulk export" or the regular "Export to database".
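For illustration, if the incoming records are staged in the database first, the split between "keys to update" and "keys to insert" could be derived with two simple queries. This is only a sketch; the table and column names (dbo.Staging, dbo.MainTable, Id) are placeholders, not taken from your workflow:

```sql
-- Keys already present in the main table: candidates for "Update database table"
SELECT s.Id
FROM   dbo.Staging   AS s
JOIN   dbo.MainTable AS m ON m.Id = s.Id;

-- Keys not present yet: candidates for "Bulk export" / "Export to database"
SELECT s.Id
FROM   dbo.Staging AS s
WHERE  NOT EXISTS (SELECT 1 FROM dbo.MainTable AS m WHERE m.Id = s.Id);
```

The same split can, of course, also be produced inside EasyMorph by matching the incoming keys against the keys already present in the target table.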

You seem to have complex logic that you (understandably) want to be atomic, but the database doesn't provide atomicity for such complex logic out of the box. EasyMorph can't provide more atomicity in database operations than the database itself, so there's no point in expecting that. In such a case, it might be a good idea to use a temporary table in the target database to construct the necessary records (updated + inserted) without modifying the main table. That way, even if an operation fails, it won't affect the main table.

Then use a single operation, e.g. an SQL UPDATE ... FROM statement, to update records in the main table from the temporary table. That one operation is atomic, and if it fails, the database will roll it back automatically.
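A minimal sketch of that final step, assuming SQL Server / Azure SQL (T-SQL) syntax; the table and column names (dbo.MainTable, dbo.Staging, Id, Amount, UpdatedAt) are placeholders:

```sql
-- Single statement: either all matching rows are updated, or, on failure,
-- the database rolls the whole statement back automatically.
UPDATE m
SET    m.Amount    = s.Amount,
       m.UpdatedAt = s.UpdatedAt
FROM   dbo.MainTable AS m
JOIN   dbo.Staging   AS s ON s.Id = m.Id;
```

On databases that support it, a single MERGE statement could cover both updating existing rows and inserting new ones in one atomic operation; the same caveat about placeholder names applies.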

It depends on what you mean by retry and what type of failure you expect. If a retry is simply repeating the workflow verbatim from the beginning then the "Retry" option in the Server task settings does exactly that. It may help if your database connection is finicky. However, if the task fails because some particular value is rejected by the database, just repeating it again won't help.

It doesn't have to. The task itself has a retry option.

As far as I remember, transactions don't survive closing the connection, and connections get closed between EasyMorph actions, so that won't work (@andrew.rybka, correct me if I'm wrong). See my suggestion above about using temporary tables.