Hello EM community,
I am hoping to get some pointers on dealing with a scenario I have using the "Iterate web request" action.
I have a dataset with poor, if any, change tracking in its tables. I pull the records, perform transformations on them, and then write them to a project management application called Monday.com via a GraphQL API.
The dataset is about 5,000-6,000 records, and I am finding that on average the API takes approximately 7 seconds to process each request. If I let the process iterate sequentially, I am looking at essentially an entire business day to write everything to the destination.
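For reference, here is the back-of-the-envelope math behind that estimate, as a small Python sketch (the record count is the midpoint of the range mentioned above; the 2-hour target is from the next paragraph):

```python
# Rough sequential run-time estimate, using the figures from the post:
# ~5,000-6,000 records at ~7 seconds per API call.
records = 5500           # midpoint of the stated 5,000-6,000 range
seconds_per_call = 7

total_seconds = records * seconds_per_call
total_hours = total_seconds / 3600
print(f"{total_hours:.1f} hours sequentially")   # roughly a full business day

# To finish in about 2 hours, the workflow needs this many parallel streams:
target_hours = 2
needed_parallelism = total_hours / target_hours
print(f"~{needed_parallelism:.0f} parallel request streams")
```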
I have tried splitting the table into smaller tables to get them to write in parallel; I would really love to get the total run time down to about 2 hours. However, I am noticing that if I break the table out into smaller tables like this, EasyMorph will not necessarily execute them in parallel in the manner I would expect. I am attaching a screenshot below of what I see while it is processing.
Can anyone think of any ideas on how I might be able to make this more performant?
Independent tables derived from the same source table are executed in parallel (see below):
In addition, you can split your dataset into two or three partitions, then for each partition call a module, passing the partition to it with the “Input” action. Inside the module, divide the partition again into two or three derived tables, each with its own “Iterate web request” action. Using this technique you can achieve degrees of parallelism of 4, 9, 16, and more.
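The arithmetic behind the two-level split above is simply N top-level partitions times M derived tables per module; with a symmetric split it gives the degrees of parallelism mentioned:

```python
# Effective parallelism from the two-level split described above:
# N top-level partitions, each calling a module that contains N derived
# tables with an "Iterate web request" each, yields N * N parallel
# request streams (still capped by CPU cores, as the next reply notes).
degrees = [n * n for n in (2, 3, 4)]
for n, d in zip((2, 3, 4), degrees):
    print(f"{n} partitions x {n} derived tables = {d} parallel streams")
```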
Remember, however, that the number of actions executed in parallel depends on the number of CPU cores on the machine that runs EasyMorph. EasyMorph never spawns more computational threads than there are CPU cores on the machine that runs the workflow.
Therefore, for higher parallelism, run workflows on a machine with more CPU cores. See also this reply: Question: What controls the limits of how many actions can be executed simultaneously? - #4 by Denys_Isaev
Thank you, Dmitry. This is a very valuable response. My CPU has only 4 cores, so that appears to be the blocker for running more parallel processes. Time to upgrade.
Starting from version 5.4.1, the “Iterate web request” action can send multiple requests in parallel with a specified degree of parallelism. The action also allows keeping the input dataset.
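For readers who want to see what "degree of parallelism" means outside EasyMorph, here is a minimal Python sketch of the same idea. The `send_record` function is a hypothetical stand-in for one GraphQL mutation to Monday.com (no real network call is made); `max_workers` plays the role of the action's degree-of-parallelism setting:

```python
# Conceptual sketch (not EasyMorph internals): a bounded pool of
# workers sends records concurrently, with at most max_workers
# requests in flight at any moment.
from concurrent.futures import ThreadPoolExecutor

def send_record(record):
    # Hypothetical stand-in: a real workflow would POST a GraphQL
    # mutation here; this just echoes the record id.
    return f"wrote record {record['id']}"

records = [{"id": i} for i in range(10)]

# max_workers corresponds to the action's degree of parallelism.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(send_record, records))

print(results[0])  # -> wrote record 0
```

Note that `pool.map` preserves input order even though the calls run concurrently, which is convenient when matching responses back to source rows.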