Partitioning large DSET efficiently

I will often pull 52 weeks of data at once (from a DBMS) and then write that to a DSET file.
The file can be around 50 M rows - so it's quite large (2-3 GB).

I will then partition that 52 week file into 52 weekly DSET files.

But what I have found is that while EasyMorph does a good job of processing multiple partitions using Modules, I am struggling to iterate off of one big file to generate little files.

Basically I need to call the same Module 52 times which involves re-opening the same massive DSET to write out a single weekly partition.
I don't see a simple way to iterate within a Module and writing to multiple files (other than manually configuring 52 derived tables).

Currently the process takes more than 20 minutes - I'd like to get it down to less than 5 minutes if possible.
Any suggestions would be much appreciated.

Hi Neil,

Instead of the regular "Iterate" action, you can use the "Iterate table" action. The latter can pass the big table entirely in memory, without needing to re-load it from a .dset file in every iteration. Instead, the big table is received in the iterated module via the "Input" action.

Also, for better performance, use the "Keep/remove matching" action instead of "Filter by condition" to filter the big dataset down to a single week in the iterated module.

Thanks Dmitry.

I did look into that.
The Action I think you're referring to is "Call with another table" (as opposed to "Iterate table").
The "Call with another table" Action does work as you described, however it can only process a single iteration.
It is possible to repeat this action multiple times manually, but would prefer not to do that.
The help page also mentions how the "Verify data in another table" Action can be used to 'chain' these "Call with another table" Actions together.
Although how to do this is not clear to me.

For now this is not a big problem for me (it's just something I want to optimize) so it's not on my critical path.

A simple and intuitive solution would be if the "Call with another table" had the ability to iterate, or the "Iterate" Action had the ability to call with another table.

That’s exactly how the “Iterate table" works (not to be confused with Iterate).

1 Like

My apologies Jochen and Dmitry.
I stand corrected.
When I first attempted to use the "Iterate table" Action I mistakenly thought it was like Iterate but using a different table to iterate on (as opposed to iterating on the same table and passing in another table).
I've since modified my project to use "Iterate table" and the run time has gone from > 20 minutes to < 2 minutes (10x improvement).
So problem solved - appreciate the clarification. :slightly_smiling_face:

2 Likes