Concatenate error: Array dimensions exceeding supported range

Hi. I have the latest version of EM (4.3.1.0). I have a table of unique customers (~2 mn) and a table of unique products (~50). I created a table of all possible combinations (=> 2 columns with ~100 mn rows). I did it with a cross merge and it worked well.

Then I tried to add a primary key for each combination with a simple new column = [column 1] & [column 2]. That's when I got this error. I can calculate pretty much any other new column; it's only the simple concatenation of [column 1] and [column 2] that fails. Both columns are text type.

I tried exporting the un-concatenated file and bringing it back in to break the chain, but I'm still unable to concatenate. I also tried it through a rule. No luck. Am I hitting a limit?

Hello Rick, can you please send us the diagnostic information?

Yes will do today. Thanks!

Hi Rick,

It looks like you're running into the limit of approximately 48 million unique values allowed per EasyMorph column.

Typically, to deal with large datasets we recommend using iterations, i.e. splitting the source data into chunks and processing it chunk by chunk in a loop. This way it's possible to process billions of rows without loading all of them into memory at once. Iterations (loops) are a very powerful technique in EasyMorph, described in our tutorial: https://easymorph.com/learn/iterations.html

The task description you provided above doesn't specify how you want to process the data further after creating a synthetic primary key. As I see it, you can arrange an iteration in EasyMorph across the list of products and calculate a synthetic primary key for 1 product x 2 mn customers, 50 times, once per product. That way your dataset would never exceed 2 mn rows at any point.
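Just to illustrate the idea outside EasyMorph, here's a minimal Python/pandas sketch of the same per-product iteration. The file names customers.csv and products.csv and the column names Customer and Product are only assumptions for the example:

```python
import pandas as pd

customers = pd.read_csv("customers.csv")   # ~2 mn rows, column "Customer" (assumed)
products = pd.read_csv("products.csv")     # ~50 rows, column "Product" (assumed)

# One pass per product: each pass handles only ~2 mn rows instead of
# materialising the full ~100 mn-row cross join at once.
for product in products["Product"]:
    chunk = customers.copy()
    chunk["Product"] = product
    chunk["Key"] = chunk["Customer"].astype(str) + chunk["Product"].astype(str)
    # ...process or export this ~2 mn-row chunk before the next pass...
```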

If, for instance, you would like to upload the result into a database table, then instead of calculating a primary key for the entire dataset at once and uploading the entire dataset into the table, you can do the same in chunks of 2 million rows. Performance-wise it would be pretty much the same, as it's the same amount of work, just sliced differently.
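Continuing the sketch above for the database case, each slice could be appended to the target table as it's produced, so the full ~100 mn rows never have to sit in memory at once. The connection string and table name below are placeholders:

```python
from sqlalchemy import create_engine

engine = create_engine("postgresql+psycopg2://user:pass@host/db")  # hypothetical target DB

for product in products["Product"]:
    chunk = customers.copy()
    chunk["Product"] = product
    chunk["Key"] = chunk["Customer"].astype(str) + chunk["Product"].astype(str)
    # Append each ~2 mn-row slice; the table accumulates all ~100 mn rows
    # while only one slice is in memory at a time.
    chunk.to_sql("customer_product", engine, if_exists="append", index=False)
```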

Thanks Dmitry! I actually had a workaround for the synthetic primary key. I was just wondering if I'd hit a weird concatenate limit or bug, because I was able to add other calculated columns after the cross merge, just not any with concat functions. Anyway, your answer makes perfect sense. I should have resorted to iterations rather than the workaround. Thanks again!