Split (very) large text data

Hello everyone,

I’ve got a text column whose values can be 3M+ characters long.

I need to split that column into 100 fields of 65,534 characters each (because the target database has this weird structure).

At the moment I use the “Split fixed width text” action, which has a 32,767-character limit per field, so I split the original value into 200 columns (C1_a, C1_b, C2_a, C2_b, etc.)

After that I create new columns with the concatenated pairs (C1_a + C1_b = C1).

Question: is there a better way to manage this?

I’m asking because the original 3M+ column keeps growing. Today I can manage it with 100 fields, but tomorrow I’ll have to expand to 150 fields, 200 fields … and for every additional final field I need to add 2 rows to the “Split” action and 1 new row to the “Calculate new column” action.

Last time I expanded from 50 to 100 fields it took me a very long time … maybe some of you can suggest a smarter way to manage this?

This looks like a good candidate for using iterations. In each iteration:

  • Extract 65,534 characters between positions (N-1) * 65,534 + 1 and N * 65,534, e.g. using the mid() function.
  • Append a column to the dataset from the previous iteration (e.g. using an external .dset file)
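The per-iteration extraction above can be sketched in plain Python (the mid()-style positional extraction maps to string slicing; the sample value and the helper name are illustrative, not part of either tool):

```python
CHUNK = 65_534  # target field width in the destination database

def split_into_fields(value: str, chunk: int = CHUNK) -> list[str]:
    """Split `value` into pieces where piece N holds characters
    (N-1)*chunk+1 .. N*chunk (1-based), like repeated mid() calls."""
    return [value[i:i + chunk] for i in range(0, len(value), chunk)]

# A 150,000-character value needs 3 fields: two full chunks plus a remainder.
fields = split_into_fields("A" * 150_000)
print(len(fields))                       # 3
print(len(fields[0]), len(fields[-1]))   # 65534 18932
```

The key point is that the number of pieces is derived from the value’s length at run time, so a growing source column no longer requires hand-editing the action for every new field.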

Hello Dmitry,

This was my 2nd option; the 3rd one is to use a script (C# or Python).

Did your 2nd option work?

Hello Dmitry,

It’ll work. At the moment I cannot modify that part, but I’m sure that an iterated module will work perfectly!