Flatten csv into single rows/records

Jason_Carney · July 7, 2021, 3:11pm

Similar to my prior post about flattening an XML file, I now have a use case that involves flattening a CSV file into single rows/records.

Using the attached CSV input file, you can see that a given Purchase Order spans multiple rows. (The provider/source of the file breaks the records within each purchase order down into Header, Other, and Detail, as indicated by the "Record Type" attribute.) -file removed-

Since we don't care about the Record Type attribute, we want to flatten the multiple rows into one, effectively making the PO Number the 'primary key'. This way, all of the other details from the columns roll up together.

I've attempted to reuse the previous project to flatten using ranks, a matrix table, and row deduplication... but I'm only ending up with 1 record, whereas the source CSV can contain multiple records (multiple PO #s). See attached. flatten-csv.morph (14.5 KB)

Any other thoughts/suggestions on how to parse the input CSV and output the desired single record(s) as noted above?

dgudkov · July 7, 2021, 9:34pm

In this case, all calculations should be done within groups of rows defined by column “PO Number”. Actions used in the project, such as “Enumerate” or “Unpivot”, support data transformation within groups of rows. So I’ve added grouping by PO number.

See the updated project below.
flatten-csv.morph (16.0 KB)
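The .morph project itself isn't reproduced here. As a rough illustration of the idea only, here is a minimal Python sketch of flattening within groups: rows are grouped by "PO Number", "Record Type" is ignored, and each column keeps the first non-empty value found in its group. The sample data, the Vendor, Ship Date and SKU columns, and the helper name `flatten_by_key` are hypothetical, not taken from the original file.

```python
import csv
import io

# Hypothetical sample: each PO is split across Header / Other / Detail rows.
SAMPLE = """PO Number,Record Type,Vendor,Ship Date,SKU
PO1,Header,Acme,,
PO1,Other,,2021-07-10,
PO1,Detail,,,SKU-A
PO2,Header,Globex,,
PO2,Detail,,,SKU-B
"""


def flatten_by_key(rows, key, drop=("Record Type",)):
    """Collapse rows that share the same `key` value into one row.

    Within each group, every column keeps the first non-empty value seen;
    columns listed in `drop` are ignored entirely.
    """
    merged = {}  # insertion-ordered: groups come out in first-seen order
    for row in rows:
        group = merged.setdefault(
            row[key], {col: "" for col in row if col not in drop}
        )
        for col, value in row.items():
            if col not in drop and value and not group[col]:
                group[col] = value
    return list(merged.values())


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
for record in flatten_by_key(rows, "PO Number"):
    print(record)
```

Collapsing the whole table without a group key amounts to treating every row as one big group, which would explain why only a single record came out before grouping by PO Number was added.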

Jason_Carney · July 8, 2021, 3:08pm

As always, appreciate the help! This worked perfectly.

Jason_Carney · July 19, 2021, 6:50pm

@dgudkov - I've come up with a slightly more challenging variation on this one. In my original example, the flattening generally worked because each order contained only a single line (one product). In that situation flattening works fine, as there is only one value to worry about for columns like product, SKU, etc. However, we've realized there are other cases with multiple products under the same PO number, which is what we were using as the "primary key" to organize things. (See the attached example of multiple order lines.)
Since we were using "fill gaps" and grouping by PO number, we essentially get only 1 product, whereas this example has 3.
So I'm wondering if there's even a way to flatten such that, in this case, we'd get 3 rows in the output: any unique values would be retained, any blanks would get filled, etc. Perhaps by somehow using the highest value in "PO Line #" to determine how many rows to keep?
My attached example is just 1 order... as per the earlier message, the file could also contain multiple orders (and each order could have multiple product lines).

dgudkov · July 19, 2021, 9:22pm

The approach is still the same - do the flattening within groups. Just in this case the groups are defined not only by “PO Number” but additionally by a product ID (e.g. SKU). You may need to create a synthetic group ID by concatenating PO Number and SKU.
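A minimal Python sketch of the synthetic group ID, assuming hypothetical item columns "PO Line #", "SKU" and "Qty" (the helper `add_group_id` and the "Group ID" column name are illustrative, not from the project). It concatenates PO Number and SKU so that each product line falls into its own group.

```python
import csv
import io

# Hypothetical sample: one PO-level row plus three product lines for PO1.
SAMPLE = """PO Number,PO Line #,SKU,Qty
PO1,,,
PO1,1,SKU-A,2
PO1,2,SKU-B,1
PO1,3,SKU-C,5
PO2,1,SKU-A,4
"""


def add_group_id(rows, *cols, sep="_"):
    """Add a synthetic "Group ID" column by concatenating the given columns."""
    for row in rows:
        row["Group ID"] = sep.join(row[c] for c in cols)
    return rows


rows = add_group_id(list(csv.DictReader(io.StringIO(SAMPLE))), "PO Number", "SKU")
print(sorted({row["PO Number"] for row in rows}))  # groups by PO Number alone
print(sorted({row["Group ID"] for row in rows}))   # groups by PO Number + SKU
```

Grouping by PO Number alone yields 2 groups here, while the composite ID yields one group per product line. One caveat: the PO-level row with a blank SKU gets its own ID (`PO1_`), so PO-level values don't automatically land in the product groups.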

Jason_Carney · July 20, 2021, 4:06pm

I have tried a few variations of what was suggested... grouping by PO Number + PO Line # (which is unique; there is 1 line number per item ordered). I also tried concatenating PO Number + Line # to create a new unique identifier (e.g. PO1234567_1, PO1234567_2, etc.) and then grouping/enumerating by that UID... It doesn't quite do what's needed, though: I still end up with only 1 record per PO number, rather than one row per product in the order.
My latest input file and project files are here... not sure if there are any other thoughts on how to tackle this.
flatten-csv-ordertest.morph (16.1 KB)

dgudkov · July 28, 2021, 9:42pm

Upon a closer look, I see a different picture. You have 3 data entities in one table, all linked to a PO number:

Ordered items (14 columns, multiple lines per PO number)
PO note/comment (1 column, one line per PO number)
PO-level attributes (all other columns, one line per PO number)

Since you need a denormalized table with all 3 entities, you just need to separate them into 3 derived tables, remove empty rows to produce clean normalized datasets, then merge them back into one table using PO number as the key field.
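A minimal Python sketch of the split-and-clean step, assuming a hypothetical mixed CSV and a hand-written mapping of columns to entities (`ENTITIES` and `extract_entity` are illustrative names, not from the project). Each entity table keeps PO Number plus its own columns, and rows where all of that entity's columns are empty are dropped.

```python
import csv
import io

# Hypothetical mixed CSV: PO-level rows, a note row, and item rows in one table.
SAMPLE = """PO Number,Record Type,Vendor,Note,PO Line #,SKU,Qty
PO1,Header,Acme,,,,
PO1,Other,,Rush order,,,
PO1,Detail,,,1,SKU-A,2
PO1,Detail,,,2,SKU-B,1
PO2,Header,Globex,,,,
PO2,Detail,,,1,SKU-C,4
"""

# Which columns belong to which entity (an assumption for this sample).
ENTITIES = {
    "po_attributes": ["Vendor"],
    "po_notes": ["Note"],
    "po_items": ["PO Line #", "SKU", "Qty"],
}


def extract_entity(rows, key, columns):
    """Project key + entity columns, dropping rows where every entity column is empty."""
    out = []
    for row in rows:
        values = {c: row[c] for c in columns}
        if any(values.values()):
            out.append({key: row[key], **values})
    return out


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
tables = {name: extract_entity(rows, "PO Number", cols) for name, cols in ENTITIES.items()}
for name, table in tables.items():
    print(name, table)
```

The result is one clean table per entity: one row per PO for the attributes, one per noted PO for the notes, and one per ordered item for the items, each still carrying PO Number as the key for the merge back.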

Here is your example, updated. See group “Semantic merge”. Notice the use of the “Left join” mode when merging PO items.

flatten-csv-ordertest.morph (21.6 KB)
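The "Left join" mode matters because the item table has several rows per PO. In the Python sketch below (the tables, the `merge` helper and its `lookup`/`left_join` modes are hypothetical illustrations, not EasyMorph's API), a first-match lookup keeps one row per PO, while a left join emits one row per matching item and still keeps POs with no match.

```python
from collections import defaultdict

# Hypothetical normalized tables, as produced by splitting the mixed CSV.
po_attributes = [
    {"PO Number": "PO1", "Vendor": "Acme"},
    {"PO Number": "PO2", "Vendor": "Globex"},
]
po_notes = [{"PO Number": "PO1", "Note": "Rush order"}]
po_items = [
    {"PO Number": "PO1", "PO Line #": "1", "SKU": "SKU-A"},
    {"PO Number": "PO1", "PO Line #": "2", "SKU": "SKU-B"},
    {"PO Number": "PO2", "PO Line #": "1", "SKU": "SKU-C"},
]


def merge(left, right, key, columns, mode):
    """Add `columns` from `right` to each row of `left`, matching on `key`.

    mode="lookup": take the first match only, so the row count never changes.
    mode="left_join": emit one output row per match; unmatched rows get blanks.
    """
    if mode not in ("lookup", "left_join"):
        raise ValueError(f"unknown mode: {mode}")
    index = defaultdict(list)
    for row in right:
        index[row[key]].append(row)
    blank = {c: "" for c in columns}
    out = []
    for row in left:
        matches = index.get(row[key]) or [blank]  # .get avoids inserting new keys
        if mode == "lookup":
            matches = matches[:1]
        for match in matches:
            out.append({**row, **{c: match.get(c, "") for c in columns}})
    return out


flat = merge(po_attributes, po_notes, "PO Number", ["Note"], mode="lookup")
flat = merge(flat, po_items, "PO Number", ["PO Line #", "SKU"], mode="left_join")
for row in flat:
    print(row)
```

With the left join, PO1 expands to two rows (one per item) carrying its vendor and note, and PO2 gets one row with a blank note. Merging the items with the first-match lookup instead would return only 2 rows, losing SKU-B, which mirrors the "only 1 product per PO" symptom from earlier in the thread.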

Jason_Carney · August 2, 2021, 1:24pm

Once again, thank you for the detailed explanation / teaching along the way… this works great!

dgudkov · August 2, 2021, 10:27pm

You’re welcome, Jason!