Parent-child structures are quite tricky in EM, but you can get very powerful results.
You can even simulate a reccursive outer-join.
If you use the Repeat transformation and make a self join in the loop, you'll get a relative path because you make a self join on the result of the previous iteration.
If you want a "classical" structure, you need to join the result of the repeat iteration with the source table
This is explained in the project shared on this link
it uses the accounts tree from adventure works DWH
You can customized with filters, tests or calculation within the loops. This is where the power of the Repeat transformation appears. You can customize a DFS with selecting the path that gives flatter structure or make a bottom-up then top down exploration:
General ledger, bill of material, organisation charts, most of the ERP, banking software are based on parent-child structures. Very few data transformation software can handle them. I really wonder why ?
Sometimes you need to have all the elements that have a specific upper element, sometimes you want to have the reachability (all the elements that you can reach up or down).
Maybe this will help.
I have a parent-child table that I’d like to flatten. It consists of an ID column and a second column that represents the child elements, separated by a semicolon. Sometimes, this column also includes an operator (+ or -) or an operation indicator (Reset or Add Difference).
I tried using the Create Hierarchy action after splitting the second column into rows, but it always raises an error. I also tried the Repeat action, but then I get a message saying there are too many loops.
Could you tell me how to achieve this without having to perform as many outer joins as there are possible values?
The "Hierarchy path" action correctly raises an error, because your data structure is not a parent-child tree, it's a graph.
In a parent-child tree, any child has one and only one parent. However, in your sample data, this is not the case, no matter if we consider ID or Content as children:
The [Content] column can't be children because for the "child" Content=3, there are multiple "parents" ID=1 and ID=5.
The [ID] column can't be children because for the "child" ID=3 there are multiple "parents" Content=7 and Content=8.
Therefore, it's not a one-to-many or many-to-one relationship. It's many-to-many, i.e. graph.
Strictly speaking, a parent-child tree is a particular case of a directed graph in which nodes have a one-to-many relationship.
Indeed it's not a parent-child hierarchy, but how do I then build a table with each ID and associated (sub)content with, I guess, the repeat action ? The goal is to merge a second table where the ID is the link, and contains a list of charts of accounts (eg ID 1 has 2,3,4 as "children" and in the charts of accounts 1 is linked to 205000, 2 to 300100, and so on...).
This graph is not a tree because children in it (e.g., node 3) can have more than one parent.
You can merge to this table another table, or itself to obtain paths in this graph. For instance, if you do an inner merge with itself, you will receive a list of all 2-hop paths in this graph:
7 -> 3 -> 1
7 -> 3 -> 5
8 -> 3 -> 1
8 -> 3 -> 5
You can technically repeat merging the table with itself (using the "Repeat" action) until you receive all paths in the graph. This is probably the closest equivalent of receiving all hierarchy paths in a tree-like graph.
Thanks for your answer. I fully understand the concept and the need to merge the table with itself to get the list of nodes, but I don't understand how to use the 'Repeat' action to achieve this. I tried to follow your example with the path list, but I have to admit that I'm stuck.
Where are you stuck ?
I guess you are trying to make a self-join of the resulting table with itself in a repeat loop
In such a case you'll get an exploration of your graph that will change the"starting point" of your exploration on each step of the loop
One trick can be to export the base table and to make the join of your resulting table with this base table in each loop.
In this case the repeat action will behave like a "with recursive" SQL query.
Maybe have a look at this post
and to the join file "EM repeat with table.zip" in the post
Thanks for your answer. I tried to follow your example but couldn't figure out how to use the repeat action with my dataset. I did eight left joins to the main table and got the expected result. I will try again later, when I have some time, to fully understand how you did this.