Tree edge list expansion

As part of several projects, I need to expand a tree structure, e.g.:

  • node 0
    • node 1
    • node 2
      • node 4
      • node 5
    • node 3
      • node 6
        • node 7

I use the following terminology:

  • root node: node without parent (0)
  • internal node: node with parent and child(ren) (2, 3, 6)
  • leaf node: node without a child (1, 4, 5, 7)

The tree structure is a deliberate and mostly accurate simplification: in some cases, the structure is actually a directed acyclic graph (DAG), and in other cases, there's not one single tree but a forest.

input

EDITED based on tesing using the Hierarchy path action:

  • added root nodes as child with empty parent
  • added a test case for a singleton tree
  • added a test case for multiple trees (a forest)
  • added a test case for directed acyclic graphs

The tree structure is represented by an edge list, e.g.:

parent child
0
0 1
0 2
0 3
2 4
2 5
3 6
6 7

A single, parentless, childless node is added to the test data to test how the flow deals with multiple roots (a forest instead of a single tree), and to test how the flow deals with singleton trees.

parent child
8

A simple "diamond" structure is added to the test data to test how the flow deals with directed acyclic graphs (trees in which nodes may have more than one parent).

parent child
9
9 10
9 11
10 12
11 12

output

EDITED to reflect added test cases (see input above), and slightly changed the desired output to more closely align with that of the Hierarchy path action.

The output may come in two shapes:

  • paths
  • reachability pairs

and which rows get included in the output is determined by the use case. Options:

  1. only paths starting at a root node?
  2. only paths ending at a leaf node?
  3. should the path from a node to itself be included?
1 2 3 4 ancestor descendant 1 2 3
0 0 0 x x
0 1 0 1 x x
0 2 0 2 x
0 2 4 0 4 x x
0 2 5 0 5 x x
0 3 0 3 x
0 3 6 0 6 x
0 3 6 7 0 7 x x
1 1 1 x x
2 2 2 x
2 4 2 4 x
2 5 2 5 x
3 3 3 x
3 6 3 6
3 6 7 3 7 x
4 4 4 x x
5 5 5 x x
6 6 6 x
6 7 6 7 x
7 7 7 x x
8 8 8 x x x
9 9 9 x x
9 10 9 10 x
9 10 12 9* 12* x x
9 11 12 9* 12* x x
10 10 10 x
10 12 10 12 x
11 11 11 x
11 12 11 12 x
12 12 12 x x

* reachability pairs should not be duplicated, even if multiple paths exist

Hi Laurens,

Did you check the "Hierarchy path" action? It seems like it would be trivial to do what you need with the help of this action, as long as the graph is acyclic.

1 Like

Hi @dgudkov,
Definitely a great action to work with! It needs some extra work to accommodate for self paths, and to get to the desired output, but that should be no problem.

EDITED to reflect added test cases (see input in the original post) and changed desired output (see output in the original post).

Here's the test set I use to verify it all works.

dummy.morph (4.0 KB)
test data - input.csv (88 Bytes)
test data - output.csv (891 Bytes)
testbed.morph (19.5 KB)

I tried implementing this with the Hierarchy path action suggested by @dgudkov.

Hierarchy path.morph (14.4 KB)

findings

  • minor changes to the data required, mainly adding a root node with empty parent.
    This also pointed out a flaw in the original input specification: single nodes without parent or child(ren) could not be represented, but are present in the use cases (see below). I've adjusted the input specification and test data.
  • paths are extracted very easily using the Split delimited text into columns action
  • reachability pairs are extracted fairly easily using a combination of keepbefore() and mirror()
  • the action only generates all partial paths from a root, e.g. 0, 0|3, 0|3|6 and 0|3|6|7
    the action does not generate all partial paths to a leaf, e.g. 3, 3|6, 3|6|7, 6, 6|7 and 7 are missing
    the missing partial paths can be derived fairly easily
  • filters can be applied to specify the options
  • directed acyclic graph (DAG) don't work because the child is a primary key, so duplicates (required for specifying multiple parents) are not allowed
    I have not found a workaround for this and welcome any suggestions

So most of it works well. How important is the directed acyclic graph? Unfortunately, one of the primary use cases can't do without it.

use case

Users are assigned roles:

user role
john 1
jane 2
minnie 3

Roles can be (and often are) nested, e.g.:

  • role 1
    • role 2
      • role 3

Actual permissions can be (and are) assigned to any role (not only the leaf), e.g.:

role permission
1 read invoices
1 create invoices
2 modify invoices
3 approve invoices

In this specific use case inheritance goes up the tree, i.e. role 2 has all permissions of roles 2 and 3, and role 1 has all permissions of roles 1, 2 and 3. A junior would get role 3 (fewest permissions), a senior role 2 (more permissions) and a manager role 1 (most permissions).

This example is still simple and (crucially) a strict tree, not a directed acyclic graph. In reality, the lowest level roles contain permissions for individual tasks, like "view sales orders" or "modify AP master data". These tasks are included in the (job) roles of several employees, and therefore have multiple parent roles.