Preview: High-level workflow diagrams in EasyMorph - what do you think?

dgudkov · August 19, 2020, 10:36pm

We’re working on a new feature for v4.5: high-level workflow diagrams. Such a diagram gives a high-level overview of a module's workflow. It shows tables, groups, and the dependencies between them.

Note that what you know as tabs will now be called groups (initially we wanted to call them blocks but that term would be easy to confuse with modules). In the regular tabbed view that is currently available, the groups (think of them as “logical groups of tables and charts”) are shown in tabs - one group per tab. A diagram shows groups as boxes with tables. So basically, one box in a diagram is equivalent to one tab. For instance, in the screenshot below, two groups (tabs) “Transform” and “Report” are shown. If you switch to the regular tabbed view, that would be two tabs with the same names - “Transform” and “Report”.

Actions are not shown in diagrams, only tables. Adding/editing actions will remain in the regular (tabbed) view mode and in the maximized table view.

When a table is selected a list of actions is shown in the sidebar. Clicking an action will switch to the tabbed view mode with the action properties open. Also it’s possible to jump to the maximized table view and back right from the diagram.

What we learned while working on high-level workflow diagrams:

When tables have descriptive names, such a diagram gives a good high-level insight into workflow logic and dependencies. The diagram forces naming tables semantically, instead of generic “Table 1” or “Table 2”.
Showing actions in a diagram is a bad idea so we ditched it. It’s more or less OK on small projects, but with 300-400 actions readability becomes extremely poor. Instead, focusing on the tables and their dependencies provides a nice, high-level, semantically meaningful picture of workflow.
Groups must not have cyclical dependencies. This will also enable new, very efficient error handling similar to the try/catch pattern in mainstream programming languages. The new error handling will become available later in 2021.
The diagrams are good for exploring dependencies, especially once external dependencies are added (more on that below).
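
As a rough illustration of the "no cyclical dependencies" rule above, here is a minimal Python sketch. It is not how EasyMorph implements the check; the group names and the `group_run_order` helper are hypothetical. It models each group as a node that lists the groups it depends on, and uses the standard library's `graphlib` (Python 3.9+) to either produce a valid processing order or report a cycle.

```python
from graphlib import CycleError, TopologicalSorter


def group_run_order(depends_on):
    """Return groups ordered so that each one follows the groups it depends on.

    `depends_on` maps a group name to the set of groups it reads from.
    Raises ValueError if the groups form a cycle.
    """
    try:
        return list(TopologicalSorter(depends_on).static_order())
    except CycleError as err:
        # graphlib stores the nodes of the detected cycle in err.args[1]
        raise ValueError(f"cyclical group dependency: {err.args[1]}") from err


# Hypothetical groups: "Report" reads tables produced by "Transform", and so on.
groups = {"Import": set(), "Transform": {"Import"}, "Report": {"Transform"}}
print(group_run_order(groups))
```

A graph such as `{"A": {"B"}, "B": {"A"}}` would raise `ValueError`, which is the situation the rule forbids.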

The initial version of diagrams released in v4.5 will be rather basic. In future releases we will add a few more features:

Highlighting for the tables that the selected table depends upon, and the tables that depend upon the selected one (i.e. highlighting table dependencies).
A rich, on-hover popup with additional table metadata - annotations, input/output dimensions.
External dependencies - files, connectors, called modules, external applications, etc. All dependencies will be clickable for easy navigation/exploration.
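
The dependency highlighting in the list above amounts to a graph traversal in both directions. The Python sketch below is only an illustration under assumed data: the table names and the `reachable` and `invert` helpers are hypothetical and say nothing about EasyMorph's internals.

```python
from collections import deque


def reachable(start, edges):
    """Breadth-first search: every node reachable from `start` via `edges`."""
    seen, queue = set(), deque([start])
    while queue:
        for nxt in edges.get(queue.popleft(), ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def invert(edges):
    """Flip every edge so that 'depends on' becomes 'is used by'."""
    flipped = {}
    for node, targets in edges.items():
        for target in targets:
            flipped.setdefault(target, set()).add(node)
    return flipped


# Hypothetical tables: each one maps to the tables it reads from.
depends_on = {
    "Clean orders": {"Raw orders"},
    "Order totals": {"Clean orders", "Customers"},
    "Monthly report": {"Order totals"},
}

selected = "Order totals"
upstream = reachable(selected, depends_on)            # tables the selection depends upon
downstream = reachable(selected, invert(depends_on))  # tables that depend upon the selection
```

Here `upstream` and `downstream` are the two sets a diagram could highlight when "Order totals" is selected.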

I’m posting this while this feature is still in the works because we would like to hear your opinion on it before it’s finalized and released. How does it look to you? Any feedback / questions / suggestions / criticism?

lcaroli · August 20, 2020, 10:04am

I suggest adding a print function for the workflow.
When I create project documentation, I usually need an overview image of the project.

reynsnivea · August 27, 2020, 1:23pm

Hi Dmitry,

I find this an important feature. Having an overview of a data flow is very important.
I understand the argument to only focus on tables and not actions so that the schema is not cluttered.

Some questions:

Will the schema include all tables of all modules in a morph-file?
What will the schema look like if we have a master dataflow that triggers external morph-files? Will the schema, if generated in the master dataflow, also show all tables from the subprojects?
Will the tables be clickable so that we can go to a specific table in the data flow?
Any plans to also add an auto-alignment option in EasyMorph to nicely organize the tables (in all tabs)?

Kind regards

dgudkov · August 27, 2020, 1:48pm

Hi Nikolaas,

A diagram shows the workflow of 1 module at a time. The module sidebar (on the right) will remain accessible, so it will be possible to switch between modules while remaining in the high-level overview mode.

Not in the first release, but later (maybe in v4.6) we will make the diagram show external dependencies. These dependencies include imported and exported files, database and other connectors, and also called modules/projects. While the master dataflow won't show tables in the subprojects, the links to subprojects in the diagram will be interactive so it will be possible to open them in one click.

Yes, absolutely. Double-clicking a table will immediately open it in the regular tabbed view. It will also be possible to open a table in the maximized (i.e. full-window) view.

No, no plans for that. We tried to approach this problem a few times but couldn't come up with a sufficiently "smart" algorithm for automatic layout.

raskarov · August 27, 2020, 5:11pm

This looks like a useful feature to have. I’ve only started using tabs recently and they are really helpful. No comments on the diagrams. I really like what you have on the screen. This might or might not be related to diagramming but I often struggle with two things:

No zoom-in / zoom-out feature. In most apps, I can use CTRL + mouse wheel to quickly navigate through the canvas. For large projects, it might be useful to have. Examples that come to mind are sqldbm.com and moqups.com, which both have this feature.
For more complex projects, I want to be able to control the arrows and connectors. Very low priority but I’d rather be able to pick arrows. Otherwise, it starts to look like this:

But the diagrams look great. Excited to see them coming!

reynsnivea · September 1, 2020, 6:41am

I fully agree! I have a lot of rather complex EasyMorph flows that look similar to the one depicted in your screenshot. I try to split them into modules and then into tabs, but it often remains difficult to find your way around a flow when you reopen it after several weeks or months.

I think one solution is auto alignment, but Dmitry points out that this is not as easy as it looks. Second, bundling even more functionality into 1 action could help. For example, we often need to scan a bunch of files, get the timestamp from the filename in order to deduce the most recent file, and then import it from SharePoint, S3, ... If all that logic could be packaged into 1 action, this would simplify a flow. It's not the best example, but I guess there are more regular transformations that could be packaged into 1 action in order to simplify a flow.
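
To make the "most recent file" logic above concrete, here is a small Python sketch of what such a bundled step would have to do. It is only an illustration: the `yyyymmdd_HHMMSS` naming convention, the file names, and the `latest_file` helper are assumptions, and listing the files from SharePoint or S3 is left out.

```python
import re
from datetime import datetime

# Assumed naming convention: a yyyymmdd_HHMMSS stamp somewhere in the file name.
STAMP = re.compile(r"(\d{8}_\d{6})")


def latest_file(names):
    """Return the name with the most recent embedded timestamp, or None."""
    stamped = []
    for name in names:
        match = STAMP.search(name)
        if match is None:
            continue  # skip files that do not follow the convention
        try:
            when = datetime.strptime(match.group(1), "%Y%m%d_%H%M%S")
        except ValueError:
            continue  # digits matched but do not form a valid date/time
        stamped.append((when, name))
    return max(stamped)[1] if stamped else None


# Hypothetical file listing
files = ["sales_20200801_090000.csv", "sales_20200830_174500.csv", "readme.txt"]
print(latest_file(files))
```

The returned name would then be passed on to whichever import step reads the file.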

dgudkov · September 1, 2020, 12:37pm

@reynsnivea, @raskarov check out this new tutorial article (added a couple of days ago): Tutorial: table groups.

Does it make sense?

reynsnivea · September 2, 2020, 9:55am

Hi Dmitry,

Yes, this makes sense. I follow a similar pattern today in my morph-files.
For example, Download_source_data is a tab, ETL is a tab, Clean data folders is a tab, …

Thanks for sharing this tutorial.

Kind regards
Nikolaas

rebmanna · September 3, 2020, 6:53am

Hi,
This looks like a good way to separate data flow from logical flow and to get more structure into projects.

Do I understand correctly that you can use it to implement branches / jumps in the data flow, return to the logical flow, delimitation of formatting, data enrichment … similar to MS SSIS?

Kind regards,

Adrian

Steve_Nahrup · September 18, 2020, 3:18pm

I had to stop messing with this because I realized I was spending wayyyy too much time on this, BUT…maybe arrows should be removed by default and turned on if the user wants to actually view the arrows. By default, all actions that relate to other tables could display like the below compared to the way you organized it.

The table colors were just me throwing it out there as a potential way for users to manually assign each table a color, or to give related tables the same color.

Also, I thought if numerous tables were being derived from one table, it could be highlighted (in this case it’s orange) and also when clicked it could expand and reveal which tables are using it. I apologize for the shoddy creative work, but that’s not my area of expertise. Functionalities and features are more my thing.

I think there’s an awful lot of potential and exciting things to come and can’t wait to see them!

adambeltz · September 19, 2020, 3:22pm

I’m coming back to this after a while. It’s interesting to see how everyone uses EM and how that translates to workflows and diagrams.

As an example I rarely use tabs because I like to see everything together. But I use modules because I do a lot of iterations and I almost always have a “controller” module in which I annotate everything.

I think once released it will be interesting to see how it fits into each person’s workflow.

dgudkov · September 19, 2020, 5:42pm

Interesting. I see that many people don’t use tabs or use them rarely. When I designed EasyMorph, the concept of tabs appeared to me as something that goes without saying, because they naturally define stages of a workflow. Another benefit - temporary tables created for intermediate calculations remain local to the tab and have no references from tables on other tabs. So a tab is a stage (phase) which has the purpose of producing a dataset (datasets) that can be used in later stages.

When I see a workflow in more traditional ETL tools like Alteryx or Pentaho, I always think that it’s a poorly structured mess, because it has no clear high-level structure or stages. Where does data cleansing start and where does it end? Where is data quality validation?

When a workflow has clearly separated stages I, for instance, always know that when an action references a table on tab “Verify data quality” or a later stage I can be sure that the table has no data quality issues.

So on one hand, having clearly defined stages (or phases) of a workflow counts as a best practice. On the other hand, I see that many users struggle to use tabs and some find them too restricting. Maybe EasyMorph enforces the best practice on users a bit too hard. I’m not sure what to think of it.

I wonder if the high-level diagram described in this topic will encourage EasyMorph users to give their workflows a better defined structure in a less enforcing, more natural way.

reynsnivea · September 20, 2020, 7:31pm

Hi,

I use tabs in a lot of my projects in combination with modules so I find tabs very useful to structure the project.

As I have mentioned before, what could help me personally is the auto alignment of tables (but that was apparently not straightforward to implement) plus the overview schema. It would also be very good if we could search through our tables to navigate from one table to another.

Kind regards
Nikolaas