Natural join switch option [DONE]

Hi

In the join transformation, could we have a 'natural mode' switch to make join on all the common fields between the 2 tables.

Regards

Hi

I have to migrate a kimball style datawarehouse to a datalake so I need to generate flat files from star/snowflake schema

i.e.:
sales fact → product → product sub category → product category
sales fact → geography → sales territory
sales fact → promotion
I would like to create a loop that browse up the foreign keys and create a flat file containing the facts and all the dimension attributes
I can’t do it because I need to define case by case the matching columns in the EM Join table transformation.
A natural join transformation would make the join on all the matching column names and inject all the non matching columns.
It would reproduce the behavior of the Join instruction of Qlik.
Used in an iterate loop, I could have an automated process to migrate datawarehouses to data catalog and be sure I don’t miss a column

Regards

1 Like

Hello Christophe,

I’m not sure if it can help, and others might have better ideas, but my initial try on this would be as follows:

  1. In a parent module, provide the two table(s) to be joined (e.g. dbo.fact_Sales and dbo.dim_Promotion)
  2. In the same parent module, fetch the fieldnames of each of these tables (e.g. if it’s about database tables, you might use the sys information schema)
  3. Identify which fieldnames appear more than once (e.g. grouped by fieldname, count and concatenate the tables) and filter on these fields - lets call them joinfield_names
  4. Split the concatenation of tables into distinct rows, and aggregate again - this time grouping by table_name and concatenating the joinfield_names
  5. Process the concatenation of joinfield_names so that you can evaluate them (using eval) as a unique join key when passing them in a submodule.
  6. Pass the table_name and joinkey computation as parameters to a submodule
  7. In the submodule, load all data of both tables separately. Compute the joinkey as a new field on both side using eval(). Then join on this joinkey field, and drop this field.
  8. Do this join on both sides and append both result tables.

I have sketched it out in a small .morph project (simplified into a single module), but in doing this I realize:

  • this approach would become quite more complex if you want to join more than two tables at once. I wouldn’t know how to do this in EasyMorph.
  • I’m not sure how to make the ‘Merge’ dynamic in terms of the attribute fields, which will be different for each table set. In the example, the joining of the ‘Description’ field is hard-coded.
  • Because of these limitations, I would tend to do the submodule work (loading the tables and joining them with all attributes) in a SQL Command action

KR

David
idea_for_Natural join switch option.morph (14.4 KB)

Note that, given this is a feature request, the addition of a ‘natural join’ option would indeed be great (to avoid the logic I describe above).
In this same regard, it would be great if the Merge action was expanded by:

  • allowing a switch to select all fields as fields to join (instead of a checkbox for each specific field)
  • allowing an option to automatically join based on equal field names
  • allowing a switch to join regardless of data type (making that text(1) and numeric(1) would be joined - in the current version, the join keys need to have the same data type in both tables)
  • allowing a switch to explicitly allow/disallow joining on empty fields (in the current version, this join is always allowed - so empty values in the join fields are treated as a match)

Sincere apologies for hijacking this post, just sharing some thoughts!

1 Like

David,

Thanks for post, it’s always valuable to go deep in this kind of thoughts.
Indeed, I’m just looking for a simple way to reproduce the Qlik join instruction.
So it would be based on equal field names.
I’ve built a workaround by generating SQL scripts with EM but this is quite cumbersome.
I’d like it to be a specific transformation and not some options to activate in the existing Join transformation.

Regards

Interesting! Never thought of such a use case. Effectively, it’s about being able to dynamically define fields to join.

2 Likes

The capabilities to work in “dynamic” mode with EM are very powerful but 3 transformations can’t be used (as far as I can see) : the Join, the Unpivot and the keep/remove matching. for the Unpivot I would need a switch between “pivot columns except” (as it is now) and “pivot only these columns”. For the keep/remove, I often use it before a Join to make it an “inner” join.
Dynamic mode is rarely possible on ETL tools as they are column based (you design the transformation flow by column). EM is table based, you can create transformation without knowing in advance the table profil.
Maybe you have here a good subject for a linkedin post ?

Regards

UPDATE

The “Natural merge” action is confirmed for v5.6 and is planned for release in a couple of weeks. It will have 4 modes:

  • Lookup
  • Inner join
  • Left join
  • Full join
4 Likes

Hi,

Thanks, it's very powerful and time saving. I love it.

Regards