Fuzzy join of two tables

Some ETL-tools have a fuzzy join transformation making it possible to join two tables if the values in the joining columns are not exactly the same.

Is there a transformation in EasyMorph that does the same thing ? If not, is this transformation on the roadmap ?

Do fuzzy matching first (using the “Match” action in fuzzy matching mode), then a regular join using the “Merge” action. It’s the same as fuzzy join.

Hi dgudkov, sorry for necroing but is the fuzzy mode usable as a function in other expressions? It would be the same as “distance”, right? Do you have examples of using distance to parse different columns/tables also with adjoining text functions and conditions? Thank you!

Yes, that's correct. The distance() function uses the same algorithm as the fuzzy matching mode.

Can you provide an example or be a bit more specific?


Is there a way to do a fuzzy LEFT JOIN ?

Example below. In this case the first match is returned. How can I return the two records from the lookup table?

Do fuzzy matching the other way around - place the "Match" action in "Table 1". You may need to run iterations to collect all matches for all values in a dataset.

Or another method could be:

  • Iterate over the lookup values and for each lookup value, iterate over the lookup dataset. Calculate distance(lookupvalue, value).
  • Include in the resultset if result of distance <= treshold.

Something like that ?

I don't think you need two iterations. Just one "Iterate with table" (the table is the lookup dataset) and use fuzzy matching in the iterated module.