Coalesce/Nested Merge Criteria

Perk · December 2, 2024, 9:09am

This is not really a feature request as much of as a hmmm, what do you all think? Would something like this be going to far?

Basically what I am thinking is of the merge/join actions, where the matching criteria are set between the two tables, what if you could have a second or third set of rules for the match to perform on in the event the first criteria fails? Kind of like how we set up the conditions on the rules action but those conditions instead end up being the join criteria.

Currently in Easymorph, you would need to perform a second merge action to get the records from another table that didn't match in the other table. And that would bring a second copy of all the columns over if you had successes after the first merge attempt. Then you have to do a bunch of cleanup on the columns to normalize back to just one column per joined attribute. I guess alternatively, you could do some filtering on scenarios that are simple, like empty cells that are kind of like attributes of the composite key you are making for the regular merge criteria, but things get squirrelly really fast.

Here is an explanation of the use case I have for such a feature. My company has a CRM system that is used to log customer orders but the data entry is done by various stakeholders and business units. There are silos of entry method between for different business units as well as business unit customer combinations. Sometimes a PO number field will be input without the line id. Sometimes the customer's tool id name is entered in differently than what the customer calls it but the PO is correct. I've tried normalizing for different methods of "manual key value input" but it only goes so far. For example, I did cleanup on various methods in which PO values are input and came up with 42 different rules. Someone is always going to come up with a new way to enter something that hasn't been done before and break a cleanup rule. But, there is a decent chance among several different matching criteria I can back into a combination that will give me a proper join.

Anyway, sorry if this idea is way out there. I was just brainstorming a bit while working through some reporting I am doing for sales.

Also, the new profile feature and the merge action is amazing. I am really loving it so far. I am actually going to use the visual it provides in a meeting to encourage some operational behavior changes. I have built error type tables to isolate join failures etc before but it never conveys well to the business folks. Now with this tool I can take a screenshot and have a proper conversation about it visual with colors that operations managers will understand. And the added bonus of being able to pop out the records into a sandbox is icing on the cake.

As always, thanks Easymorph team!

Best wishes,

Perk

dgudkov · December 4, 2024, 12:00am

That's an interesting question.

If matching by at least one key column (out of N) is considered reliable, then unpivoting keys can be a simple way to do matching (and merge), as in this case we reduce matching to strict two-column matching. See the example below.

sparse-merge.morph (5.4 KB)
Sparsely indexed tables.xlsx (9.5 KB)

But if at least K columns must match out of N and K>1, that becomes less trivial.

Perhaps, for such cases we do need a special action - "Sparse merge" or something.

Thoughts?

Perk · December 4, 2024, 9:14am

Hi Dmitry,

Thank you kindly for the reply!
That is an interesting method you are showing. I am a little worried about using that because you end up with an additional row when the merge is successful on more than one criteria. And then it would require duplicate handling. That may not necessarily be terrible. Also, in the unpivot, it feels manageable in the example, but I am typically joining very fairly large table, sometimes 5o to several hundred columns long. And managing what column to exclude from the unpivot might feel overwhelming. I will play with it a bit. Thank you for the idea.

I think I may not have explained that well initially. Please let me know if this example helps clarify what I am trying to achieve.

The top table is similar to what we get from the customer. The lower table represents the opportunities our company would make in response to the customer forecast.

Assumptions:

Customer GUID/Opportunity GUID: There is a workflow that can be done manually and if it is performed creates a tightly bound key in the form of a GUID between the two data sets. this is the most optimal merge criteria.

Customer PO Number/Opportunity PO Number: This would be the second most preferable merge criteria. There is typically very little change after it is issued. However, operations/sales folks perform different behavior with the system data entry. Sometimes is is perfect but other time similar behavior as what is in "Opportunity PO Number" can be seen.

Customer Tool ID/Opportunity Tool ID: This is the third most preferable join criteria. And actually, it can't be used alone, and needs to merge on the combination of the Tool ID and the Tool Category. A single Tool ID can have a category of MT or PK. The PK stands for prefacility kit and MT for main tool. A single tool id will typically have a MT and PK record. Additionally, this value can change in relationship to the Customer PK value.

In a single merge action, my idea is as follows:

First: Attempt to merge the Customer table to Opportunity on the Opportunity GUID. This would work for rows (1,2,3,4).

Any failing records would go to criteria two.

Second: Attempt to merge the Customer table to the Opportunity table on the PO number match. This should be successful for rows (5,7,10,11) but would fail on (6,8,912,13,14,15)

After this check go to merge criteria three.

Third: Attempt to merge the Customer table to the Opportunity table on the combination of Tool ID and Category. This should be successful for rows (6,8,12,13,14) but fail on (9,15)

Rows 9 and 15 would just fail and not bring any columns back just how a merge failure on the setup criteria works now.

MultipleMergeCriteria.morph (4.6 KB)

MultipleMergeCriteria.xlsx (12.7 KB)