Deduplicating for groups based on multiple columns

David369 · July 2, 2022, 9:55pm

Dear Community,

I can't seem to find the right solution, or I might have overlooked a simple one.
I have a table of lab experiment results with repeating messages that identify a certain action, which occurs once per trial and subject (multiple trials per subject). The problem is that some of these action messages are duplicated within the range of a few rows (the repetition is not regular). These duplicates need to be removed, while the messages from other trials and subjects must remain in the table. Here is an example table:

The messages are identical for each subject and trial, so I need to remove duplicates of a message within the same trial and the same subject, so that, for instance, message A occurs only once for subj 1 in trial 1. A further problem is that duplicates occur for all messages once in a while, not only for A, but sometimes for B and C as well.

Would anyone know a possible solution, or could point me in the right direction?
Any advice is greatly appreciated.
Greetings

PS I started using EasyMorph fairly recently and so far I am extremely satisfied with its ease of use and smooth handling of large datasets.

dgudkov · July 3, 2022, 9:00am

Hi David and welcome to the Community!

It looks like the "Deduplicate" action does exactly what you need unless I'm missing something.

David369 · July 3, 2022, 12:48pm

Dear Dmitry,

thank you for your fast reply and the warm welcome.

Unfortunately, the deduplicate function does not solve the problem, though maybe I am using it wrong?
I see that my description was missing a critical point, sorry for that. The doubled value, e.g. "A", is an erroneous copy of the value in the cell above, but the rest of the row has different values than the row above, making the row unique. If I specify the "selected" columns in the deduplicate action, then all the empty cells in between are removed, moving everything out of place. Am I using the deduplicate function wrong? Attached is a screenshot, where in the column "Message_2" the second value of "START..." needs to be deleted.

Thank you in advance for your help!

dgudkov · July 3, 2022, 8:05pm

Can you attach a spreadsheet with two sheets - data before deduplication and data that should be after deduplication?

David369 · July 3, 2022, 9:54pm

Please find attached an excerpt of the data as xlsx file with the two sheets. The column of interest is "Message_2" and the first example of a duplicate is found on line 204. For every combination of subject and trial the "message" is identical, so that only duplicates within each subject/trial combination need removal.

Thanks for having a look!
deduplication_example_col_message_2.xlsx (2.4 MB)

dgudkov · July 4, 2022, 6:55am

You need to number the repeating messages within groups, where each group is the set of rows with identical values in particular columns. This gives each message a count of how many times it has occurred in its group so far. Remove the message when count > 1.

See the example below:
cleanup-repeating.morph (3.4 KB)

Book1.xlsx (8.6 KB)
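
For readers who want to see the counting approach described above outside EasyMorph, here is a minimal Python sketch. It is only an illustration of the idea, not the contents of the attached project. The column names `subject`, `trial` and `message`, the `value` column, and the function name `blank_repeated_messages` are all assumptions made for the example.

```python
from collections import defaultdict


def blank_repeated_messages(rows, group_keys=("subject", "trial"), message_key="message"):
    """Return copies of the rows with repeated messages blanked per group.

    Hypothetical helper: rows are dicts, and a group is defined by the
    values of group_keys plus the message itself.
    """
    seen_counts = defaultdict(int)  # running count per (group values..., message)
    cleaned = []
    for row in rows:
        new_row = dict(row)  # copy, so the input rows are not modified
        message = row.get(message_key)
        if message:  # empty cells are skipped and never counted
            group = tuple(row[key] for key in group_keys) + (message,)
            seen_counts[group] += 1
            if seen_counts[group] > 1:
                # Repeat within the same subject/trial: clear the cell only,
                # so the rest of the row stays in place.
                new_row[message_key] = ""
        cleaned.append(new_row)
    return cleaned


# Made-up sample data, assumed layout: one message column with empty cells between.
rows = [
    {"subject": 1, "trial": 1, "message": "A", "value": 10},
    {"subject": 1, "trial": 1, "message": "", "value": 11},
    {"subject": 1, "trial": 1, "message": "A", "value": 12},  # repeat, gets blanked
    {"subject": 1, "trial": 1, "message": "B", "value": 13},
    {"subject": 1, "trial": 2, "message": "A", "value": 14},  # new trial, kept
    {"subject": 2, "trial": 1, "message": "A", "value": 15},  # new subject, kept
]
cleaned = blank_repeated_messages(rows)
```

The sketch clears only the message cell instead of dropping the row, because the other columns of a row with a repeated message still hold unique values.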

David369 · July 4, 2022, 10:14pm

Dear Dmitry,

thank you very much for your advice and the demo project! This was indeed the solution I was looking for.

Best regards

dgudkov · July 5, 2022, 7:39pm

You're welcome