I can’t seem to find the right solution, or I might have overlooked a simple one.
I have a table of lab experiment results with repeating messages, each identifying a certain action that occurs once per trial and subject (multiple trials per subject). The problem is that some of these action messages repeat as duplicates within the range of a few rows (the repetition is not regular). These duplicates need to be removed while leaving the messages from other trials and subjects in the table. Here is an example table:
The messages are identical for each subject and trial, so I need to remove duplicates of a message only within the same trial and the same subject, so that, for instance, message A occurs only once for subject 1 in trial 1. A further problem is that duplicates occur for all messages once in a while, not only for A but sometimes for B and C as well.
Would anyone know a possible solution, or could point me in the right direction?
Any advice is greatly appreciated.
PS: I started using EasyMorph fairly recently, and so far I am extremely satisfied with its ease of use and smooth handling of large datasets.
Hi David and welcome to the Community!
It looks like the “Deduplicate” action does exactly what you need unless I’m missing something.
Thank you for your fast reply and the warm welcome.
Unfortunately, the deduplicate function does not solve the problem, though maybe I am using it wrong?
I see that my description was missing a critical point, sorry for that. The doubled value, e.g. “A”, is an erroneous copy of the value in the cell above, but the rest of the row has different values than the row above, making the row unique. If I specify the “selected” columns in the deduplicate action, then all the empty cells in between are removed, moving everything out of place. Am I using the deduplicate function wrong? Attached is a screenshot where, in the column “Message_2”, the second value of “START…” needs to be deleted.
Thank you in advance for your help!
Can you attach a spreadsheet with two sheets - data before deduplication and data that should be after deduplication?
Please find attached an excerpt of the data as an xlsx file with the two sheets. The column of interest is “Message_2”, and the first example of a duplicate is found on line 204. For every combination of subject and trial the “message” is identical, so only duplicates within each subject/trial combination need to be removed.
Thanks for having a look!
deduplication_example_col_message_2.xlsx (2.4 MB)
You need to count the repeating messages within groups, where each group is a set of rows with identical values in particular columns (here, the subject, the trial, and the message itself). This gives you a count for each repeated message within its group. Remove the message wherever that count is greater than 1.
See the example below:
cleanup-repeating.morph (3.4 KB)
Book1.xlsx (8.6 KB)
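For readers without the attachments, the counting idea can also be sketched outside EasyMorph. The Python snippet below is only an illustration, not the contents of the attached project: the column names `subject`, `trial`, `message`, and `value`, the sample rows, and the function `clear_repeated_messages` are all assumptions made up for this example. It keeps a running count per (subject, trial, message) group and blanks the message cell when the count exceeds 1, so the rows themselves stay in place.

```python
from collections import defaultdict


def clear_repeated_messages(rows, group_keys=("subject", "trial"), message_key="message"):
    """Blank a message cell if the same message already occurred in the same group.

    Rows are never dropped, so empty cells and the other columns keep their position.
    """
    counts = defaultdict(int)  # running count per (group columns..., message)
    cleaned = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are left unchanged
        message = new_row.get(message_key)
        if message:  # empty cells are not counted
            group = tuple(new_row[k] for k in group_keys) + (message,)
            counts[group] += 1
            if counts[group] > 1:
                new_row[message_key] = ""  # clear only the repeated message
        cleaned.append(new_row)
    return cleaned


# Hypothetical sample data, invented for illustration only.
rows = [
    {"subject": 1, "trial": 1, "message": "A", "value": 10},
    {"subject": 1, "trial": 1, "message": "A", "value": 11},  # erroneous repeat
    {"subject": 1, "trial": 1, "message": "", "value": 12},
    {"subject": 1, "trial": 1, "message": "B", "value": 13},
    {"subject": 1, "trial": 2, "message": "A", "value": 14},  # new trial, kept
    {"subject": 2, "trial": 1, "message": "A", "value": 15},  # new subject, kept
]

cleaned = clear_repeated_messages(rows)
for row in cleaned:
    print(row)
```

Because only the message cell is cleared, the number of rows is unchanged, which avoids the shifting problem described earlier in the thread.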
Thank you very much for your advice and the demo project! This was indeed the solution I was looking for.