Why does export create duplicate rows?

This has been torturing me for the whole afternoon.
I am using EasyMorph to import data from a spreadsheet and export it to a Redshift cluster.
After a bunch of calculations, the data is ready to export.
So I did:

  1. Delete matching database rows, based on three keys: DateKey, TagId, Series.
  2. Export Data.

Every time, it creates duplicate rows: the exported records are doubled.
I have run step 1 by itself and checked that it works.
It doesn’t matter how I do it. Even if I clean up the records in the database manually and then do “reload and run” on the whole thing in EasyMorph, it still creates duplicate rows.

Need help!

Hi Tony,

Does your DateKey field contain only dates without times? Or does it have times too?

Does the problem appear only when “Export to database” is used after “Delete matching database rows”? Does it work correctly if you just do the export alone?

Also, any additional information would be helpful - at least screenshots, and the version of EasyMorph you’re using.

Thanks

Hi dgudkov,

It only contains dates, no times.
I was going to upload a picture, but I could not find the upload button when editing the post.
I have just found it.

And… I just found out that it is not an EasyMorph issue. The issue comes from the joined table.
Let me put everything together in the next reply.

What happened is this: I use a view that selects the records by joining with another table (Tag) on its primary key. That view has been in use for a long time, so I never doubted it. It happens that the tags I join to the export target table have duplicate records. That also explains why checking other records with the same view did not show duplicate rows.
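To illustrate the effect for anyone who lands here, below is a minimal sketch using Python's built-in sqlite3 instead of Redshift. The table names (`tag`, `fact`), the column names, and the sample values are all made up for the example; the only point is that one duplicated key in the joined table makes the join return more rows than the fact table holds.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# No PRIMARY KEY is declared on purpose: this mimics a database
# that does not reject duplicate key values.
# All table names, column names, and values are hypothetical.
conn.executescript("""
CREATE TABLE tag  (tag_id INTEGER, name TEXT);
CREATE TABLE fact (date_key TEXT, tag_id INTEGER, series TEXT, value REAL);

INSERT INTO tag VALUES (1, 'pressure');
INSERT INTO tag VALUES (2, 'flow');
INSERT INTO tag VALUES (2, 'flow');   -- tag 2 was loaded twice

INSERT INTO fact VALUES ('2019-01-01', 1, 'A', 10.0);
INSERT INTO fact VALUES ('2019-01-01', 2, 'A', 20.0);
""")

# Rows actually stored in the fact table.
fact_rows = conn.execute("SELECT COUNT(*) FROM fact").fetchone()[0]

# Rows seen through a join on the (non-unique) tag key.
view_rows = conn.execute(
    "SELECT COUNT(*) FROM fact f JOIN tag t ON t.tag_id = f.tag_id"
).fetchone()[0]

print(fact_rows, view_rows)  # prints: 2 3
```

Only the rows that reference the duplicated tag are multiplied, which matches the observation that other records looked fine through the same view.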

Redshift does NOT enforce primary keys, as per their documentation:

https://docs.aws.amazon.com/redshift/latest/dg/c_best-practices-defining-constraints.html
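Since the database will not reject duplicate keys, a quick check before trusting a join can help. The sketch below again uses sqlite3 as a stand-in; the helper name `find_duplicate_keys` and the `tag` / `tag_id` names are assumptions for illustration. The same GROUP BY ... HAVING query pattern should carry over to Redshift.

```python
import sqlite3

def find_duplicate_keys(connection, table, key_column):
    """Return (key, count) pairs for key values that occur more than once.

    The table and column names are interpolated into the SQL text for
    brevity, so pass trusted identifiers only.
    """
    query = (
        f"SELECT {key_column}, COUNT(*) FROM {table} "
        f"GROUP BY {key_column} HAVING COUNT(*) > 1 "
        f"ORDER BY {key_column}"
    )
    return connection.execute(query).fetchall()

# Hypothetical sample data: tag 2 exists twice.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tag (tag_id INTEGER, name TEXT);
INSERT INTO tag VALUES (1, 'pressure');
INSERT INTO tag VALUES (2, 'flow');
INSERT INTO tag VALUES (2, 'flow');
""")

duplicates = find_duplicate_keys(conn, "tag", "tag_id")
print(duplicates)  # prints: [(2, 2)]
```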

The Tag table, which contains the duplicate records, is imported by another EasyMorph project. It seems that if I run the import action twice (NOT “reload and run” the whole thing), it generates duplicate rows, because the delete rows action only runs once. I probably ran the export action twice in the past when importing those tags.
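The difference between "export ran twice, delete ran once" and "delete and export always run together" can be sketched like this. It is a toy example in Python with sqlite3, not what EasyMorph does internally; the `fact` table, its columns, and the function names are hypothetical.

```python
import sqlite3

# Hypothetical rows keyed by (date_key, tag_id, series).
ROWS = [("2019-01-01", 1, "A", 10.0), ("2019-01-01", 2, "A", 20.0)]

def new_db():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE fact (date_key TEXT, tag_id INTEGER, series TEXT, value REAL)"
    )
    return conn

def insert_only(conn, rows):
    # Export without the preceding delete: every run appends the rows again.
    conn.executemany("INSERT INTO fact VALUES (?, ?, ?, ?)", rows)
    conn.commit()

def delete_then_insert(conn, rows):
    # Delete the matching keys and insert in one transaction,
    # so repeating the load leaves the same rows behind.
    with conn:
        conn.executemany(
            "DELETE FROM fact WHERE date_key = ? AND tag_id = ? AND series = ?",
            [row[:3] for row in rows],
        )
        conn.executemany("INSERT INTO fact VALUES (?, ?, ?, ?)", rows)

def row_count(conn):
    return conn.execute("SELECT COUNT(*) FROM fact").fetchone()[0]

a = new_db()
insert_only(a, ROWS)
insert_only(a, ROWS)
count_insert_only = row_count(a)

b = new_db()
delete_then_insert(b, ROWS)
delete_then_insert(b, ROWS)
count_delete_then_insert = row_count(b)

print(count_insert_only, count_delete_then_insert)  # prints: 4 2
```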

image

That is the whole story.

Question: Is there a way to force all the actions that a given action depends on to run (not all the actions in the project)?
In the picture, when I run the last action, I wish there were another button that says ‘reload and run this action’ to force every related action to run no matter what. Just like ‘reload and run’, but only for the parent actions.
image

Thank you for the detailed explanation. As I understand it, the export itself works correctly; however, there is a bit of confusion about how actions in EasyMorph are calculated.

If an action in EasyMorph is recalculated, it causes all the subsequent (dependent) actions to invalidate (reset) their results. However, as of now there is no way for an action to “know” that it should invalidate a preceding action; this has to be done by the user. In EasyMorph a flow of actions is basically a “time machine”: it is up to the user how far back in time they decide to go. This is especially important for the so-called side-effect actions, which affect external systems and data, where EasyMorph doesn’t fully control the state of the data or what can happen in between actions.
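For readers who prefer code, here is a toy model of that behaviour. It is only a sketch written for this thread, not EasyMorph's implementation; the `Action` and `Flow` classes and the action names are invented. Discarding a result resets that action and everything after it, while earlier cached results are reused.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

@dataclass
class Action:
    name: str
    compute: Callable[[Optional[int]], int]
    result: Optional[int] = None   # None means "no cached result"
    runs: int = 0                  # how many times the action really executed

class Flow:
    def __init__(self, actions: List[Action]):
        self.actions = list(actions)

    def discard_from(self, index: int) -> None:
        # Resets the chosen action and every action after it.
        # Actions before `index` keep their cached results.
        for action in self.actions[index:]:
            action.result = None

    def run_up_to(self, index: int) -> Optional[int]:
        # Executes only actions without a cached result, in order.
        previous = None
        for action in self.actions[: index + 1]:
            if action.result is None:
                action.result = action.compute(previous)
                action.runs += 1
            previous = action.result
        return previous

flow = Flow([
    Action("import", lambda _: 1),
    Action("delete matching rows", lambda x: x + 1),
    Action("export", lambda x: x + 1),
])

flow.run_up_to(2)                      # first run: everything executes once
flow.discard_from(2)                   # reset only the export
flow.run_up_to(2)                      # export runs again, delete does not
runs_after_partial = [a.runs for a in flow.actions]

flow.discard_from(0)                   # discard from the very first action
flow.run_up_to(2)                      # now the whole chain runs again
runs_after_full = [a.runs for a in flow.actions]

print(runs_after_partial, runs_after_full)  # prints: [1, 1, 2] [2, 2, 3]
```

The middle step is the situation described above: the side-effect action at the end ran a second time, but the delete before it was served from its cached result.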

It may help to revisit this tutorial chapter: More ways to run a project

Should you have any questions regarding running actions in EasyMorph, let me know.

Oh, maybe I did not make it clear enough.
What I mean is to invalidate every preceding action, no matter what. Just like a ‘reload and run’, but only for the preceding actions.
Maybe my project is too big and I should split it into several smaller projects, so that I can reload only a certain table?

Hmm… it’s not possible at this point to invalidate all preceding actions at once. What you can do is invalidate actions starting from a particular action. Just go to the first action in your workflow and discard its result (right-click -> Discard result).

If you switch off “Auto-run” then you will be able to Ctrl+click an action to calculate only actions up to the clicked one. See below:

demo

Thanks dgudkov,

I will discard the result of the very first action, so it should be safe now.