Can anyone help me better understand the Select Matching Database Rows action?
I am trying to use this for pre-filtering. I have time series data that gets loaded into a history table daily, so a certain ID will have a record in the database for every day it exists in the load file. The base table has over 3,000,000 records and I have to query it from overseas, which takes upwards of 6 minutes. If I am able to get just the max load occurrence for each ID, it is only 14k (ish) records and would significantly reduce the initial query time.
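Roughly speaking, the result I'm after is what this query sketch would return (the table and column names here are just placeholders for my actual schema):

```sql
-- Sketch only: one row per ID, keeping just its latest load.
-- "History", "ID", and "Created_On" are placeholder names.
SELECT h.*
FROM History AS h
JOIN (
    SELECT ID, MAX(Created_On) AS MaxCreatedOn
    FROM History
    GROUP BY ID
) AS latest
  ON latest.ID = h.ID
 AND latest.MaxCreatedOn = h.Created_On;
```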
So I am looking to use Select Matching Database Rows to get the max load occurrence of each record, but I am getting an error when I try to use the action. I am trying to match on the ID and Created_On columns; Created_On is a date/time column. The action is giving me an error that it isn't an integer.
Error: Value in the [Created_On] column at row #1 is not an integer and can't be matched to a DateTime DB column
Source: action "Select matching database rows", module "1.Main", table "Max Load by ID"
Do I need to perform some kind of additional transformation of the column before I use it as matching criteria in the Select Matching Database Rows action?
If anyone has any experience or advice, I would greatly appreciate it. Thanks!!
Workaround:
Add another field with the timestamp in Unix format (using the tounixtime function; see syntax:functions:tounixtime [EasyMorph Help]) to the table and do the select matching against this field.
Thank you very much for the reply. I read through the thread you indicated in your response. In my case it happens to be a SQL Server table I am querying against.
When I converted the date/time value with the tounixtime function you mentioned, I got another error that the value was either too long or too short. In my case, the initial query returns the time component with more than three digits after the decimal point. I split it, kept the left 3 digits, and concatenated it back together just in case that would help, but I am still getting the same error about the value being too large or too short.
The action is intended for pre-filtering an SQL query using a list of keys in EasyMorph. Dates have different representations in EasyMorph and in SQL databases, and therefore reliable matching can only be done on dates without a time part. Otherwise, rounding errors may lead to false mismatches.
Converting to Unix time can work since it creates an integer number, but it should match a similar column in the source database table, i.e. in the source database it should also be a column with Unix times, not a regular date/time column.
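For illustration only (the table and column names below are assumed, and the query is just a sketch), this is what an integer Unix-time representation of a date/time column would look like on the SQL Server side; matching against the output of tounixtime only works if such an integer column actually exists in the source table:

```sql
-- Illustration only: "History" and "Created_On" are assumed names.
-- A Unix-time column holds seconds since 1970-01-01 as an integer.
SELECT ID,
       Created_On,
       DATEDIFF(SECOND, '1970-01-01', Created_On) AS Created_On_Unix
FROM History;
```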
EasyMorph checks the data type of the matched column in the database. If it's a date column (in the database type system), then it assumes that the matched column in EasyMorph is also a date column (in the EasyMorph type system).
The "too big or too small" error that you see (we will fix the spelling) appears because EasyMorph checks the data type of the column "Created_On" in the database and sees it has the date type. So, it assumes that the "Created_On" column in EasyMorph is an EasyMorph date (a 5-digit number roughly around 40,000) but it encounters a huge 10-digit number that is clearly not an EasyMorph date.
Speaking of your goal, I don't clearly understand what you mean by "the max load occurrence of each record": is it the biggest value? Or the max count by ID? It's not clear why you need to match by both ID and created timestamp.
My apologies for the poor wording. By max load occurrence of each record, I mean the max load date/time for each ID. The ID is the primary key provided by the customer, but we essentially load a report that is generated daily into a history table. Most of the attributes are slowly changing, so most of the time there is no change from one day to the next. The only unique "key" to the table I have is the combination of the ID with the date/time stamp. I typically want the max date/time the record was loaded, because that represents the most up-to-date final state of the record. The process runs daily, so the time component is actually not important for me at all; that just happens to be the way the table was built initially.
For example, ID 49125 has loaded into the table 210 times.
I was hoping to use the Select Matching Database Rows action to get just the max load date records for each ID. In the case of the above example, that would be the record loaded on 2020-10-19.
Thank you for the explanation. It sounds like if I were to use a date field as matching criteria for this, it would need to be stored as an integer date with no time component in my source table.
I was hoping to cut some time with a faster query, but I don't have a RAM issue and it is easy enough to just bring in the entire table and use the Keep max action on the date/time column, grouped by the ID.