Hi Team,
We are facing an issue with missing records in the target and would like your guidance.
Scenario:
Configuration:
Full Load passthrough filter:
date_column = CURRENT_DATE - 1
Issue:
Hello @shyamkatika ,
When your job runs at 8 AM today, it pulls everything dated yesterday. If a record for yesterday is delayed in a source pipeline and only reaches the source table at 10 AM today, today's run has already finished, and tomorrow's run will not see it either, because tomorrow's run looks only for today's data.
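To make the timing concrete, here is a minimal simulation of that scenario. The row ids, dates, and the `run_job` helper are hypothetical and only stand in for the real task; the filter mirrors date_column = CURRENT_DATE - 1.

```python
from datetime import date, datetime, timedelta

# Hypothetical source rows: (id, date_column, time the row reached the source table).
source_rows = [
    (1, date(2024, 1, 1), datetime(2024, 1, 1, 23, 0)),  # arrives on time
    (2, date(2024, 1, 1), datetime(2024, 1, 2, 10, 0)),  # arrives after the 8 AM run
]

def run_job(run_time):
    """Return ids a run sees when it filters date_column = CURRENT_DATE - 1."""
    target_day = run_time.date() - timedelta(days=1)
    return [
        row_id
        for row_id, date_column, arrived_at in source_rows
        if arrived_at <= run_time and date_column == target_day
    ]

loaded = []
for day in (2, 3, 4):  # daily runs at 8 AM on Jan 2, 3 and 4
    loaded += run_job(datetime(2024, 1, day, 8, 0))

print(loaded)  # row 2 is never picked up by any run
```

Row 2 is not yet in the source for the Jan 2 run, and every later run filters on a different date, so it is never loaded.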
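Instead of fetching exactly one day, fetch a lookback window (e.g., the last 2 or 3 days):
date_column >= CURRENT_DATE - 2 or date_column >= CURRENT_DATE - 3
Since you are using "Do nothing" (Append), this will create duplicates every single day, because each run re-reads rows that an earlier run already loaded. You would need to deduplicate or merge on a key in the target.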
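A rough sketch of the lookback window, comparing a plain append with a keyed load. It assumes the rows carry a unique `id` (an assumption, not something stated in this thread), and the upper bound excluding today is also my choice to mirror the original one-day filter.

```python
from datetime import date, datetime, timedelta

# Hypothetical source rows; "id" is an assumed unique key.
source_rows = [
    {"id": 1, "date_column": date(2024, 1, 1), "arrived_at": datetime(2024, 1, 1, 23, 0)},
    {"id": 2, "date_column": date(2024, 1, 1), "arrived_at": datetime(2024, 1, 2, 10, 0)},
]

def extract_window(run_time, lookback_days):
    """Rows visible at run_time whose date_column falls in the last lookback_days days."""
    today = run_time.date()
    oldest_day = today - timedelta(days=lookback_days)
    return [
        r for r in source_rows
        if r["arrived_at"] <= run_time and oldest_day <= r["date_column"] < today
    ]

append_target = []  # plain append: re-read rows pile up as duplicates
keyed_target = {}   # keyed by id: a re-read row overwrites itself

for day in (2, 3, 4):  # daily runs at 8 AM
    rows = extract_window(datetime(2024, 1, day, 8, 0), lookback_days=3)
    append_target.extend(r["id"] for r in rows)
    keyed_target.update({r["id"]: r for r in rows})

print(append_target)         # late row 2 is captured, but ids repeat
print(sorted(keyed_target))  # each id appears once
```

The window catches the late row on the next run, but only the keyed load keeps the target free of duplicates.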
If you can influence the source or have a metadata column, don't filter by date_column. Instead, use a load_timestamp (the time the record actually hit the source table).
You can then filter on load_timestamp >= (time of the last successful run).
This captures records based on when they arrived, regardless of what calendar date is written in their date_column.
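As an illustration, the watermark approach could look like the sketch below. The rows and the way the watermark is stored are hypothetical. It uses a strict lower bound and caps the read at the run time, so a row stamped exactly at the watermark is not read twice; with >= it would be re-read on the next run.

```python
from datetime import date, datetime

# Hypothetical source rows with an arrival-time metadata column.
source_rows = [
    {"id": 1, "date_column": date(2024, 1, 1), "load_timestamp": datetime(2024, 1, 1, 23, 0)},
    {"id": 2, "date_column": date(2024, 1, 1), "load_timestamp": datetime(2024, 1, 2, 10, 0)},
]

watermark = datetime(2024, 1, 1, 8, 0)  # time of the last successful run (assumed to be persisted)
loaded = []

for day in (2, 3):  # daily runs at 8 AM
    run_time = datetime(2024, 1, day, 8, 0)
    # Select by arrival time only; date_column plays no part in the filter.
    rows = [r for r in source_rows if watermark < r["load_timestamp"] <= run_time]
    loaded.extend(r["id"] for r in rows)
    watermark = run_time  # advance only after the load succeeded

print(loaded)  # the late row is picked up by the next run, with no duplicates
```

Because each run covers exactly the interval since the previous one, the late row lands in the Jan 3 run even though its date_column says Jan 1.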
Hope this helps.
Regards,
Sachin B
Hi @SachinB ,
Thanks for your response.
For late-arriving records on the source side, the expectation is that they would be picked up in the next scheduled run. However, these records are not being captured in the target at all. We have validated this behavior with data from the past 3 months.
Thanks,
Shyam Sundar.
Hello @shyamkatika ,
The reason those records haven't been captured for the last 3 months is that your filter logic creates a "blind spot." In your current setup, each day is a one-time-only opportunity for a record to be moved. If a record isn't there at 8:00:00 AM, it misses the bus, and the next bus (the next day) doesn't go back to pick it up.
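To close that gap, you need to implement one of the approaches suggested earlier: a lookback window with deduplication on the target, or a filter on load_timestamp instead of date_column.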
Let us know if you need any additional information.
Regards,
Sachin B