Do not input private or sensitive data. View Qlik Privacy & Cookie Policy.
Skip to main content

Announcements
Qlik and ServiceNow Partner to Bring Trusted Enterprise Context into AI-Powered Workflows. Learn More!
cancel
Showing results for 
Search instead for 
Did you mean: 
shyamkatika
Contributor III
Contributor III

Missing records in daily snapshot load (Full Load with Current Day -1 filter)

Hi Team,

We are facing an issue with missing records in the target and would like your guidance.

Scenario:

  • Source table has huge volume of data
  • The table does not have a Primary Key
  • Hence, we are using a Full Load only task (no CDC)
  • We are maintaining daily snapshots
  • The job is scheduled to run every day at 8 AM

Configuration:

  • Full Load passthrough filter:

    date_column = CURRENT_DATE - 1
  • Task setting:
    • Do nothing (no truncate/drop)

Issue:

  • We are observing missing records in the target
  • It seems that some records for “yesterday” are getting inserted into the source after 8 AM
  • Since our filter strictly picks CURRENT_DATE - 1, those late-arriving records are not captured in subsequent runs.

 

Labels (1)
1 Solution

Accepted Solutions
SachinB
Support
Support

Hello @shyamkatika ,

When your job runs at 8 AM today, it pulls everything timestamped Yesterday. If a record for Yesterday is delayed in a source pipeline and finally hits the source table at 10 AM Today, your tomorrow run won't see it (because tomorrow's run looks for today's data), and your today's run has already finished.

Instead of fetching exactly one day, fetch a lookback window (e.g., the last 2 or 3 days)
CURRENT_DATE - 2 or CURRENT_DATE - 3

Since you are using "Do nothing" (Append), this will create duplicates every single day

If you can influence the source or have a metadata column, don't filter by date_column. Instead, use a load_timestamp (the time the record actually hit the source table).

You can filter load_timestamp >= (Time of last successful run)

This captures records based on when they arrived, regardless of what calendar date is written in their date_column.


Hope this helps.

 

Regards,

Sachin B

 

View solution in original post

3 Replies
SachinB
Support
Support

Hello @shyamkatika ,

When your job runs at 8 AM today, it pulls everything timestamped Yesterday. If a record for Yesterday is delayed in a source pipeline and finally hits the source table at 10 AM Today, your tomorrow run won't see it (because tomorrow's run looks for today's data), and your today's run has already finished.

Instead of fetching exactly one day, fetch a lookback window (e.g., the last 2 or 3 days)
CURRENT_DATE - 2 or CURRENT_DATE - 3

Since you are using "Do nothing" (Append), this will create duplicates every single day

If you can influence the source or have a metadata column, don't filter by date_column. Instead, use a load_timestamp (the time the record actually hit the source table).

You can filter load_timestamp >= (Time of last successful run)

This captures records based on when they arrived, regardless of what calendar date is written in their date_column.


Hope this helps.

 

Regards,

Sachin B

 

shyamkatika
Contributor III
Contributor III
Author

Hi @SachinB ,

Thanks for your response,

For late-arriving records on the source side, the expectation is that they would be picked up in the next scheduled run. However, these records are not being captured in the target at all. We have validated this behavior with data from the past 3 months.

Thanks,

Shyam Sundar.

SachinB
Support
Support

Hello @shyamkatika ,

The reason those records haven't been captured for the last 3 months is that your filter logic creates a "blind spot." In your current setup, each day is a one-time-only opportunity for a record to be moved. If a record isn't there at 8:00:00 AM, it misses the bus, and the next bus (the next day) doesn't go back to pick it up.

You have to apply or implement the logic that we have suggested earlier.

Let us know if you need any additional information.

Regards,

Sachin B