MarkWex
Partner - Contributor

Ignoring Deleted Data

Hello.

Say we have five years of data in our source database (SQL Server), and all of those records have also been propagated to Snowflake. We would now like to delete the oldest two years of data in the source, but not allow the DELETE statements for those records to propagate to Snowflake, because we would still like to keep all five years in Snowflake. Is there a way to do this?

2 Replies
Heinvandenheuvel
Specialist III

Hmm, the base question seems simple enough. Just add a filter to the table: $AR_H_OPERATION != 'DELETE'

However, a few warnings:

  1. If you just make that edit, then stop the task, save, and resume, a reload will be triggered, which you do not want. Instead use ADVANCED RUN with 'Tables are already loaded' and a start time of NOW. This may create discrepancies, so choose a 'slow' time in the source DB and start a few minutes/seconds earlier to cover the stop/start gap.
  2. The delete filter might be too coarse. You could add a clause mirroring the delete criteria, for example AND $LAST_UPDATE < '2020-01-01' (or similar; do try it with small tables in a test task!).
  3. It might be easier to REMOVE the table from the task (stop, save, start), do the deletes, then re-add the table and start as per 1) with 'Tables are already loaded'.
  4. If hundreds of millions of rows are deleted, a single DELETE statement may take only seconds on the source, but it may generate many gigabytes of transaction log. Replicate may then take minutes (or hours) to read each individual deleted row from the log just to skip it. So the simple, effective solution on a source with downtime is to stop the tasks, do the deletes, wait until they are done plus a few seconds, and resume the tasks with Advanced Run, 'Tables are already loaded', starting from NOW, skipping the deletes in the log completely. The risk is that it will skip anything else in the log as well, so there had better not be any other update activity to replicate during that window.
  5. Should you watch this with the CDC monitor in the UI, don't be concerned if you see the deletes and 'total applied' counters zoom up. They are counted and seen, but filtered. Again: TEST, TEST, TEST.
  6. Finally, read up on the operation_indicator function. It changes the deletes into updates, allowing you to set a flag on the Snowflake target indicating that the row was deleted on the source. If you go that route you need an extra clause in the filter, something like 'last update older than five years ago' or $ar_user = 'maintenance_user', to distinguish between genuinely deleted rows and 'archived' rows. And you may need a view on the target to surface that distinction as well.

Hein


mehdi7
Contributor

I thought read/write/delete were all in the same bucket and counted as the same thing. Good point; I rechecked the pricing page, and at 0.02 per 100k it shouldn't break the bank either way.