EDJ (Contributor II)

Duplicate records with Full Load and Store Changes on a task

Is it expected to see the same record in the output of both the Full Load (Loadxxxx.csv) and the Change Table (date-time.csv) when an insert occurs during the Full Load process?

So the scenario is as follows:

The task is configured with both Full Load and Store Changes enabled; Apply Changes is not enabled.  The goal is to generate a baseline of the source table and then perform analytics against the changes.

00:00  Task with Full Load and Store Changes is started.
00:15  Full Load is in progress.
00:16  An insert occurs on the table being loaded.

The end result is that the new record inserted while the Full Load was in progress exists in both the Full Load files and the Change Table files.  Is this expected, and is there a way to prevent Store Changes from logging entries that were already captured during the Full Load process?


Accepted Solutions
Heinvandenheuvel (Specialist III)

Yes, that's expected.

The full load is effectively a simple SELECT * FROM &lt;table&gt; ORDER BY PK.

Whether or not that select sees a row inserted after it starts depends on the isolation level, specifically on 'phantom reads', which in turn depends on the source database type and its settings.

(See for example https://en.wikipedia.org/wiki/Isolation_(database_systems) or https://docs.microsoft.com/en-us/sql/odbc/reference/develop-app/transaction-isolation-levels?view=sq...)
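
To make the phantom-read point concrete, here is a minimal sketch for SQL Server (table and column names are made up; other source databases expose the same idea through their own isolation settings):

-- Session 1: the full-load style select, under the default READ COMMITTED level.
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
BEGIN TRANSACTION;
SELECT * FROM dbo.Orders ORDER BY OrderId;  -- a row committed by session 2 while this
                                            -- scan is running may or may not be returned
COMMIT;

-- Session 2: insert and commit while the select above is still running.
INSERT INTO dbo.Orders (OrderId, Amount) VALUES (1001, 42);

-- Under SNAPSHOT or SERIALIZABLE isolation in session 1 the new row would not be
-- returned; under READ COMMITTED it can show up, which is the phantom read.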

Replicate starts CDC for a table before starting the source unload, and it 'caches' those changes to be applied after the unload and its target load finish. (BTW, these steps can clearly be seen in the reptask log; increase the logging level for SOURCE_UNLOAD and TARGET_LOAD to TRACE as desired.)

When a task uses Apply Changes, it applies those cached changes (which came from transactions committed during the select) after the target load completes. If an insert in this context creates a duplicate error, it is ignored. After the table load is completed, a duplicate insert would trigger an 'apply error' and is handled according to the task's, or the default global, error handling rules.

When building Change Tables (CT), that duplicate handling is not (cannot be) done.
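
So if the overlap matters for the downstream analytics, it generally has to be resolved on the consuming side. As a rough sketch (table and column names here are illustrative only), assume the Full Load CSVs are staged into a table called baseline and the Change Table CSVs into a table called changes; one crude option is to treat the change table as authoritative for any key it contains:

SELECT b.*
FROM   baseline b
WHERE  NOT EXISTS (SELECT 1
                   FROM   changes c
                   WHERE  c.order_id = b.order_id);  -- join on the table's primary key

Rows for keys that appear only in the change stream are then taken from changes, so the record inserted during the Full Load is counted once rather than twice.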

Hein
