sergsyb
Contributor III

How are changes made on the Source during the initial Full Load applied to the Target?

If the initial synchronization goes sequentially, table by table, over several hours, a problem may arise: the first loaded table contains only old data, while tables loaded later may contain both old data and NEWLY changed/inserted data. How are such changes applied to the Target in order to maintain data integrity?


5 Replies
john_wang
Support

Hello @sergsyb ,

If I understood correctly, you are asking how Replicate handles the new changes that occur during the Full Load stage.

In Replicate, these new changes are cached during the Full Load stage; detailed information can be found in Full Load and CDC processes. Replicate applies the cached changes automatically once the Full Load stage is done.
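The capture-cache-apply pattern described above can be sketched in a few lines of Python. This is an illustration of the concept only, not Replicate's actual code; the table contents and key names are made up:

```python
# Minimal sketch of the Full Load + cached-changes pattern: changes that
# arrive on the source while the bulk copy is running are buffered, and
# applied only after the snapshot load has finished.
source = [("k1", "old"), ("k2", "old")]
target = []
cdc_cache = []

# Full Load: copy the snapshot (here, row by row).
for row in source:
    target.append(row)
    # A change arrives on the source while the load is still running;
    # it is cached instead of being applied immediately:
    if row[0] == "k1":
        cdc_cache.append(("insert", ("k3", "new")))

# Full Load done -> apply the cached changes, in order.
for op, row in cdc_cache:
    if op == "insert":
        target.append(row)

print(target)  # [('k1', 'old'), ('k2', 'old'), ('k3', 'new')]
```

The key point is the ordering: nothing from the cache touches the target until the snapshot copy is complete, which is also why the STOP-after-Full-Load option below is possible.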

However, you may STOP the task after the Full Load is done (to reserve some time for additional maintenance, etc.) and then decide how to process the cached changes:

john_wang_0-1659432912999.png

 

Then you may RESUME the task (to apply the cached changes after the maintenance or additional operations). Detailed info: Stopping the Task after Full Load.

Hope this helps.

Regards,

John.

Help users find answers! Do not forget to mark a solution that worked for you! If already marked, give it a thumbs up!
sergsyb
Contributor III
Author

John, thanks for the reply.

The following extract from the documentation bothers me:

CDC captures and applies the changes as units of single committed transactions and can update several different target tables as the result of a single source commit.

Let's look at an example. There are 3 tables which should be synchronized during the Full Load:

  1. Full Load started with loading table tab1
  2. Table tab1 was successfully loaded to the target
  3. While the next table (tab2) was being populated, a transaction occurred on the Source, inserting rows into tab1 and tab3

begin tran

insert into tab1 values (…….)

insert into tab3 values (…….)

commit

 

  4. The last table tab3 was loaded to the target (including the new row inserted in the previous step)
  5. After the Full Load, CDC will try to execute the transactions that occurred during the Full Load

 

After the load, the above transaction will insert the row into tab1 but will get a duplicate-key error on tab3, and as a result the entire transaction will be rolled back on the target, leading to data inconsistency. Am I right?
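The rollback concern described in this scenario can be reproduced with a short sqlite3 sketch; the table layout and key values are illustrative, not taken from Replicate:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE tab1 (id INTEGER PRIMARY KEY)")
con.execute("CREATE TABLE tab3 (id INTEGER PRIMARY KEY)")
# The Full Load snapshot of tab3 already contains id=1, because the row
# was inserted on the source before tab3 was unloaded.
con.execute("INSERT INTO tab3 VALUES (1)")
con.commit()

# CDC now replays the cached source transaction as a single unit:
try:
    with con:  # one transaction: commits on success, rolls back on error
        con.execute("INSERT INTO tab1 VALUES (1)")
        con.execute("INSERT INTO tab3 VALUES (1)")  # duplicate -> error
except sqlite3.IntegrityError:
    pass

# The tab1 insert was rolled back together with the failed tab3 insert:
print(con.execute("SELECT COUNT(*) FROM tab1").fetchone()[0])  # 0
```

If the replayed transaction were applied strictly atomically with no duplicate handling, the tab1 row would indeed be lost, which is exactly the concern the next reply addresses.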

john_wang
Support

Hello @sergsyb ,

Good question!

There is a chance this scenario happens: the 'newly' inserted row will be replicated twice within a short window. However, this is a logical consequence, not a Replicate defect: the same row appears in both the 'snapshot' and the 'cached changes'. Any third-party tool or application will face the same behavior.

In Qlik Replicate, the scenario is handled as follows:

1. A PK violation will prevent the duplicate rows from being replicated to the target table (this is enforced by the target DB).

2. If the table does not have a PK, or the database itself cannot enforce uniqueness (even with a PK), then the rows will be duplicated.

3. If a PK violation occurs, Replicate will automatically switch from 'Batch Apply' mode to 'one-by-one' mode to try to apply these INSERT operations.

    If the apply still fails in one-by-one mode, the rows will be written to the attrep_apply_exceptions table. The DBA may check that table and handle the rows manually.

BTW, these additional insert operations can only happen within a short window after the Full Load is done. After some time, the source and target tables should be synchronized.
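The batch-then-one-by-one fallback described in points 1 and 3 can be sketched with sqlite3. This is a simplified model, not Replicate's implementation; the table names (including the stand-in `apply_exceptions` table) and values are illustrative:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE target (id INTEGER PRIMARY KEY, v TEXT)")
con.execute("CREATE TABLE apply_exceptions (id INTEGER, v TEXT)")
# Full Load already placed id=1 into the target snapshot:
con.execute("INSERT INTO target VALUES (1, 'from snapshot')")
con.commit()

# Cached changes to apply; id=1 is the duplicate of the snapshot row.
cached_inserts = [(1, "from cdc"), (2, "from cdc")]

try:
    # Batch Apply: the whole batch at once; any PK violation aborts it.
    with con:
        con.executemany("INSERT INTO target VALUES (?, ?)", cached_inserts)
except sqlite3.IntegrityError:
    # Fallback to one-by-one mode, diverting rows that still fail
    # into the exceptions table for the DBA to inspect.
    for row in cached_inserts:
        try:
            with con:
                con.execute("INSERT INTO target VALUES (?, ?)", row)
        except sqlite3.IntegrityError:
            con.execute("INSERT INTO apply_exceptions VALUES (?, ?)", row)
            con.commit()

print(con.execute("SELECT id FROM target ORDER BY id").fetchall())  # [(1,), (2,)]
print(con.execute("SELECT id FROM apply_exceptions").fetchall())    # [(1,)]
```

The net effect matches the description above: the PK keeps the duplicate out of the target, the non-conflicting row still gets applied, and the rejected row is preserved for manual handling instead of being lost.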

Let me know if you need any additional information.

Regards,

John.

sergsyb
Contributor III
Author

Thanks John, it is now clear.

By the way, I think it makes sense to mention this behavior in the documentation, because it currently states the following:

At the conclusion of the load process, however, Replicate guarantees consistency and integrity of the target data.

 

john_wang
Support

Hello @sergsyb ,

Thank you so much for your great insight, you are a Replicate expert 🙂

The Replicate user guide mentions the above behavior in several chapters; if you search for the keyword "cache" you will find a few relevant descriptions.

Best Regards,

John.
