guilherme-matte
Partner - Creator

Initial Full Load impact on Source

Hello colleagues!

During an initial full load of a big database, what are the impacts of Replicate on the source?

Are there any concerns that should be raised? Should I opt to do it outside of working hours, or is this kind of impact usually negligible?

Cheers!

1 Solution

Accepted Solutions
john_wang
Support

Hello @guilherme-matte ,

Thanks for reaching out!

For a Full Load + CDC enabled task, Replicate performs two operations from the moment the task starts running:

1 - Querying the source database tables for the Full Load stage ('history' data).

2 - Caching the relevant tables' new change rows for the CDC stage ('delta' data).

Operation (2) is done automatically by default (it can be customized): all changes are held in memory, or spilled to disk if they are too big to keep in memory. These cached changes are applied to the target after the Full Load stage completes.
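Conceptually, that cache-then-apply behavior looks like the sketch below. This is a simplified Python illustration only, not Replicate's actual implementation: the class, names, and the in-memory row limit are all made up for the example (real Replicate spills oversized caches to its own on-disk sorter files).

```python
from collections import deque

class ChangeCache:
    """Toy model: hold CDC changes during Full Load, apply them afterwards."""

    def __init__(self, max_in_memory):
        self.max_in_memory = max_in_memory
        self.memory = deque()
        self.disk = deque()  # stands in for on-disk spill files

    def record_change(self, row):
        # During Full Load: hold every incoming change, do not apply it yet.
        if len(self.memory) < self.max_in_memory:
            self.memory.append(row)
        else:
            self.disk.append(row)  # cache too big for memory -> spill

    def apply_all(self, target):
        # After Full Load completes: replay cached changes in arrival order
        # (memory holds the earliest rows, disk the overflow that came later).
        while self.memory or self.disk:
            row = self.memory.popleft() if self.memory else self.disk.popleft()
            target.append(row)

cache = ChangeCache(max_in_memory=2)
for change in ["ins-1", "upd-2", "del-3"]:
    cache.record_change(change)

target_table = []
cache.apply_all(target_table)
print(target_table)  # ['ins-1', 'upd-2', 'del-3']
```

The point of the sketch: the longer the Full Load runs, the more rows accumulate in the cache before anything is applied, which is why Full Load duration matters for CDC resource usage.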

For operation (1), we can tune the Full Load performance via the task setting "Maximum number of tables" (default 5); see the sample below. A larger value makes the Full Load faster (if network throughput is sufficient), but certainly puts more load on the source database, mainly on I/O and network resources. We can reduce the impact by lowering the number of parallel tables, but that means a longer Full Load. Take note that the longer the Full Load stage lasts, the more change data is cached; sometimes the source database cannot sustain that many cached changes and raises an error, e.g. the famous Oracle error ORA-01555: snapshot too old.
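The trade-off behind "Maximum number of tables" can be illustrated with a bounded thread pool, where the pool size plays the role of that setting. This is a hedged sketch, not Replicate code: the table names and per-table load time are invented, and `load_table` just sleeps to simulate the I/O a full-load SELECT would cause on the source.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def load_table(name, seconds=0.1):
    """Stand-in for a full-load SELECT on one source table."""
    time.sleep(seconds)  # simulates I/O and network time spent on the source
    return name

def full_load(tables, max_parallel_tables):
    # max_parallel_tables mirrors Replicate's "Maximum number of tables":
    # more workers -> shorter Full Load, but more concurrent source load.
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=max_parallel_tables) as pool:
        loaded = list(pool.map(load_table, tables))
    return loaded, time.perf_counter() - start

tables = [f"table_{i}" for i in range(10)]
_, t_narrow = full_load(tables, max_parallel_tables=2)
_, t_wide = full_load(tables, max_parallel_tables=5)
print(f"2 parallel tables: {t_narrow:.2f}s, 5 parallel tables: {t_wide:.2f}s")
```

Running it shows the wider pool finishing sooner, which is exactly why a higher setting shortens the Full Load window (and hence the amount of cached change data) at the cost of heavier concurrent pressure on the source.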

In short, the Full Load performance can be controlled (besides the parameter above, filters or Parallel Load can be used as well), and it's better to run the Full Load during non-peak hours.

[Attached screenshot: john_wang_0-1676856505581.png — task settings, "Maximum number of tables"]

Hope this helps.

Regards,

John.

Help users find answers! Do not forget to mark a solution that worked for you! If already marked, give it a thumbs up!

View solution in original post

2 Replies
guilherme-matte
Partner - Creator
Author

Perfect explanation! 

Thank you John

Cheers,