guilherme-matte
Partner - Creator

Initial Full Load impact on Source

Hello colleagues!

During an initial full load of a big database, what are the impacts of Replicate on the source?

Are there any concerns that should be raised? Should I opt to do it outside of working hours, or is this kind of impact usually negligible?

Cheers!

1 Solution

Accepted Solutions
john_wang
Support

Hello @guilherme-matte ,

Thanks for reaching out!

For a Full Load + CDC enabled task, Replicate performs two operations from the moment the task starts running:

1 - Querying the source database tables for the Full Load stage ('history' data).

2 - Caching the relevant tables' new change rows for the CDC stage ('delta' data).

Operation (2) is done automatically by default (it can be customized): all changes are held in memory, or spilled to disk if they are too big to keep in memory. These cached changes are applied to the target after the Full Load stage completes.
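Conceptually, that cache-then-apply behavior looks like the sketch below. This is a simplified Python illustration only, not Replicate's actual implementation: the class, names, and the in-memory row limit are all made up for the example (real Replicate spills oversized caches to its own on-disk sorter files).

```python
from collections import deque

class ChangeCache:
    """Toy model: hold CDC changes during Full Load, apply them afterwards."""

    def __init__(self, max_in_memory):
        self.max_in_memory = max_in_memory
        self.memory = deque()
        self.disk = deque()  # stands in for on-disk spill files

    def record_change(self, row):
        # During Full Load: hold every incoming change, do not apply it yet.
        if len(self.memory) < self.max_in_memory:
            self.memory.append(row)
        else:
            self.disk.append(row)  # cache too big for memory -> spill

    def apply_all(self, target):
        # After Full Load completes: replay cached changes in arrival order
        # (memory holds the earliest rows, disk the overflow that came later).
        while self.memory or self.disk:
            row = self.memory.popleft() if self.memory else self.disk.popleft()
            target.append(row)

cache = ChangeCache(max_in_memory=2)
for change in ["ins-1", "upd-2", "del-3"]:
    cache.record_change(change)

target_table = []
cache.apply_all(target_table)
print(target_table)  # ['ins-1', 'upd-2', 'del-3']
```

The point of the sketch: the longer the Full Load runs, the more rows accumulate in the cache before anything is applied, which is why Full Load duration matters for CDC resource usage.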

For operation (1), we can tune the Full Load performance via the task setting "Maximum number of tables" (default 5); see the sample below. A larger value makes the Full Load faster (if network throughput is sufficient), but certainly puts more load on the source database, mainly on I/O and network resources. We can reduce the impact by lowering the number of parallel tables, but that means a longer Full Load. Take note that the longer the Full Load stage lasts, the more change data is cached; sometimes the source database cannot sustain that many cached changes and raises an error, e.g. the famous Oracle error ORA-01555: snapshot too old.
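The trade-off behind "Maximum number of tables" can be illustrated with a bounded thread pool, where the pool size plays the role of that setting. This is a hedged sketch, not Replicate code: the table names and per-table load time are invented, and `load_table` just sleeps to simulate the I/O a full-load SELECT would cause on the source.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def load_table(name, seconds=0.1):
    """Stand-in for a full-load SELECT on one source table."""
    time.sleep(seconds)  # simulates I/O and network time spent on the source
    return name

def full_load(tables, max_parallel_tables):
    # max_parallel_tables mirrors Replicate's "Maximum number of tables":
    # more workers -> shorter Full Load, but more concurrent source load.
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=max_parallel_tables) as pool:
        loaded = list(pool.map(load_table, tables))
    return loaded, time.perf_counter() - start

tables = [f"table_{i}" for i in range(10)]
_, t_narrow = full_load(tables, max_parallel_tables=2)
_, t_wide = full_load(tables, max_parallel_tables=5)
print(f"2 parallel tables: {t_narrow:.2f}s, 5 parallel tables: {t_wide:.2f}s")
```

Running it shows the wider pool finishing sooner, which is exactly why a higher setting shortens the Full Load window (and hence the amount of cached change data) at the cost of heavier concurrent pressure on the source.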

In short, the Full Load performance can be controlled (besides the parameter above, filters or Parallel Load can be used as well), and it's better to run the Full Load during non-peak hours.

[Attached screenshot: john_wang_0-1676856505581.png — task settings, "Maximum number of tables"]

Hope this helps.

Regards,

John.

Help users find answers! Do not forget to mark a solution that worked for you! If already marked, give it a thumbs up!

View solution in original post

2 Replies
guilherme-matte
Partner - Creator
Author

Perfect explanation! 

Thank you John

Cheers,