Hello guys!
More of a general question today..
Does Qlik Replicate have issues replicating HUGE databases? We have a client who wants to replicate one db that they even have issues querying on premises due to its massive size.
Does Replicate have a limit in this regard, or should we be fine?
Thank you as always for the collaboration!
Kind regards!
Hello @guilherme-matte ,
Thanks for reaching out.
I'm not sure what your source/target database types and Replicate version are, but in general the suggestions are:
1. Use filters to 'cut' the massive source data into smaller slices to transfer to the target side, e.g. each slice containing only one year of history data; this can be combined with (2).
2. Use Parallel Load to speed up the transfer if network bandwidth is sufficient.
3. Run Full Load ONLY task(s) to move the unchanged old data to the target, either as dedicated task(s) or via different filter conditions within a single task.
4. Transfer the unchanged data into separate temporary table(s) on the target in parallel, then merge the temporary table(s) into the target table during off-peak time, before the CDC task starts. Transfer the 'history' data before the 'change' data, so that UPDATE/DELETE operations do not fail to find their rows in the target tables.
5. If possible, use partitioned tables (e.g. one partition per year of data) in the target DB for easier management and better performance. Also make sure the target has a Primary Key, Unique Index, Unique Key, etc., so that no full-table scan happens during the CDC stage; otherwise latency builds up.
6. We'd suggest engaging the PS team, as this is not an easy performance-tuning job, and various issues will need to be solved over the long running time.
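To illustrate the per-year slicing idea in (1) and (3), here is a minimal sketch that generates one record-selection filter expression per year, which could then be pasted into each task's table-settings filter. The column name `order_date` and the year range are illustrative assumptions, not details from your environment.

```python
def year_filters(column, first_year, last_year):
    """Return one SQL-style filter expression per year, for splitting
    a huge table's Full Load across several smaller tasks."""
    filters = []
    for year in range(first_year, last_year + 1):
        # Half-open range [Jan 1 of year, Jan 1 of next year) avoids
        # gaps and overlaps between adjacent slices.
        filters.append(
            f"{column} >= '{year}-01-01' AND {column} < '{year + 1}-01-01'"
        )
    return filters

if __name__ == "__main__":
    for f in year_filters("order_date", 2020, 2022):
        print(f)
```

Each generated expression defines one non-overlapping slice, so the dedicated Full Load tasks can run in parallel without loading any row twice.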
Hope this helps.
Regards,
John.
Hello John!
Thank you as always for your help.
I will get more information about the steps you mentioned (parallel load, partitions, etc.) and consider engaging the PS team once I have more details.
In general, what would Qlik be able to handle without many issues? That is, what would be considered a really big database that requires some extra tuning, and what sizes would Qlik usually handle without much trouble? I know it might depend on other factors, but it's just to get an idea, since "HUGE databases", as in the question, is a bit subjective.
Cheers!
Hello @guilherme-matte ,
In general, configuration tuning is needed if the data is huge. However, it's hard to give a number; it depends on hardware, OS settings, network throughput, database type, and other factors, just as you said.
Best Regards,
John.