pasquij
Contributor II

multiple Qlik tasks

Is there any advantage to using multiple tasks vs using 1 task for all tables we want to replicate? Some clients have a batch process that would insert/update ~300,000 records per minute. Should we use a separate task for that specific table? If yes, how would that help?

6 Replies
SachinB
Support

Hello @pasquij ,

Thank you for reaching out to Qlik Community!

You can define and activate several replication tasks at once. This is best if the tasks:

  • Have different source tables.
  • Share some source tables but have different filtering conditions on the source rows.
  • Update different target tables.

Having two different replication tasks update the same target table and rows would not be good practice and may cause unpredictable results.

The different replication tasks work independently and run concurrently. Each has its own Initial Load, CDC, and Log Reader processes.

However, if a particular table receives mass updates at the source, it can make sense to separate it into its own task. That reduces the load and latency burden on the main task, which then replicates far less data in comparison.

But keep in mind that this increases the number of connections on the source side.
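For illustration only, here is a rough sketch of that kind of split. The table and task names are made up, and in practice you would define this in the Replicate console rather than in Python; the point is simply that the heavy table gets its own task and no table is owned by two tasks:

```python
# Hypothetical illustration only: Qlik Replicate tasks are defined in the
# Replicate console, not in Python. This just shows the suggested split:
# one task for the mass-update table, one task for everything else.

source_tables = ["dbo.Orders", "dbo.Customers", "dbo.Products", "dbo.BatchStaging"]
mass_update_tables = {"dbo.BatchStaging"}   # assumed ~300,000 inserts/updates per minute

tasks = {
    "TASK_HEAVY": sorted(t for t in source_tables if t in mass_update_tables),
    "TASK_MAIN":  sorted(t for t in source_tables if t not in mass_update_tables),
}

# Sanity check: no table (and therefore no target table/row) is owned by two
# tasks, which avoids the unpredictable results mentioned above.
assert not set(tasks["TASK_HEAVY"]) & set(tasks["TASK_MAIN"])

for name, tables in tasks.items():
    print(name, "->", ", ".join(tables))
```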

Hope this helps.

 

You can also explore the Log Stream task: https://community.qlik.com/t5/Official-Support-Articles/LogStream-Setup/ta-p/1743657

 

Regards,

Sachin B


john_wang
Support

Hello @pasquij ,

Qlik Replicate is a log-based CDC product. One task will read and parse the changes from the transaction log once (TLOG in SQL Server, Redo LOG in Oracle, Journal in DB2i, BinLog in MySQL, OPLOG in MongoDB ...); multiple tasks that connect to the same source DB will read the same transaction log multiple times, which may impact the source DB and waste IO/network resources. So using the minimum number of tasks is highly recommended.

Regarding your scenario, several separate tasks will read the same transaction log more than once. If possible, please merge the tables into one or two tasks, or introduce Log Stream so the transaction log is read only once, eliminating the overhead of reading the logs for each target separately.
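To make that overhead concrete, here is a back-of-envelope sketch. The daily log volume is an assumed figure, not something measured from any system; the only point is that every task connecting directly to the source re-reads the same log, while a Log Stream staging task reads it once:

```python
# Back-of-envelope sketch (illustrative numbers, not Replicate internals):
# N direct tasks each parse the full transaction log, so the source sees the
# log read roughly N times. With Log Stream, one staging task reads the source
# log once and the downstream tasks read the local staging files instead.

def source_log_read_gb(log_gb_per_day: float, direct_tasks: int, use_log_stream: bool) -> float:
    """Approximate GB/day of transaction log read from the source database."""
    if use_log_stream:
        return log_gb_per_day               # only the staging task touches the source log
    return log_gb_per_day * direct_tasks    # each task re-reads the same log

log_volume = 50.0  # assumed GB/day of TLOG generated by the source
print("2 direct tasks:", source_log_read_gb(log_volume, 2, False), "GB/day read from source")
print("Log Stream    :", source_log_read_gb(log_volume, 2, True), "GB/day read from source")
```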

Hope this helps.

Regards,

John.

 

Heinvandenheuvel
Specialist III

'it depends'

What triggered the question? Was there an observed (latency) problem already (how bad?), or is this a general design concern?

Why not run it as a test in a DEV or QA environment? Or indeed in Prod, but carefully document and monitor. Also pre-alert the consumers that you hope it will be faster, and that their changes may come out of order but will all be there.

You didn't specify source or target. The primary overhead of a large transaction is reading through the log to collect all the data. Well, if a task is done reading a million+ changes for a 'bad' table, it might just as well read on and deal with all tables. This is specifically the case for targets like Snowflake, which can ingest at great speed.

If target processing is a bit slow and causing latency buildup for large incoming batches on a specific table, then you could isolate that table in its own task and take the 'hit' of reading all changes twice. Using Log Stream may make that less (resource) costly.

Replicate has only limited (apply) parallelism built in (and some parallel Oracle ASM log reading). Using two (or more) tasks can give some nice extra parallelism at all steps (source processing, sorter, network to target, apply on target). This may well give lower latency overall, but the total resources needed will be larger.
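As a purely illustrative model of that trade-off (the stage timings below are invented, not Replicate measurements), splitting into two tasks can shorten the critical path while increasing the total work, because the log reading is duplicated:

```python
# Rough model only (assumed stage timings, not measured Replicate figures).
# Splitting tables across two tasks lets the non-log stages run in parallel on
# smaller subsets, but each task still reads the whole transaction log.

STAGES = {"read_log": 60, "sort": 20, "network": 30, "apply": 50}  # seconds per batch, one task

def single_task():
    latency = sum(STAGES.values())
    return latency, latency                       # (latency, total work)

def two_tasks(split=0.5):
    def per_task(share):
        return STAGES["read_log"] + share * (STAGES["sort"] + STAGES["network"] + STAGES["apply"])
    lat_a, lat_b = per_task(split), per_task(1 - split)
    return max(lat_a, lat_b), lat_a + lat_b       # latency = slower task, work = both tasks

print("one task :", single_task())    # higher latency, less total work
print("two tasks:", two_tasks())      # lower latency, more total work
```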

Hein.


pasquij
Contributor II
Author

Thank you for the feedback, @john_wang , @SachinB , @Heinvandenheuvel 

We are using SQL Server (Azure VM SQL Server) for both source and target.  We only have 1 source database and 1 target database.

Currently, we are already using log stream.  We are using two log stream staging tasks and two replication tasks to transfer the staged data to the target.

It seems like, based on your feedback, we should actually just use 1 log stream staging task. Having 2 does not really help in reducing latency. Is this accurate?

Then, we can retain the two replication tasks to transfer the staged data to the target - one task for the table with a large number of updates and another task for the rest of the tables.
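To summarize the proposed setup (task and endpoint names below are placeholders; the actual definitions live in the Replicate console): one Log Stream staging task reads the SQL Server TLOG once, and the two replication tasks consume the staged changes.

```python
# Descriptive sketch only of the topology discussed in this thread
# (names are made up; configuration is done in the Replicate console).

topology = {
    "LOGSTREAM_STAGING": {             # reads the source TLOG once
        "source": "SQLServer_Azure_VM_source",
        "target": "log_stream_staging_folder",
    },
    "REPL_HEAVY_TABLE": {              # the table with ~300,000 changes/minute
        "source": "log_stream_staging_folder",
        "target": "SQLServer_Azure_VM_target",
    },
    "REPL_REMAINING_TABLES": {         # everything else
        "source": "log_stream_staging_folder",
        "target": "SQLServer_Azure_VM_target",
    },
}

source_readers = [name for name, cfg in topology.items() if cfg["source"].startswith("SQLServer")]
print("tasks reading the source TLOG directly:", source_readers)  # only the staging task
```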


john_wang
Support

Hello @pasquij ,

It seems like, based on your feedback, we should actually just use 1 log stream staging task. Having 2 does not really help in reducing latency. Is this accurate?

You are correct!

Thanks,

John.

pasquij
Contributor II
Author

Thank you @john_wang