Hi team,
I have a question about task latency. Suppose I'm replicating tables from an Oracle (redo log) source and I have a schema with 100 tables. Earlier I ran a single task for these 100 tables; now I have multiple tasks with the tables divided among them. Does keeping all tables in a single task, versus dividing them among several tasks, affect latency? Since I split the tasks, the source latency seems to have increased.
Can anyone educate me more on this topic?
Regards,
Sreehari
Hi @Sreehari ,
Thank you for reaching out to the Qlik Community.
Let's say you divide the tables into two tasks. This means two Replicate processes will connect to your Oracle database to read the same redo log files; in other words, the same redo files will be read twice. Since the two tasks run in parallel, overall performance would likely be better.
However, having more tasks will also increase the load on your servers. This may have an adverse effect. In addition, you'll need to carefully consider whether the target server has sufficient capacity and resources to handle these changes.
Regards,
Desmond
Hello @Sreehari
Task latency is the time from the source commit until the same data is committed on the target for the participating tables. In general, say there is one table without a PK, which makes DML apply slow for that table; it will pull down the throughput for the rest of the tables.
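As a minimal sketch of that definition (hypothetical timestamps; this is not a Replicate API, just the arithmetic), latency for a change is simply the gap between the source commit and the target commit:

```python
from datetime import datetime, timezone

def task_latency_seconds(source_commit: datetime, target_commit: datetime) -> float:
    """Latency = time from the source commit until the same data
    is committed on the target."""
    return (target_commit - source_commit).total_seconds()

# Example: a change committed on the source at 12:00:00 and applied
# to the target at 12:00:07 has 7 seconds of latency.
src = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
tgt = datetime(2024, 1, 1, 12, 0, 7, tzinfo=timezone.utc)
print(task_latency_seconds(src, tgt))  # 7.0
```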
If you wish to divide the tables, consider the type of each table and the kind of columns and data they hold:
1. Put tables that HAVE a PK in one task (they always have good throughput).
2. Put tables that contain LOBs in another task.
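The grouping rules above could be sketched roughly as follows (the `has_pk`/`has_lob` fields are hypothetical metadata for illustration, not something Replicate exposes this way):

```python
def split_tables(tables):
    """Group tables into task buckets: LOB tables separately,
    PK-only tables together (good throughput), the rest apart."""
    tasks = {"pk_tables": [], "lob_tables": [], "other": []}
    for t in tables:
        if t.get("has_lob"):
            tasks["lob_tables"].append(t["name"])
        elif t.get("has_pk"):
            tasks["pk_tables"].append(t["name"])
        else:
            tasks["other"].append(t["name"])
    return tasks

tables = [
    {"name": "ORDERS", "has_pk": True, "has_lob": False},
    {"name": "DOCS", "has_pk": True, "has_lob": True},
    {"name": "AUDIT_LOG", "has_pk": False, "has_lob": False},
]
print(split_tables(tables))
# {'pk_tables': ['ORDERS'], 'lob_tables': ['DOCS'], 'other': ['AUDIT_LOG']}
```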
Having multiple tasks connected to the same source can put load on the source DB, but sometimes it does provide the required throughput.
In some cases Log Stream is suggested; it all depends on the customer environment.
Regards,
Sushil Kumar
>> I think when I divided the tasks the source latency has increased.
Yup, that feels normal/expected to me.
You didn't indicate the source or target DB types used; that may well make a difference for the right advice.
What problem are you trying to fix? If latency is too high, wouldn't it be better to figure out the root cause (which tables are responsible) and deal with that, rather than blindly splitting 50/50?
In doing that analysis you should determine which tables to combine or keep apart, and you may well find that an 80/20 split is better, based perhaps on change volume or other considerations such as 'high-priority' tables.
When you split the same tables from 1 task into 2, you cause the source change logs (redo) to be read twice, transferred twice, and interpreted twice. That's double the read cost right there, with zero potential gain.
Each task is also likely to read and accumulate fewer changes to be applied in the same interval, making the bulk apply less efficient. There will be two smaller change files (CSV) transferred, or smaller change tables created. That's double the bulk prepare/execute overhead with limited gain potential (the gain being more rows/tables handled per interval).
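To see why smaller batches hurt, here is a back-of-the-envelope model (the overhead and per-row costs are made-up illustrative numbers, not measurements): each bulk apply pays a fixed prepare/execute cost, so splitting the same change volume across two tasks pays that fixed cost twice.

```python
def apply_time(total_rows, batches, fixed_overhead_s=2.0, per_row_s=0.001):
    """Total apply time = fixed prepare/execute overhead per batch
    plus a per-row cost. All numbers are illustrative."""
    return batches * fixed_overhead_s + total_rows * per_row_s

rows = 100_000
one_task = apply_time(rows, batches=10)           # 10 batches in one task
two_tasks = 2 * apply_time(rows / 2, batches=10)  # each task still runs 10 batches
print(one_task, two_tasks)  # 120.0 140.0
```

The per-row work is unchanged (100 s worth of rows either way), but the fixed batch overhead doubles from 20 s to 40 s, which is the "less efficient bulk apply" effect described above.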
With 2 tasks there can be increased sorter parallelism. That adds some Replicate server CPU overhead, but it is one step that could indeed benefit from two processes.
On the apply side, the target apply engine may already be able to run in parallel (Snowflake, SQL Server, Oracle, ...), and splitting into 2 tasks just makes it less controllable and more peaky.
If you do decide that 2 tasks is better, for example after determining, as @SushilKumar suggests, that there are slower tables without a PK, with LOB lookups, or with source_lookup/target_lookup transformations, then it is probably best to go for 3 tasks in a Log Stream configuration.
That is, have 1 task (the Log Stream parent) read the change logs looking only for changes to the tables of interest, and 2 child tasks process those pre-read, pre-selected changes.
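Conceptually, a Log Stream setup is one reader fanning pre-filtered changes out to child tasks. A toy sketch of that shape, using plain Python queues and threads (this models the idea only, not Replicate internals):

```python
import queue
import threading

def log_stream_parent(redo_log, tables_of_interest, child_queues):
    """Read the change log ONCE, keep only changes for tables of
    interest, and hand each surviving change to every child queue."""
    for change in redo_log:
        if change["table"] in tables_of_interest:
            for q in child_queues:
                q.put(change)
    for q in child_queues:
        q.put(None)  # sentinel: no more changes

def child_task(q, my_tables, applied):
    """Each child applies only the changes for its own table subset."""
    while (change := q.get()) is not None:
        if change["table"] in my_tables:
            applied.append(change)

# Simulated redo log: TEMP is not of interest and is filtered by the parent.
redo_log = [{"table": t, "op": "INSERT"} for t in ("ORDERS", "DOCS", "TEMP", "ORDERS")]
q1, q2 = queue.Queue(), queue.Queue()
applied1, applied2 = [], []

parent = threading.Thread(target=log_stream_parent,
                          args=(redo_log, {"ORDERS", "DOCS"}, [q1, q2]))
c1 = threading.Thread(target=child_task, args=(q1, {"ORDERS"}, applied1))
c2 = threading.Thread(target=child_task, args=(q2, {"DOCS"}, applied2))
for t in (parent, c1, c2):
    t.start()
for t in (parent, c1, c2):
    t.join()
print(len(applied1), len(applied2))  # 2 1
```

The point of the pattern: the redo log is read and filtered once by the parent, so adding child tasks does not multiply the read cost the way two independent tasks would.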
Hein.