Qlik replicate full load CPU utilization

Raviraj · ‎2024-12-04

We are running Replicate on a Windows server sized Standard_E8s_v3. Two tasks are always running on the server.

One is always in CDC mode; the other runs a full load 3 times a day and stays in CDC mode the rest of the time.

The issue we are seeing: while a full load is running on the second task, CPU usage goes above 80% and the CDC task shows latency.

I heard that when a task is in full load, each table occupies 1 CPU. Is that correct? And does each task need 1 CPU while it is running?

john_wang · ‎2024-12-04

Hello @Raviraj ,

Thanks for reaching out to Qlik Community!

I’m not sure about the task definition (e.g., the number of tables, the transformations used, or the source/target endpoints), so it’s hard to determine why the CPU usage is so high. From what I can see, the server has 8 vCPUs and 64 GiB of memory, which should generally be sufficient.

Regarding the question about CPU usage:

I heard that when a task is in full load, each table occupies one CPU. Is that correct?

Not exactly. The Replicate program (repctl.exe on Windows) typically runs on a single CPU and doesn’t distribute one table per CPU.

I’d recommend opening a support ticket and including the Task Diagnostics Packages for further analysis. Our support team will be glad to help. If necessary, we can also involve the Professional Services Team for additional assistance.

Regards,

John.


Heinvandenheuvel · ‎2024-12-06

@Raviraj "I heard that when a task is in full load each table will occupy 1 CPU, is it correct? And each task will need 1 CPU when it is running"

No, that is NOT correct. Check out the TRACE (DEBUG) task log for a single task. You'll see lots of worker thread types that can be scheduled more or less at the same time: "SOURCE_UNLOAD", "TARGET_LOAD", "TARGET_APPLY", "SOURCE_READER". Several of those types can have multiple threads assigned, each identified by the thread ID in the leftmost column.

The operating system is free to schedule those threads on any available CPU (or rather core, or hyper-thread, depending on the architecture). Therefore a single task could 'eat' all the CPU power available.

You do have control, though. Under the task settings for full load tuning there is the "Maximum number of tables:" setting. This really should read "Maximum number of tables handled at the same time". Each table goes through a series of unload-load-unload-load steps. In a well-tuned environment that uses roughly 1/2 a CPU per table; the other half of the time it is waiting for the source or target server and the network transfers. The expected, and desirable, CPU load for a full-load task is therefore roughly 1/2 CPU times the number of tables set to load at the same time.

If a full-load task is taking too much CPU, reduce the "Maximum number of tables:" under full-load tuning. The default is just 5, so with 'other stuff' needing attention expect 2 to 3 CPUs' worth of resources for a full load. With 8 CPUs you can therefore expect near 100% CPU load with 3 full-load tasks running.
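To make that arithmetic concrete, here is a minimal back-of-the-envelope sketch in Python. It only encodes the rule of thumb from this post (about 1/2 a CPU per table loading in parallel); the function names are made up for illustration, and the 0.5 factor is an estimate, not a documented Replicate constant. It also ignores the 'other stuff' overhead mentioned above.

```python
# Rough estimate only: the 0.5 factor is the rule of thumb from this thread,
# not a value documented by Qlik. Function names are hypothetical.
CPU_PER_TABLE = 0.5


def estimated_full_load_cpus(max_tables: int, cpu_per_table: float = CPU_PER_TABLE) -> float:
    """Approximate number of CPUs one full-load task keeps busy."""
    return max_tables * cpu_per_table


def estimated_server_load_pct(tasks_max_tables: list[int], total_cpus: int) -> float:
    """Approximate total CPU utilisation, capped at 100%, for concurrent full-load tasks."""
    busy = sum(estimated_full_load_cpus(n) for n in tasks_max_tables)
    return min(100.0, 100.0 * busy / total_cpus)


if __name__ == "__main__":
    # One task at the default "Maximum number of tables" of 5.
    print(estimated_full_load_cpus(5))
    # Three such tasks running full loads at the same time on 8 vCPUs.
    print(estimated_server_load_pct([5, 5, 5], 8))
```

Plugging in a lower "Maximum number of tables" for the full-load task shows how much headroom would be left for the CDC task on the same server.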

For CDC there is only ever 1 SOURCE_READER, but for certain targets (Oracle, SQL Server, Snowflake) you can select multiple TARGET_APPLY threads under CDC tuning.

One way to limit total CPU overload on the Replicate server, and still have lots of parallel activity to source and target, is to use OS-specific tools to control "Processor Affinity", for example the Windows Task Manager. You can just tell the OS to allow a Replicate task only a few specific CPUs. Say CPU 2,3 for task 1 with perhaps 10 tables in parallel, CPU 4 for task 2 with 3 tables in parallel, CPU 4,5,6 for task 3 with 20 tables in parallel.

Now theoretically Replicate could manage this for you through a task setting of, say, "CPU number list", but it does not; nobody needed that badly enough to request such a feature.

End users could potentially script this, though: https://stackoverflow.com/questions/19187241/change-affinity-of-process-with-windows-script
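If you do script it, keep in mind that Windows expresses processor affinity as a bitmask, with bit N standing for CPU N. The small Python sketch below only converts CPU lists, such as the example assignment given earlier, into masks and back. The helper names are hypothetical, and finding the right Replicate process for each task and actually applying the mask (for instance through the .NET `ProcessorAffinity` property or `start /affinity`) is left out on purpose.

```python
# Sketch: convert a list of CPU indexes into a Windows-style affinity bitmask.
# Helper names are hypothetical; nothing here talks to Replicate or the OS.

def affinity_mask(cpus):
    """Return an integer with bit N set for every CPU index N in `cpus`."""
    mask = 0
    for cpu in cpus:
        if cpu < 0:
            raise ValueError(f"CPU index must be non-negative, got {cpu}")
        mask |= 1 << cpu
    return mask


def cpus_from_mask(mask):
    """Inverse of affinity_mask: list the CPU indexes whose bits are set."""
    return [bit for bit in range(mask.bit_length()) if mask & (1 << bit)]


if __name__ == "__main__":
    # The example assignment from the post; the task names are placeholders.
    plan = {"task 1": [2, 3], "task 2": [4], "task 3": [4, 5, 6]}
    for name, cpus in plan.items():
        mask = affinity_mask(cpus)
        print(f"{name}: CPUs {cpus} -> mask {mask} (hex {mask:#x})")
```

Computing the mask separately from applying it makes it easy to check an assignment on paper before touching a running task.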

I have found the control through "number of tables" to be adequate.

Oddly perhaps, I have used process (and thread) affinity the other way around: to maximize Replicate task CPU and performance by NOT allowing the worker threads to drift across CPUs, but keeping them tied to one CPU and, more importantly, its caches. It's not practical, but it is effective and can nudge (cheat?!) a benchmark to better numbers.

https://learn.microsoft.com/en-us/dotnet/api/system.diagnostics.processthread.processoraffinity?view...

Clear as mud?

Study that Reptask log!

Use a small task, with somewhat short tables in verbose mode for the full effect :-).

When you are good and ready, try to run/understand the attached Perl script to analyze a full-load log or two.

Hein.

General Question

Performance