Attunity_user
Creator

Qlik Replicate number of tasks and table limitation per server

Hi Everyone,

Could someone shed some light on how many tasks, with the maximum number of tables, can be added per server based on its capacity? What is the measurement for this? We are using a log stream and 80 tasks with a total of 4000 tables. We have not observed latency so far, but we are wondering how many more tasks can be added to the server. All tasks except the LS have a transformation to stop hard deletes, which may add computation time.

Please share your thoughts.

Thanks

AU

 

12 Replies
Heinvandenheuvel
Specialist III

What matters is changes per second across all tasks; that defines your total resource consumption. Of course, that could be pegged out in a single task: despite some multi-threading within a single task, you are often down to a number of changes per core, the maximum system performance being max changes/core times the number of cores. I don't have recent hard numbers and would love to hear some from the field. For batch apply I expect thousands of changes per second per core. For transactional apply, high hundreds to low thousands, but that's pretty much a WAG.
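
As a rough illustration of that sizing rule, here is a minimal sketch; the per-core rates are placeholder guesses, not measured figures:

```python
# Rough capacity sketch using the rule of thumb above:
# max throughput ~= max changes/sec per core * number of cores.
# The per-core rates are placeholder guesses (a "WAG"), not measured numbers.

CORES = 16                               # assumed server size
BATCH_APPLY_CHANGES_PER_CORE = 3000      # assumed: "thousands" per second per core
TRANSACTIONAL_CHANGES_PER_CORE = 800     # assumed: "high hundreds" per second per core

batch_ceiling = CORES * BATCH_APPLY_CHANGES_PER_CORE
transactional_ceiling = CORES * TRANSACTIONAL_CHANGES_PER_CORE

print(f"Rough batch-apply ceiling:   {batch_ceiling:,} changes/sec")
print(f"Rough transactional ceiling: {transactional_ceiling:,} changes/sec")

# Compare the observed aggregate changes/sec across all tasks against these
# ceilings to estimate how much headroom remains before adding more tasks.
```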

4000 tables total is not much at all. A single task can often handle that, except when the changes are heavily skewed towards a particularly busy table, at which point you may want to divvy it up.

80 tasks is high, but not crazy high. It will normally need lots of memory (64+ GB) unless very tight tuning is done. I've seen 100+ active tasks in 32 GB, but it was tight.

100+ tasks is not officially supported but can work after some Windows tweaks to set a specific initial memory for detached services such as the Replicate main service. See for example: https://stackoverflow.com/questions/17472389/how-to-increase-the-maximum-number-of-child-processes-t...

Kindly share more of your context:

  • memory available/used
  • smallest and largest memory footprint for your reptask processes
  • total CPU % for a busy time: cores used versus cores available
  • any, or many, reptasks running at more than 1 core?
  • estimated changes/sec for the worst tasks?
  • estimated changes/sec overall?
  • 1 logstream task feeding 80 slaves? Wow! Why? If someone was afraid of more than 50 tables in a task and started spreading that out over 80 tasks, then please reconsider. 5 tasks is probably plenty: one or two with just the up-to-10 busy tables, and 1 to 3 more for 'the rest'.

Hein.

Attunity_user
Creator
Author

Hi Hein,

 

Thank you for your response. 

We have added memory a couple of times recently; we are on 100 GB now, with 16 cores. Estimated applied changes on the log stream, based on EM, are about 86M on a peak day. This may vary.

4000 tables in the log stream, 80 tasks plus the log stream in total, and 50 tables per child task. This is based on the source architecture. If we split tables by removing some from each child task (50 down to 25), they would need to go into another 80 tasks; that's how we pull from 80 different locations. In that case, we would need another node or server, correct? We are on a Linux server and I think 100 tasks is the limitation.

The memory utilization showing up in Analytics is 143,814 MB, equivalent to about 140 GB. I do not know how that works with the allocated 100 GB in total.

You said "5 tasks is probably plenty one or two with just up to 10 busy tables. and 1 to 3 more for 'the rest'"

Are you recommending 5 tasks in total per server, with 10 busy tables, for 100 GB of memory?

Thanks,

AU

Heinvandenheuvel
Specialist III

>> about 86M on a peak day.

That's about 1000 changes/second all day long; possibly 5000 changes/second in 4 busy hours and 500 changes/second in the other 20 (those numbers don't add up exactly, but you get my drift; you know the details). That's very doable with 16 cores, probably using less than half of them averaged over the day.
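
To make that arithmetic explicit, here is a minimal sketch; the busy/quiet split is an illustrative assumption, not measured data:

```python
# Convert a daily change volume into rough per-second rates.
changes_per_day = 86_000_000
seconds_per_day = 24 * 3600

print(f"Flat average: {changes_per_day / seconds_per_day:,.0f} changes/sec")  # ~1000/sec

# Assumed skew: most of the volume lands in a few busy hours (75% is a guess).
busy_hours, quiet_hours = 4, 20
busy_share = 0.75
busy_rate = changes_per_day * busy_share / (busy_hours * 3600)
quiet_rate = changes_per_day * (1 - busy_share) / (quiet_hours * 3600)
print(f"Busy-hour rate:  {busy_rate:,.0f} changes/sec")   # ~4500/sec
print(f"Quiet-hour rate: {quiet_rate:,.0f} changes/sec")  # ~300/sec
```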

>> 50 tables per child tasks. This is based on the source architecture. 

That's not enough information for us to understand why you went to 50 tables per task, and not, say, 500. Now, if you have 80 distinct TARGET DATABASES (not just schemas but truly distinct databases), then you have no choice but to have 80 tasks.

>> The memory utilization showing up in Analytics is 143,814 MB, equivalent to about 140 GB. I do not know how that works with the allocated 100 GB in total.

What was the CPU load? Anyway, you can overcommit memory to a good degree, Linux more so than Windows. Tasks often over-allocate pages/buffers (due to tuning or lack thereof), giving them the right to use lots of memory, but as long as they do not all actively use it all at once it will be fine. Still, by tuning the task settings you may well be able to make the picture look nicer and take fewer risks, the risk being a catastrophic slowdown versus a gradual, measurable drop in response time.

A simple tuning to check is the 500 MB default batch apply optimization buffer. Maybe it is never filled for most tasks? That is easy to verify by running a believed-to-be-low-use task with TARGET_APPLY logging in TRACE mode during a busy hour and counting "finish" messages versus "because memory usage has exceeded the limit" messages. Adjust some (busy) tasks up and others (most of them?) down based on that. That could be 60 (out of 80) tasks times 400 MB (reducing 500 to 100), which equals 24 GB saved. Well, not so much saved as no longer being set aside, versus never being touched.
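
A minimal sketch of that log check, assuming TARGET_APPLY has been set to TRACE for the task; the log path is hypothetical and the message fragments are taken from the suggestion above, so confirm the exact wording in your own logs:

```python
# Count normal batch finishes versus flushes forced by the memory limit in a
# reptask log captured with TARGET_APPLY at TRACE during a busy hour.
# The log path and message fragments are assumptions; adjust them to match
# what you actually see in your Replicate version's logs.

LOG_FILE = "/opt/attunity/replicate/data/logs/my_child_task.log"  # hypothetical path

normal_finishes = 0
memory_limit_flushes = 0

with open(LOG_FILE, errors="replace") as log:
    for line in log:
        if "because memory usage has exceeded the limit" in line:
            memory_limit_flushes += 1
        elif "finish" in line.lower():
            normal_finishes += 1

print(f"Normal batch finishes:       {normal_finishes}")
print(f"Flushes due to memory limit: {memory_limit_flushes}")

# If the memory-limit count stays at zero in a busy hour, that task probably
# never fills its batch buffer and its buffer setting could be reduced.
```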

>> Are you recommending 5 tasks in total per server, with 10 busy tables, for 100 GB of memory?

Not at all. Sorry if I wasn't clear but I didn't have much to work with. Read again?

Bonus thought:

I like logstream solutions. They move the expensive source-DB-side log reading, where resources may be limited, to the Replicate server, which can potentially be configured as big as needed more easily and without impacting the source application, versus re-configuring a DB server and impacting the application users. Depending on the source DB type (which you failed to mention) and its log reader tuning, it may also reduce the network traffic from DB to Replicate by a factor of the number of reader tasks (here 80!).

But 80 children may well be over the top. I strongly recommend revisiting those 'architectural' reasons to split into groups of 50: hard needs versus desirables, and simple versus 'we could use fewer tasks but it would be more design work to filter and/or transform'. Let's say those 86M changes on a busy day correspond to 40 GB of CDC log being read. Those are now going to be stored by the logstream task and then re-read, filtered, and processed 80 times over, for 3200 GB processed.

Now take 2 logstream tasks, each configured to take about half of the changes. Yes, the source load will perhaps be twice as large. But now there are two chunks of 20 GB, each read by 40 child tasks, which is 1600 GB processed. Half the work for the Replicate server! Half as much 'crud' (non-selected changes) to skip over for each task. Furthermore, in a pinch you could possibly reconfigure to 2 independent Replicate servers, perhaps each even being a backup to the other should things happen. This may or may not be a better balance.
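
A quick way to compare those layouts, using the 40 GB/day figure above as an illustrative example:

```python
# Compare total CDC log volume processed per day for different logstream layouts.
# The 40 GB/day figure is the illustrative example from above, not a measurement.

def total_gb_processed(daily_log_gb: float, logstreams: int, child_tasks: int) -> float:
    """Each logstream stores its share of the log once; every child task attached
    to that logstream then re-reads and filters that whole share."""
    share = daily_log_gb / logstreams
    children_per_stream = child_tasks / logstreams
    return logstreams * share * children_per_stream

print(total_gb_processed(40, logstreams=1, child_tasks=80))  # 3200.0 GB/day
print(total_gb_processed(40, logstreams=2, child_tasks=80))  # 1600.0 GB/day
```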

Attunity_user
Creator
Author

The Qlik Replicate tool is fairly new for us. We went live with 10 tables and gradually brought more tables to each task. We have another set of 40 tables planned to bring in.

We have about 80 locations. If we bring in table A, the same table A exists in all locations, but the data in each table is different. We group it by location in the target database after replication. We only have one transformation, on a few tables in each task, stopping hard deletes. I hope you get the picture of the source setup.

Because the source database is critical and we want to avoid issues on it, we have one log stream that pulls all the data and then moves it all via the child tasks.

>> The memory utilization showing up in Analytics is 143,814 MB, equivalent to about 140 GB. I do not know how that works with the allocated 100 GB in total.

I also do not know why Enterprise Manager is showing higher numbers (147,735 MB), or how that calculation works with the allocated memory of 100 GB. I am planning to create a ticket with the support team to understand this.

 

>> Depending on the source DB type (which you failed to mention) 

The source database is Oracle and the target is Oracle too.

Can we double the memory (to 200 GB) to make it more robust so we can bring in the remaining tables, or is it more scalable to build another server and create another 80 tasks to split the tables?

CPU usage average is only 5%.

 

Thanks,

AU

Steve_Nguyen
Support

@Attunity_user, for your question about tasks and memory, this can only be determined by testing.

Or, better, engage Professional Services, who would evaluate your environment and then advise accordingly.

Help users find answers! Don't forget to mark a solution that worked for you! If already marked, give it a thumbs up!
Heinvandenheuvel
Specialist III

@Attunity_user, yes, as per @Steve_Nguyen it may be time to engage Professional Services, more so than Support, as the product is working as intended but is not sufficiently understood. Training and guidance are needed.

>> We went live with 10 tables and gradually brought more tables to each task

WHY WHY WHY WHY WHY?

Why not just add more tables to a single task?

Each task will start out with, for example, a 500 MB or 1 GB process, whether it has 1 table or 1000 tables.

Adding a task per table group will cost you that 500 MB or 1 GB. Adding 10 tables to a task costs you about 10 MB (order of magnitude, not a hard number).
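
A back-of-the-envelope comparison of the two layouts; the per-task and per-table costs below are rough assumptions taken from the figures above, not hard numbers:

```python
# Rough memory comparison: many small tasks versus a few large tasks.
# Both per-task and per-table costs are order-of-magnitude assumptions.

BASE_MB_PER_TASK = 750   # assumed: each task process starts around 500 MB - 1 GB
MB_PER_TABLE = 1         # assumed: roughly 10 MB per 10 extra tables in a task

def estimated_memory_mb(tasks: int, total_tables: int) -> int:
    return tasks * BASE_MB_PER_TASK + total_tables * MB_PER_TABLE

print(estimated_memory_mb(tasks=80, total_tables=4000))  # ~64,000 MB
print(estimated_memory_mb(tasks=5,  total_tables=4000))  # ~7,750 MB
```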

>>>  I am planning to create a ticket with the support team to understand this.

I do hope you mean your internal system support team, not Qlik. Your question here is really operating-systems 101, and your (Linux) system management team should be able to explain how memory allocation works. Admittedly they would not know which knobs to turn within Replicate, but you don't seem to be interested in that, considering zero feedback was provided on the suggestions in this space.

>> We have about 80 locations.

Locations means diddlysquat to the readers here. Does that correspond to individual databases? Schemas? Tables?

>> Due to the critical database avoiding issues on the source database, 

That's management gobbledygook. It's the effect, not the cause. Provide details/justification.

>> I hope you got the picture of the source setup.

No, not at all. From your use of 1 logstream source task it sounds like one DB with many tables, which the application conceptually relates to distinct locations, but that's just an attribute. It's not a hard boundary, is it?

Good luck!

Hein.

Michael_Litz
Support

I wanted to add this link to an article in the Qlik community which describes a one to many type of task architecture, which may be what you are looking to implement:

Many to One task 2/25 Article and Video
https://community.qlik.com/t5/Knowledge/Qlik-Replicate-Many-to-One-Replication-Configuration/ta-p/17...

Michael Litz

Attunity_user
Creator
Author

Hello,

 

>> @Attunity_user, yes, as per @Steve_Nguyen it may be time to engage Professional Services, more so than Support, as the product is working as intended but is not sufficiently understood. Training and guidance are needed.

 We did engage PS.  They reviewed the design and concurred with this approach to match the source database.

Agreed, we have not fully understood the product, or we would not run into issues or come here or connect with the support team looking for information. Thanks for all your input.

>> We went live with 10 tables and gradually brought more tables to each task

>>WHY WHY WHY WHY WHY?

Gradual = Different PROJECTS

>> Why not just add more tables to a single task?

>> Each task will start out with, for example, a 500 MB or 1 GB process, whether it has 1 table or 1000 tables.

>> Adding a task per table group will cost you that 500 MB or 1 GB. Adding 10 tables to a task costs you about 10 MB (order of magnitude, not a hard number).

-------------------------------------------------------------------------------------------------------

We created separate child tasks based on each unique user to handle outages better. But based on what you explained, we may be able to combine less critical table groups.

 

>>>  I am planning to create a ticket with the support team to understand this.

>> I do hope you mean your internal system support team, not Qlik. Your question here is really operating-systems 101, and your (Linux) system management team should be able to explain how memory allocation works. Admittedly they would not know which knobs to turn within Replicate, but you don't seem to be interested in that, considering zero feedback was provided on the suggestions in this space.

-------------------------------------------------------------------------------------------------------

The suggestions here are helpful, and we are interested in learning from experienced individuals and product experts. I wanted to understand why the memory used shown in EM is higher than the allocated amount, so we raised a ticket with Qlik Support and got this response:

“Memory limits set in the task settings and stream buffers will influence how much memory Replicate can use for the task”

>> We have about 80 locations.

>> Locations means diddlysquat to the readers here. Does that correspond to individual databases? Schemas? Tables?

Schemas/Users

>> Due to the critical database avoiding issues on the source database, 

>> That's management gobbledygook. It's the effect, not the cause. Provide details/justification.

Availability/outages. They found CPU was being consumed and did not approve several tasks or log streams running at the same time. But we may be able to manage two or three; we will test it.

>> I hope you got the picture of the source setup.

>> No, not at all. From your use of 1 logstream source task it sounds like one DB with many tables, which the application conceptually relates to distinct locations, but that's just an attribute. It's not a hard boundary, is it?

Ok.

One database, 80 unique schemas/users with different sizes. 

We will try to split the LS into two (and test the impact on source CPU usage) and also combine the less critical users into fewer child tasks.

 


 

Thanks, Hein

Heinvandenheuvel
Specialist III

>> We created separate child tasks based on each unique user to handle outages better.

Ah. That can indeed be useful, as each of the 'companies' being served may have their own special needs.

I suspect you pushed this too far, though. Replicate skills may be more important than company-specific knowledge. You may want to have one 'uber' user who understands Replicate better and can, from those skills, provide guidance for company-specific support.

[edit] Initially you may feel the need to be able to stop/resume/reload an individual company without impacting the others. In practice you'll find Replicate to be set-and-forget once in production. Sure, when adding a fresh company, give it its own task to make sure it all works as desired. After a few days or weeks, pick up the tables from that task and bunch them up with several other companies in a larger task, grouped alphabetically or geographically. You may end up with an east, a central, and a west task, each handling 30 companies.

If one company's data needs reloading, try the UI to select the tables for that company and reload them. If worst comes to worst, stop a super-task, remove the tables for that company, and resume. Next, fix whatever needed to be fixed for the troubled company (potentially hosting it temporarily in a 'foster family' helper task). Once a new stable point is defined, stop the super-task, add the company tables back, and resume (which will reload them). Yes, it would be 'nice' to be able to just schedule a company's worth of data, but the production price is likely too high for what little bonus flexibility it offers.

>> They found CPU getting consumed and did not approve several tasks

That's indeed reasonable. But it is a balance between complexity, manageability, and the resources needed. Slightly more complex tasks combining multiple companies may be all that is needed. At some point a second or third logstream task may be desirable despite the source DB overhead. One element in that, which we did not discuss yet, is that the startup of a single logstream task with 4000+ tables becomes too slow because it checks the metadata for each table, like a full load but without rows being fetched. The other element is that each slave has too many tables to pick from.

Sounds like you are considering several new angles now. Excellent. Be sure to use hard DATA derived from your current experience to guide you, like the TARGET_APPLY logging analysis, your CPU time observations, and the current memory settings versus tests with alternatives.

Good luck

Hein