HeleneExner
Contributor III

How to estimate the size of transaction files

Dear community,

A major migration is planned in the source database. When performed on a test database, this migration produced 2.5 TB of redo logs.
Can the transaction file size for Qlik Replicate be calculated from the size of the redo logs? Does the size of the redo logs match the potential size of the transaction files?

Many thanks in advance,

Helene

13 Replies
Heinvandenheuvel
Specialist III

Some more thoughts.

>> Now I have the possibility to increase disk space. I just need to know how big.

It's tempting to ask for 4 TB and give some back when done. Test and measure actual usage before the prod system upgrade.

>> - Qlik Server is on Linux (no access for me) Disk Space 500MB

Three problems - 

minor - 500MB is surely a typo: 500 GB.

500 GB is not much; just get more, as you indicate?

More random ramblings to possibly help you:

No access for you - and/or not having an admin to work with - is a serious organizational issue which you need to have addressed. How do you know which space was exhausted? The sorter area for a specific task? Space for task logs? How do you know by how much? And again, how fast or slow did it fill up? No answers! Maybe there was excessive (virtual) memory use and the swap space filled up? That will make a system unresponsive! You need Linux admin access, or a partnership, to determine that.
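If you do get even minimal access, a crude poll of the data directory would answer the "how fast did it fill up" question. A minimal sketch, assuming Python 3 on the server; the path is hypothetical, so point DATA_DIR at your actual Replicate data/sorter directory:

import shutil
import time

DATA_DIR = "/opt/attunity/replicate/data"   # hypothetical path - adjust to your install
INTERVAL_SEC = 60                           # poll once per minute

prev_used = None
while True:
    usage = shutil.disk_usage(DATA_DIR)     # (total, used, free) of that filesystem
    used_gb = usage.used / 2**30
    pct = 100.0 * usage.used / usage.total
    delta = "" if prev_used is None else f", +{(usage.used - prev_used) / 2**30:.2f} GB since last poll"
    print(f"{time.strftime('%H:%M:%S')}  {used_gb:.1f} GB used ({pct:.0f}%){delta}")
    prev_used = usage.used
    time.sleep(INTERVAL_SEC)

Run it alongside the migration test and you know which filesystem fills, by how much, and how fast.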

How much CPU was available on the Replicate server? If it was limited, then maybe the lighter source reader threads managed to keep reading and filling up the pipe without the (CPU) power to empty out the pipe on the target side? Were some (the wrong) Replicate tasks/threads 'niced' away to the background? You need Linux admin access, or a partnership, to determine that.

Back to the basic understanding: Replicate does NOT read and store the REDO log, it processes and forwards. It needs to hold all changes in a transaction until that transaction is committed. You can see that in the CDC monitor UI - "Accumulating (0 transactions) (until source commit)". What did you see while processing? A queue being built? A few very large transactions? Once committed, they are routed to go to the target. You can see that in the CDC monitor - "Applying (0 transactions) (until target commit)". What did you see there?
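As an illustration only - this is a toy model, not Replicate's actual implementation - every change of an open transaction must stay buffered until its COMMIT arrives, so a single huge transaction pins all of its changes at once:

from collections import defaultdict

open_txns = defaultdict(list)            # xid -> changes buffered in memory/sorter

def on_change(xid, change):
    # "Accumulating (until source commit)": nothing leaves until COMMIT
    open_txns[xid].append(change)

def on_commit(xid):
    # only now is the batch released to the target side:
    # "Applying (until target commit)"
    batch = open_txns.pop(xid, [])
    apply_to_target(batch)

def on_rollback(xid):
    open_txns.pop(xid, None)             # discarded, never sent to the target

def apply_to_target(batch):
    print(f"applying {len(batch)} changes")

# one bulk transaction keeps every change resident until the commit:
for i in range(5):
    on_change("xid-1", f"row {i}")
on_commit("xid-1")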

Everything depends on your application, but it is not uncommon to see 1 KB .. 2 KB per change. Migrations typically do bulk transactions - a whole table or set of tables. With, for example, a modest 1 million rows that becomes 1 GB which has to be held - no choice. With 100 million rows (in a single transaction) that's 100 GB, times two - once to Kafka, once to Oracle. Now you start pushing your server limits already. You should be able to relate this to your Oracle UNDO space needed. Was it complaining? Did it need to be grown? Did you get the infamous "ORA-01555 Snapshot Too Old" during tests?
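A back-of-the-envelope sketch of that arithmetic, with assumed inputs you would replace with your own measured row counts and change sizes:

rows_in_transaction = 100_000_000   # e.g. one bulk statement over a big table (assumption)
bytes_per_change    = 1_500         # middle of the 1 KB .. 2 KB range above (assumption)
target_count        = 2             # e.g. once to Kafka, once to Oracle

held_gb  = rows_in_transaction * bytes_per_change / 2**30
total_gb = held_gb * target_count
print(f"held until commit: ~{held_gb:.0f} GB; across {target_count} targets: ~{total_gb:.0f} GB")

With these numbers that is roughly 140 GB held for one open transaction, doubled across the two targets - which is exactly how a "modest" migration step overwhelms a 500 GB disk.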

Now let's say reading the first transactions fits. Then Replicate starts to push to the target. How fast can the target accept? Could it accept at all? If it is accepting slowly, then obviously the pipeline will fill up. So when you tested, did you see 'target latency'? No answers.

Maybe you can de-tune the Oracle source reader, like restricting parallel ASM threads. That way the pipe might not fill up as quickly.

Converting to Log Stream is probably NOT a good idea because it would add staging space. However, it could allow you to throttle the input by stopping and resuming the Log Stream task.

>>  I have very limited time for this process.

Nope, you have all the time you need. You just have to explain to management what can be done and what cannot be done in the current configuration, and guide them to a better configuration if needed. If you cannot do that, then tell them you need help, perhaps engaging Qlik Professional Services. Time is not YOUR problem; it is an application design/deployment problem which you are trying to help with.

>> before the migration all tasks were stopped. After the migration, when the tasks were resumed,

Why stop the tasks? If you allow them to keep running, everything is 'warmed up' and Replicate can start pushing transactions, and cleaning up after them, as soon as they are committed. You may be able to overlap the generation of changes with the apply on the target, more or less keeping up versus being faced with everything at once. And perhaps the migration is done in (big) steps, and just maybe you can work with the migration team to hold off on some steps until prior steps are processed?

>> About Resource control for Qlik: yes, warning by 70% and stop the tasks at 80%.

Well, did that trigger?

You really need to gather more information - from everywhere! Replicate can tell you about several critical measures, the Linux admin can tell you more, the source folks can tell you how many rows and transactions to expect, and the target folks can tell you whether the 'apply' gets all the resources and speed it needs or is a 'top SQL' bogging down the system, slowing down the applies.

>> This is the scenario that will soon take place in a productive environment.

You probably need consulting help. Good luck!

Hein.

kishetty03
Contributor III

Hi @HeleneExner 

I have got exactly the same scenario at my end.
What actions were performed to resolve this? Any suggestions would be very helpful.

Thanks,

Kiran

HeleneExner
Contributor III
Author

Hi Kiran,

Now it's all behind me. The new release and the subsequent migration in the source database had many problems. The migration took 2 days, and all tasks were stopped during this time.
It was very smart of me to order extra disk space beforehand. Disk space had been doubled, to 1 TB.
One task caused a big problem. 221 tables are loaded by this task. During the release, 25 tables (some very large) were to be reloaded while the rest resumed their load. That did not work. The disk space filled up (1 TB) and the Qlik tasks were stopped automatically. The disk space was expanded by another 600 GB, which was fully written within 2 hours. So continuous apply was no longer possible.
We had only one option: run this task as a full load for all tables. After the full load (about 20 hours), this task now runs in continuous apply mode.
In my experience, a mix of reload and resume doesn't work well. For such a scenario I would prefer a full load. If there are still open questions, I'll be happy to answer them.

Best Regards,

Helene

kishetty03
Contributor III

Thanks a lot @HeleneExner for the information