mstoler
Partner - Specialist

where exists for 160 million rows with unique keys

Hello,

It appears that when I use where exists(%PRIMARYKEY) on a very large table (160+ million rows), a large amount of CPU is used.

Unfortunately, if other reloads are running at the same time, they all fail.

Is this correct?

 

Michael

1 Solution

Accepted Solutions
marcus_sommer


View solution in original post

5 Replies
marcus_sommer

It's intended behaviour that each calculation / load statement within an application takes all the CPU + RAM resources that are available. This may cause failures in other processes running in parallel, from Qlik as well as from other tools or OS services, especially timeout errors if they can't get enough resources/handles within a certain time.

Before struggling with restriction measures, distributing the tasks to other time-frames or machines, and/or adding more resources, you should investigate the cause of the resource shortage in more detail. An optimized QVD load of 160 M records should run rather fast and cause only a short performance peak of maybe 1-3 minutes, which shouldn't impact the overall performance too much.

In your case I would rather assume that the load isn't optimized and that there isn't enough RAM available, which forces Qlik to swap all or part of the data (possibly multiple times) to disk - and if this happens it slows down quite heavily ...
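
For reference, a minimal sketch of the difference (file and field names are made up): a QVD load generally stays optimized only when the fields are taken as-is and the sole where clause is a single-parameter Exists(); any expression, calculated field or two-parameter Exists() drops it to a standard load.

// Hypothetical names - stays optimized: plain field list, single-parameter Exists()
Fact:
LOAD %PRIMARYKEY, Amount, OrderDate
FROM Fact.qvd (qvd)
WHERE Not Exists(%PRIMARYKEY);

// Drops to a standard (much slower) load - any transformation or a second Exists() parameter, e.g.:
// LOAD %PRIMARYKEY, Upper(Region) as Region FROM Fact.qvd (qvd) WHERE Not Exists(%PRIMARYKEY, KeyInQvd);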

- Marcus

mstoler
Partner - Specialist
Author

Hello,

The load is optimized.

However, there are 160 million records with unique keys.

On a Development Server, with nothing else running, it works fine.

On a Production Server it does not complete and causes other reloads to fail.

For now I will just do a complete load every day.

 

Michael

marcus_sommer

This points in the direction that there really is not enough RAM available. 160 M unique keys, especially if they contain rather complex strings, require a lot of RAM.

Without knowing any details of your application, I think it would probably be more than 10 GB, maybe even 50 GB, for this single field alone. With the other fields and the already loaded data, the needed RAM might be much bigger. If there is not enough RAM available for this task, it doesn't make sense to run it this way.

Before thinking about the hardware I suggest rethinking the whole data model - is such a key really mandatory? The key might be divided into several fields, converted into a number if it's a string, or avoided completely if it's just a record-id from the database, or if it comes from a link-table whose fact-tables could be concatenated instead of linked, or ...
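
As a hedged sketch of the number conversion (field names are hypothetical, and it assumes the key consists of digits only): interpreting the key as a number shrinks the symbol table considerably compared to storing 160 M distinct strings, at the price that this transformed load is no longer optimized.

// Hypothetical names; Num#() interprets the text key as a number
Fact:
LOAD Num#(%PRIMARYKEY) as %PRIMARYKEY,
     Amount,
     OrderDate
FROM Fact.qvd (qvd);   // contains a transformation, so this runs as a standard (non-optimized) load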

- Marcus

mstoler
Partner - Specialist
Author

Hello,

This is not for a data model.

I am simply trying to create an incremental QVD file, as described in the link below.

 

https://help.qlik.com/en-US/qlikview/April2020/Subsystems/Client/Content/QV_QlikView/QVD_Incremental...
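
For context, a minimal sketch of that insert-only pattern (table, field and file names are placeholders, and the ModificationTime filter is an assumption about how new records are selected; LastExecTime would come from wherever the previous run's timestamp is stored):

// Placeholder names throughout
Orders:
SQL SELECT PrimaryKey, X, ModificationTime
FROM DB_TABLE
WHERE ModificationTime >= #$(LastExecTime)#;

Concatenate LOAD PrimaryKey, X, ModificationTime
FROM Orders.qvd (qvd)
WHERE Not Exists(PrimaryKey);   // single-parameter Exists keeps this QVD load optimized

STORE Orders INTO Orders.qvd (qvd);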

 

Michael

marcus_sommer

Yes, it's clear that it's an intermediate step - created in a certain layer and used in n applications in a following layer.

The question is what this key is really needed for. What does the key look like, and why is there a unique key for each record?

Maybe there is an option to change or avoid this key. For example, even if the key is, strictly speaking, necessary for a classical incremental load, you may discard it and switch to another logic, like simply using a date as the key - even if that means loading not only the newest records each time but a bit more, accepting some overlapping data which is then evaluated and removed appropriately with the date logic.
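
A hedged sketch of that date-based alternative (field names, date format and the seven-day overlap are assumptions): reload a generous window of recent records from the source and keep only the older history from the existing QVD, so no 160 M-value key is needed at all.

// Hypothetical names; the overlap window and formats would need to fit the real data
Let vCutoffSql = Date(Today() - 7, 'YYYY-MM-DD');   // text date for the SQL filter
Let vCutoffNum = Num(Today() - 7);                  // numeric date for the Qlik-side filter

Orders:
SQL SELECT OrderDate, X
FROM DB_TABLE
WHERE OrderDate >= '$(vCutoffSql)';        // overlapping window reloaded from the source

Concatenate LOAD OrderDate, X
FROM Orders.qvd (qvd)
WHERE Num(OrderDate) < $(vCutoffNum);      // keep only the older history - not optimized, but no huge unique key in RAM

STORE Orders INTO Orders.qvd (qvd);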

- Marcus