morenoju
Partner - Specialist

Storing millions of records in QVD files

Hi,

I have a table in my DB with approximately 220 million records, and it's growing every day.

In my data load script, I decided to store the records in one QVD per month, but reading the data back from the QVDs is taking more time than I expected (around 15 min) and is consuming a lot of resources on my server.

Would it be better if I had one QVD per year? Is it faster for Qlik to read a single QVD than several of them?
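
For reference, this is roughly the pattern for reading the monthly files back (the path and field names here are simplified placeholders, not my exact script):

// Read every monthly QVD in one pass via a wildcard.
// Loading the fields as-is, with no transformations or WHERE clause,
// keeps each QVD load "optimized".
LinkData:
LOAD
    link_timestamp,
    [link.id],
    speed_mph
FROM [lib://Data/link_data_*.qvd] (qvd);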

Thanks,

Juan

15 Replies
morenoju
Partner - Specialist
Author

Well, I'm supposed to make all records available, which is why I read the QVDs for all months. The problem I'm seeing is that when I load the 220 million records into the table, the app stops working (I can't open it; I just see the loading animation indefinitely).

martinpohl
Partner - Master

Try debugging the script with just a few records to check that the data model is OK (no synthetic keys or circular references).
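
For example, a First prefix (or the limited-load option in the debugger) caps the rows per load while you check for synthetic keys (the file name here is only a placeholder):

// Load just the first 1000 rows of one monthly QVD while testing the model
First 1000
LOAD *
FROM [lib://Data/link_data_2019_01.qvd] (qvd);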

petter
Partner - Champion III

When you have more than a million records or so, it is always important to look at how you can help Qlik get the most out of its built-in compression mechanisms.

  1. The #1 method for doing that is to look at the cardinality of each field. If you can turn a high-cardinality field into a low-cardinality field you can save a lot of space (memory), and it improves calculation time accordingly.
  2. Don't store redundant data - fields that contain just variants of each other. You have several of them:

"link_timestamp",

"link_date",

"link_time",

"link_time_15min",

"link_minute",

"link_hour",

[link.id],

speed,

"speed_mph",

TravelTimeIndex,

traveltime

    • link_timestamp might be stored with high precision, so it is very often a high-cardinality field. You can often round it off to a much lower resolution and save a lot of memory - sometimes as much as a 6-20x reduction in storage.
    • Maybe you don't need link_timestamp at all - why keep it - don't the other fields contain the same information if you combine them?
    • link_time and link_time_15min are redundant. If you need to keep link_time, at least round it off to the lowest acceptable resolution, like seconds or minutes. You might save a lot there too.
    • Be sure that link_date contains only the date part, so truncate it to an integer with Floor(link_date).
    • speed and speed_mph are redundant. Why not just use a conversion factor when you need mph? It's a fixed constant (1 mph = 1.60934 km/h) and is easy to add to any expression.
    • Be sure to round off the speed to the lowest resolution possible to save memory. Do you really need the decimals? If not, round it off and make it an integer. (See the sketch at the end of this reply.)

Splitting a high-cardinality field like a timestamp into its constituent parts - date, hour, minute, second, tenths of a second - will help tremendously, but don't keep the original timestamp. This enables much higher compression under the hood in Qlik, and you might save a lot of memory.
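
Something along these lines - just a sketch, with a placeholder source path and only the fields from the list above - keeps the low-cardinality parts and drops the redundant ones. Ideally do this before the STORE, so the monthly QVDs themselves get smaller:

LinkData:
LOAD
    Date(Floor(link_timestamp))        AS link_date,        // integer date part only
    Hour(link_timestamp)               AS link_hour,
    Round(Frac(link_timestamp), 1/96)  AS link_time_15min,  // time of day in 15-minute steps
    [link.id],
    Round(speed_mph)                   AS speed_mph,        // integer mph; multiply by 1.60934 in an expression when km/h is needed
    TravelTimeIndex,
    traveltime
FROM [lib://Data/link_data.qvd] (qvd);
// The raw link_timestamp, link_time, link_minute and speed fields are never kept in the data model.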

morenoju
Partner - Specialist
Author

Thanks Petter. I've been editing the script following your advice and I've seen some improvement. I'm keeping the timestamp, and I think its cardinality is not that high since all values in my DB table are 15-minute samples (9:00, 9:15, 9:30, 9:45, 10:00, 10:15...), which works out to 96 distinct times per day. But I've got rid of all the other "link" fields and of speed, and I've rounded speed_mph as you suggested.

Still, it takes a long time to load, but I'm now looking at the resources of the machine where I have Qlik Sense installed. I'm using two cores and the CPU usage is at 100% during the load. I'm going to increase it to 4 and see what happens.

Same with the memory, though in this case not during the load but while editing or using the app. The 8 GB I have is in use most of the time, so I'm going to increase the memory too.

Thanks for the help, and I'll let you all know how it goes.

Regards,

Juan

morenoju
Partner - Specialist
Author

This is working very well now. Thanks petter-s for the recommendations to improve the load script, and thanks to all participants for the good ideas.

pbriggs11
Contributor II

Thanks for this solution. I had never heard anyone speak of evaluating the cardinality of fields (or maybe I was just asleep in class that day). I implemented this approach on a QVD with over 100 million rows that was taking just over an hour to load, even though it was reported as an optimized QVD load. It now loads in under 5 minutes.
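
For anyone else who wants to check this, the FieldValueCount() function gives a quick read in the script on how many distinct values a field holds once it has been loaded - for example (using the timestamp field from this thread):

// Log the cardinality of a field after the load
LET vCard = FieldValueCount('link_timestamp');
TRACE link_timestamp has $(vCard) distinct values;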