Yatinder
Contributor

Create QVD with billions of records in DB

Hi guys,

Could you please help me with an idea for loading billions of records into a QVD from a table that holds more than 8 billion records? After I applied a date condition to load only data since 2015, the record count dropped to 4.5 billion.

Does anyone have an idea how to load such a huge dataset into QVDs? What methods are there to compress the data? The table has 37 columns, 6 of which relate to date, month and year.

Further, I would like to know whether this dataset should be partitioned into multiple QVDs at the stage level, say with 1 billion records per QVD, followed by an incremental load condition.

4 Replies
jobsonkjoseph
Creator III

Hi,

While creating or generating QVDs, you should always consider which data/rows/columns you actually want to bring in.

You can limit the size by bringing in rows from a set period, as you've mentioned, which will reduce the row count.
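For example, something like this in the load script (Transactions, TransactionDate and the lib:// path are placeholders for your own names, and the exact date literal depends on your database):

Facts:
LOAD *;
SQL SELECT *
FROM Transactions
WHERE TransactionDate >= '2015-01-01';

STORE Facts INTO [lib://QVDs/Facts.qvd] (qvd);
DROP TABLE Facts;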

Qlik doesn't have any hard limit on QVD size, as long as there are enough resources on the server.

One rule of thumb to go by: the server should have at least 10 times the app size in memory.

Yatinder
Contributor
Author

Thanks. So there is no need to create separate QVDs to store 5 billion records, and they can be stored in one QVD if there is enough memory available on the Qlik server. Is my understanding correct?

jobsonkjoseph
Creator III

Yes, you are correct.

Also, if you still want to keep the QVD size small, you can split the app.

For example, create one app for the current and previous year's data and another app for the remaining years, and connect them using the document chaining concept, if that makes sense for your report.

marcus_sommer

Personally, I would tend to split the data into multiple QVDs on a yearly or maybe even a monthly level. Splitting the data leads to some additional overhead when creating and again when reading the QVDs, but if we are talking about a few dozen QVDs it won't be very significant for the load times (with thousands of files it should be measured).
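A yearly split could be sketched like this (Transactions and TransactionDate are placeholder names, and the SQL syntax for filtering on the year depends on your database):

FOR vYear = 2015 TO Year(Today())

  Facts_$(vYear):
  LOAD *;
  SQL SELECT *
  FROM Transactions
  WHERE YEAR(TransactionDate) = $(vYear);

  STORE [Facts_$(vYear)] INTO [lib://QVDs/Facts_$(vYear).qvd] (qvd);
  DROP TABLE [Facts_$(vYear)];

NEXT vYear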

The benefit of splitting the data is that different datasets could be loaded directly into various applications without the need to filter the data each time. Probably even more important is the possibility to implement an incremental logic.
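An insert-only incremental load against one of the partition QVDs could look roughly like this (all names are placeholders, and how $(vMaxDate) must be formatted in the WHERE clause depends on your database):

// read the last loaded date from the existing QVD
MaxDate:
LOAD Max(TransactionDate) AS MaxDate
FROM [lib://QVDs/Facts_2020.qvd] (qvd);

LET vMaxDate = Peek('MaxDate', 0, 'MaxDate');
DROP TABLE MaxDate;

// fetch only the new records from the database
Facts:
LOAD *;
SQL SELECT *
FROM Transactions
WHERE TransactionDate > '$(vMaxDate)';

// append the records already stored and write the QVD back
Concatenate (Facts)
LOAD * FROM [lib://QVDs/Facts_2020.qvd] (qvd);

STORE Facts INTO [lib://QVDs/Facts_2020.qvd] (qvd);
DROP TABLE Facts;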

Besides this, there is a limitation of 2 billion unique values per field, and if your data contains a timestamp or some kind of record ID from the database, you may hit this limit. Both of these field types shouldn't be included within Qlik anyway. A record ID isn't very useful (only for validating data, never in reports), and timestamps are better split into dates, times (hh:mm:ss) and milliseconds (if available). Further, there is no need for additional period fields like month and year, because they can easily be derived from the date within the target application by using a master calendar. Other fields might be optimized in the same way, or by removing formats and so on.
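Splitting a timestamp could look like this (EventTimestamp and the other names are placeholders; in Qlik's numeric representation Floor() returns the date part and Frac() the time part):

Facts:
LOAD
  Date(Floor(EventTimestamp)) AS EventDate,
  Time(Frac(EventTimestamp), 'hh:mm:ss') AS EventTime,
  Amount,
  Customer
  // month, year and the other period fields are deliberately not loaded -
  // derive them from EventDate with a master calendar in the target app
FROM [lib://QVDs/Facts_2020.qvd] (qvd);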

- Marcus