Hi All,
Is it possible to chunk the QVD size within the STORE command in Qlik Sense?
For example: if a particular table would create a QVD larger than 1 GB, is it possible to dynamically write to a new QVD once that size is exceeded, as part of the STORE command?
AFAIK - no. You could build your own sub-routine for such a task, but honestly I think you would create more problems with that approach than you would solve.
IMO it would be better to slice the data by its content rather than by file size or number of records. Quite common is slicing into YYYYMM chunks and/or by countries/companies/products or similar, combined with appropriate incremental approaches in a multi-tier data architecture.
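A minimal sketch of the YYYYMM slicing idea, assuming a resident table SourceData with a date field OrderDate and a lib://DataFiles folder connection (all placeholder names, not from the original post):

```
// Build the list of months present in the data
Slices:
LOAD DISTINCT Text(Date(OrderDate, 'YYYYMM')) AS MonthKey
RESIDENT SourceData;

LET vSliceCount = NoOfRows('Slices');

FOR i = 0 TO $(vSliceCount) - 1
    LET vMonth = Peek('MonthKey', $(i), 'Slices');

    // Extract one month and store it as its own QVD
    MonthSlice:
    NOCONCATENATE
    LOAD * RESIDENT SourceData
    WHERE Text(Date(OrderDate, 'YYYYMM')) = '$(vMonth)';

    STORE MonthSlice INTO [lib://DataFiles/Orders_$(vMonth).qvd] (qvd);
    DROP TABLE MonthSlice;
NEXT i

DROP TABLE Slices;
```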
Cool, thanks for the reply. We have a restriction to keep the size below 1 GB, but I'm not sure how that can be done based on YYYYMM, as I won't know in advance how big the chunks made by YYYYMM, country, or company will be.
You could add a row number field to your data and then chunk based on that - file sizes should be relatively consistent for a given number of rows. This will increase the file size a bit (one more field), but might be worth doing if you don't have another field you can chunk on.
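A rough sketch of that row-number approach, assuming the table SourceData was loaded with an extra field RowId built from RowNo(); the 5-million-row chunk size is an arbitrary placeholder that would have to be calibrated against the 1 GB limit:

```
LET vChunkSize = 5000000;                                  // placeholder, calibrate to the size limit
LET vChunks    = Ceil(NoOfRows('SourceData') / $(vChunkSize));

FOR c = 0 TO $(vChunks) - 1
    LET vFrom = $(c) * $(vChunkSize) + 1;
    LET vTo   = ($(c) + 1) * $(vChunkSize);

    // Slice a fixed range of row numbers into its own QVD
    Chunk:
    NOCONCATENATE
    LOAD * RESIDENT SourceData
    WHERE RowId >= $(vFrom) AND RowId <= $(vTo);

    STORE Chunk INTO [lib://DataFiles/SourceData_Part$(c).qvd] (qvd);
    DROP TABLE Chunk;
NEXT c
```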
Thanks for the response. Is there a function to get the memory size of a row?
Not that I'm aware of, but it should be easy enough to work out by dumping a set number of rows into a file and checking its size.
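Something along these lines could serve as that calibration step, again assuming a resident table SourceData and a lib://DataFiles connection; FileSize() should return the byte size of the stored sample, but treat this as an untested sketch:

```
// Store a fixed sample of rows and measure the resulting QVD
Sample:
NOCONCATENATE
FIRST 1000000
LOAD * RESIDENT SourceData;

STORE Sample INTO [lib://DataFiles/Sample.qvd] (qvd);
DROP TABLE Sample;

LET vSampleBytes = FileSize('lib://DataFiles/Sample.qvd');
LET vRowsPerGB   = Floor(1000000 * 1024 * 1024 * 1024 / $(vSampleBytes));
TRACE Approx. rows per 1 GB QVD: $(vRowsPerGB);
```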
OK, but the problem here is that it's not just a single QVD - we have around 20-30 QVDs with different types and numbers of columns.
Still shouldn't take too long to check them all with, say, a million rows.
I'm kind of confused, in concept, by a scenario where you might have 20-30 QVDs that exceed 1GB in an environment that doesn't allow files larger than 1GB. This seems like an excessive amount of data relative to the restriction.
How much storage is consumed depends mainly on the kind of data. With a lot of distinct field values the storage size behaves much like a SQL table, increasing/decreasing roughly linearly. But with mainly redundant field values it's different: ten times the records might only need twice the space.
Besides the challenge of sizing the QVDs close to 1 GB, it could become quite tricky to handle them afterwards. The generated QVD filenames would probably just get a continuous number - but how many will exist, and which one contains which data?
As hinted above, I don't think this is really expedient; slicing by content would be more suitable. That doesn't necessarily have to be a horizontal chunking by records - it could also be a vertical one, sliced by fields.
Only with content slicing will you be able to use incremental logic, divide the tasks to run in parallel and/or in different time frames, and feed the slices into the appropriate downstream data models and report layers - without having to access all the data and pick out the needed parts there.
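For illustration, a very reduced incremental pattern under assumed names (Orders_Source.qvd as the delivery, Orders_202401.qvd as one monthly slice, ModifiedAt as the change timestamp):

```
LET vLastLoad = '2024-01-01 00:00:00';   // placeholder high-water mark, normally read from the previous run

// New/changed records since the last load
Orders:
LOAD OrderID, OrderDate, Amount, ModifiedAt
FROM [lib://DataFiles/Orders_Source.qvd] (qvd)
WHERE ModifiedAt > Timestamp#('$(vLastLoad)', 'YYYY-MM-DD hh:mm:ss');

// Keep the previously stored records that were not re-delivered
Concatenate (Orders)
LOAD OrderID, OrderDate, Amount, ModifiedAt
FROM [lib://DataFiles/Orders_202401.qvd] (qvd)
WHERE NOT EXISTS (OrderID);

STORE Orders INTO [lib://DataFiles/Orders_202401.qvd] (qvd);
```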
Further, I suggest reviewing the data within the QVDs: keep only the needed fields, drop any row-specific formatting and any record IDs, and optimize the field cardinality, for example by splitting timestamps into dates and times.
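A small sketch of that cardinality optimization, with placeholder field and file names: the high-cardinality timestamp is split into a date and a time field, and record IDs or row-level formatting are simply not carried over:

```
Optimized:
LOAD
    OrderID,
    Customer,
    Amount,
    Date(Floor(OrderTimestamp)) AS OrderDate,   // date part only
    Time(Frac(OrderTimestamp))  AS OrderTime    // time part only
FROM [lib://DataFiles/Orders_Source.qvd] (qvd);

STORE Optimized INTO [lib://DataFiles/Orders_Optimized.qvd] (qvd);
DROP TABLE Optimized;
```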
Nope, it's a restriction that was recently applied by the third-party team handling our infrastructure, so we have to implement the changes.