Skip to main content
Announcements
Have questions about Qlik Connect? Join us live on April 10th, at 11 AM ET: SIGN UP NOW
cancel
Showing results for 
Search instead for 
Did you mean: 
jmvilaplanap
Specialist
Specialist

QVD bigger than CSV

Hi all,

Today I was doing a test, and I loaded this csv file (https://www.freemaptools.com/download/full-postcodes/ukpostcodes.zip) and converted to QVD with Qlik Sense.

The surprise was that the QVD file was bigger than the CSV file.

Zip file (with csv):      32,169 KB

CSV file:                    94,192 KB

QVD file:                  121,277 KB

Is around 30% bigger!

This file has four fields: id, latitude, longitude and postal code. I think, the "problem" is in the latitude and longitude fields, that they are super acurate (15 decimal digits), because the id is an incremental integer (is a little space in QV) and the postal code is a nine char string (with a lot of similar blocks)

To increase the performance I have some suggestions:

  • What's the number of decimal digits in the coordinates needed for a postal code?
  • Can I divide the postal code in two four char strings? It has sense?

Regards

5 Replies
rubenmarin

Hello Jose Miguel, it's pretty normal than the qvd uses more space, it's optimized for speed on loads, not to save space, maybe it's because it also has a header and the symbol tables:

https://help.qlik.com/en-US/sense/February2018/Subsystems/Hub/Content/Scripting/work-with-QVD-files....

Performance tips are more for the qvw than the qvd.

rwunderlich
Partner Ambassador/MVP
Partner Ambassador/MVP

I'm not surprised the QVD is larger in this case.  A CSV file of locations would likely have no repeating field values. It's likely every value is unique. If that's the case, the smallest storage format would be pure text like a CSV.

A QVD on the other hand contains lists of values (Symbol tables), pointers and metadata.  When there are repeating field values, the "de-duplication" process will result in a QVD that takes less disk space than the source data.

More information here:

Document Compression | Qlikview Cookbook

-Rob

http://masterssummit.com

http://qlikviewcookbook.com

jmvilaplanap
Specialist
Specialist
Author

Thanks Rob,

But finally this space will be at the QVW file (and in RAM) and this is a small part of data (only the coordinates for a map representation)

The accuracy of coordinates was my initial idea for decrease the memory (maybe I don´t need 15 decimal digits for the representation in a map).

Finally I think I´ll do a join to discard all the unused postal codes.

Thanks again

jmvilaplanap
Specialist
Specialist
Author

Thanks Ruben,

But finally this space will be at the QVW file (and in RAM) and this is a small part of data (only the coordinates for a map representation)

The accuracy of coordinates was my initial idea for decrease the memory (maybe I don´t need 15 decimal digits for the representation in a map).

Finally I think I´ll do a join to discard all the unused postal codes.

Thanks again

marcus_sommer

Rob is right that with mostly distinct values you won't save storage space or RAM but with a few transformations you could reduce the size significantely, for example with:

test:

LOAD floor(id) as id,

     subfield(postcode, ' ', 1) as postcode1,

     subfield(postcode, ' ', 2) as postcode2,

     floor(subfield(latitude, '.', 1)) as latitude1,

     floor(left(subfield(latitude, '.', 2), 14)) as latitude2,

     // this kind of cutting will cause a small inaccuracy but I think it's not essential

     floor(subfield(longitude, '.', 1)) as longitude1,

     floor(left(subfield(longitude, '.', 2), 14)) as longitude2

FROM [ukpostcodes.csv] (txt, codepage is 1252, embedded labels, delimiter is ',');

store test into test.qvd (qvd);

which will reduce it to 58,003 and if you removed the id (any use ? - the association with the datamodel should be the postcode, right? - whereby by loading the data into a datamodel the complete postcode would be needed but it could be then an autonumber() field) it would be just reduced to 45,955.

And if you load them there with an exists() to the real existing postcodes you should need even much lesser RAM as this number.

But if you are not on the limits of your environment it might be not worth to make these additionally efforts ...

- Marcus