topic Re: QVD bigger that CSV in QlikView

QVD bigger than CSV

jmvilaplanap — Wed, 25 Nov 2020 16:16:04 GMT

Hi all,

Today I was doing a test, and I loaded this csv file (https://www.freemaptools.com/download/full-postcodes/ukpostcodes.zip) and converted to QVD with Qlik Sense.

The surprise was that the QVD file was bigger than the CSV file.

Zip file (with csv): 32,169 KB

CSV file: 94,192 KB

QVD file: 121,277 KB

Is around 30% bigger!

This file has four fields: id, latitude, longitude and postal code. I think, the "problem" is in the latitude and longitude fields, that they are super acurate (15 decimal digits), because the id is an incremental integer (is a little space in QV) and the postal code is a nine char string (with a lot of similar blocks)

To increase the performance I have some suggestions:

What's the number of decimal digits in the coordinates needed for a postal code?
Can I divide the postal code in two four char strings? It has sense?

Regards

Re: QVD bigger that CSV

rubenmarin — Fri, 06 Apr 2018 11:22:21 GMT

Hello Jose Miguel, it's pretty normal than the qvd uses more space, it's optimized for speed on loads, not to save space, maybe it's because it also has a header and the symbol tables:

https://help.qlik.com/en-US/sense/February2018/Subsystems/Hub/Content/Scripting/work-with-QVD-files.htm

Performance tips are more for the qvw than the qvd.

Re: QVD bigger that CSV

rwunderlich — Fri, 06 Apr 2018 11:24:39 GMT

I'm not surprised the QVD is larger in this case. A CSV file of locations would likely have no repeating field values. It's likely every value is unique. If that's the case, the smallest storage format would be pure text like a CSV.

A QVD on the other hand contains lists of values (Symbol tables), pointers and metadata. When there are repeating field values, the "de-duplication" process will result in a QVD that takes less disk space than the source data.

More information here:

Document Compression | Qlikview Cookbook

-Rob

http://masterssummit.com

http://qlikviewcookbook.com

Re: QVD bigger that CSV

jmvilaplanap — Fri, 06 Apr 2018 11:34:25 GMT

Thanks Rob,

But finally this space will be at the QVW file (and in RAM) and this is a small part of data (only the coordinates for a map representation)

The accuracy of coordinates was my initial idea for decrease the memory (maybe I don´t need 15 decimal digits for the representation in a map).

Finally I think I´ll do a join to discard all the unused postal codes.

Thanks again

Re: QVD bigger that CSV

jmvilaplanap — Fri, 06 Apr 2018 11:34:45 GMT

Thanks Ruben,

But finally this space will be at the QVW file (and in RAM) and this is a small part of data (only the coordinates for a map representation)

The accuracy of coordinates was my initial idea for decrease the memory (maybe I don´t need 15 decimal digits for the representation in a map).

Finally I think I´ll do a join to discard all the unused postal codes.

Thanks again

Re: QVD bigger that CSV

marcus_sommer — Fri, 06 Apr 2018 12:07:37 GMT

Rob is right that with mostly distinct values you won't save storage space or RAM but with a few transformations you could reduce the size significantely, for example with:

test:

LOAD floor(id) as id,

subfield(postcode, ' ', 1) as postcode1,

subfield(postcode, ' ', 2) as postcode2,

floor(subfield(latitude, '.', 1)) as latitude1,

floor(left(subfield(latitude, '.', 2), 14)) as latitude2,

// this kind of cutting will cause a small inaccuracy but I think it's not essential

floor(subfield(longitude, '.', 1)) as longitude1,

floor(left(subfield(longitude, '.', 2), 14)) as longitude2

FROM [ukpostcodes.csv] (txt, codepage is 1252, embedded labels, delimiter is ',');

store test into test.qvd (qvd);

which will reduce it to 58,003 and if you removed the id (any use ? - the association with the datamodel should be the postcode, right? - whereby by loading the data into a datamodel the complete postcode would be needed but it could be then an autonumber() field) it would be just reduced to 45,955.

And if you load them there with an exists() to the real existing postcodes you should need even much lesser RAM as this number.

But if you are not on the limits of your environment it might be not worth to make these additionally efforts ...

- Marcus