Hi all,
Today I was doing a test: I loaded this CSV file (https://www.freemaptools.com/download/full-postcodes/ukpostcodes.zip) and converted it to a QVD with Qlik Sense.
The surprise was that the QVD file was bigger than the CSV file.
Zip file (with csv): 32,169 KB
CSV file: 94,192 KB
QVD file: 121,277 KB
It's around 30% bigger!
This file has four fields: id, latitude, longitude and postcode. I think the "problem" is in the latitude and longitude fields, which are extremely precise (15 decimal digits); the id is an incremental integer (which takes little space in QlikView) and the postcode is a nine-character string (with a lot of similar blocks).
To improve this I have some ideas, mainly about reducing the precision of those coordinates.
Regards
Hello Jose Miguel, it's pretty normal that the QVD uses more space: it's optimized for load speed, not for saving space. It's probably because it also stores a header and the symbol tables.
Performance tips are more for the QVW than for the QVD.
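If you're curious, every QVD starts with an XML header that you can load like any XML source; a minimal sketch, assuming the file is named test.qvd (the field names below are the usual ones found in a QVD's XML header):

// inspect the metadata that the QVD carries in addition to the data
qvd_meta:
LOAD TableName,
     NoOfRecords,
     RecordByteSize
FROM [test.qvd] (XmlSimple, table is QvdTableHeader);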
I'm not surprised the QVD is larger in this case. A CSV file of locations would likely have no repeating field values; chances are every value is unique. If that's the case, the smallest storage format is plain text like a CSV.
A QVD, on the other hand, contains lists of distinct values (symbol tables), pointers and metadata. When there are repeating field values, that de-duplication results in a QVD that takes less disk space than the source data.
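You can check this yourself with a tiny experiment; a minimal sketch (file and table names are just examples):

unique_vals:
LOAD RecNo() as val
AutoGenerate 1000000;                 // 1M rows, every value distinct
STORE unique_vals INTO [unique.qvd] (qvd);
DROP TABLE unique_vals;

repeated_vals:
LOAD Mod(RecNo(), 100) as val
AutoGenerate 1000000;                 // 1M rows, only 100 distinct values
STORE repeated_vals INTO [repeated.qvd] (qvd);
DROP TABLE repeated_vals;

// repeated.qvd comes out far smaller: its symbol table holds just
// 100 entries, and each row is only a small pointer into it.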
More information here:
Document Compression | Qlikview Cookbook
-Rob
Thanks Rob,
But in the end this space will be in the QVW file (and in RAM), and this is a small part of the data (only the coordinates for a map representation).
Reducing the accuracy of the coordinates was my initial idea to decrease the memory use (maybe I don't need 15 decimal digits to represent a point on a map).
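For what it's worth, a minimal sketch of that rounding idea, assuming ~5 decimals (roughly 1 m) is enough for a map; a calculated value like Round() also drops the original text that Qlik keeps alongside values read from a CSV, which should shrink the symbol tables by itself:

coords:
LOAD id,
     postcode,
     Round(latitude, 0.00001) as latitude,     // ~1 m resolution
     Round(longitude, 0.00001) as longitude
FROM [ukpostcodes.csv] (txt, codepage is 1252, embedded labels, delimiter is ',');
STORE coords INTO [coords.qvd] (qvd);

(Round() can leave floating-point residue in the last digits, so check the output if the exact text matters.)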
In the end I think I'll do a join to discard all the unused postal codes.
Thanks again
Rob is right that with mostly distinct values you won't save storage space or RAM, but with a few transformations you could reduce the size significantly, for example with:
test:
LOAD floor(id) as id,                                       // pure integer, drops the text representation
subfield(postcode, ' ', 1) as postcode1,                    // outward part, e.g. "AB10"
subfield(postcode, ' ', 2) as postcode2,                    // inward part, e.g. "1XG" - each part has far fewer distinct values
floor(subfield(latitude, '.', 1)) as latitude1,             // integer part of the coordinate
floor(left(subfield(latitude, '.', 2), 14)) as latitude2,   // fractional digits as an integer
// this kind of cutting will cause a small inaccuracy but I think it's not essential
// (leading zeros in the fractional part are lost too, so it's not exactly reversible)
floor(subfield(longitude, '.', 1)) as longitude1,
floor(left(subfield(longitude, '.', 2), 14)) as longitude2
FROM [ukpostcodes.csv] (txt, codepage is 1252, embedded labels, delimiter is ',');
store test into test.qvd (qvd);
which will reduce it to 58,003 KB. And if you removed the id (is it of any use? The association in the data model should be the postcode, right? Loading the data into a data model would need the complete postcode, but it could then be an autonumber() field), it would shrink further to 45,955 KB.
And if you load them there with an exists() against the postcodes that really occur in your data, you should need even much less RAM than that number.
But if you are not at the limits of your environment, it might not be worth these additional efforts ...
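A minimal sketch of those last two ideas together, assuming a hypothetical fact table sales.qvd that contains a postcode field:

// the fact table is an assumption - only its postcode field matters here
facts:
LOAD postcode,
     autonumber(postcode) as postcode_key
FROM [sales.qvd] (qvd);

// keep only coordinates for postcodes that really occur in the facts,
// and replace the long postcode string with a compact integer key
coords:
LOAD autonumber(postcode) as postcode_key,
     latitude,
     longitude
FROM [ukpostcodes.csv] (txt, codepage is 1252, embedded labels, delimiter is ',')
WHERE exists(postcode);

Note that autonumber() is only stable within a single reload, so both tables have to be keyed in the same script run.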
- Marcus