Solved: why qvd size is almost the same no matter if you o... - Qlik Community

hectorgarcia · ‎2013-11-11

hello I'm trying to optimize my data model replacing the content of a long string field for a numeric value field, but when I store and compare the output the size of the qvd is almost the same , any body now why???

This is my table.

This my script:

data:

LOAD floor(num(EVENTID)) as EVENTID,

mapingid3,

autonumber(mapingid3) as mapignum

FROM

\\qlvpublisher01\G\SourceDocuments\DataSources\QVDs\BCO\Bolp\HIST\2013_11_09_eventosbco.qvd

(qvd);

store EVENTID,mapingid3 from data into conmapping.qvd(qvd);

store EVENTID,mapingid3 from data into conmapping.txt(txt);

store EVENTID,mapignum from data into conmappingnum.qvd(qvd);

store EVENTID,mapignum from data into conmappingnum.txt(txt);

this is the resulting qvd's looks: if i'm replacing the content of a big string for a integer, i'm expecting a significant reduction of the qvd size, but this is not happening why??? any body can help me to understand??

this is the way the qvd files

stevedark · ‎2013-11-11

Hi Hector,

The most critical thing is reducing the number of unique values in any field. The first question has to be whether the EventId is required for linking or searching - drop it if not. With other fields you may find you can get to fewer unique values by reducing granularity - eg. rounding numerics to fewer dp's. Sometimes you can achieve the same by splitting fields in two, for example you can split a datetime field into two (date and time separate) each with fewer distinct values and then put them back together in the front end. The same is true with other fields, perhaps splitting a post code into two fields split by the space.

Dropping any un-used columns is always a good idea, you can use Rob Wunderlich 's Document Analyser to identify fields that are not used.

If performance over that volume of data is a problem then you may want to look into Direct Discovery, and leave some values in the database until a user has selected a more limited set of rows.

Two things which become really important when you get to those kind of volumes is ensuring optimised QVD loads and keeping to as few tables in the data model as possible - maybe even only having one table with all data in.

Hope this has given you some pointers.

Steve

View solution in original post

stevedark · ‎2013-11-11

Hi Hector,

That does indeed seem a bit strange. I wouldn't expect as big a difference between the QVD sizes as the text files, but I would expect some difference. The thing that will add bulk to a QVD is lots of unique values in a field, and by definition there will be as many unique values in the mapingnum field as there are in the mappingid3 field. It may be that QlikView is allocating the same size buffer for the integer as it is for the text field - I guess that the Id is going up to quite a high value so a large amount of space would be required for the high numbers.

You could use EasyQlik QVewier by Dmitry Gudkov to see if MetaData is swelling the size of your QVD - but with a simple load from QVD I doubt that is the case.

95MB is not actually that big for a QVD file - so unless you are hitting performance problems somewhere it's probably best not too worry about it.

Hope that helps.

Steve

hectorgarcia · ‎2013-11-11

Thansk steve, actually i'm doing a test only for one of the fields of my table , my daily qvd is normally 21 Million records , and what i'm trying to do is optimize this huge amount of data in a simple way

any recommendation??

swuehl · ‎2013-11-11

How many distinct values of mapignum / mapingid3 do you have in that sample?

If you have only few distinct values, that would explain the small difference in qvd size.

stevedark · ‎2013-11-11

Hi Hector,

The most critical thing is reducing the number of unique values in any field. The first question has to be whether the EventId is required for linking or searching - drop it if not. With other fields you may find you can get to fewer unique values by reducing granularity - eg. rounding numerics to fewer dp's. Sometimes you can achieve the same by splitting fields in two, for example you can split a datetime field into two (date and time separate) each with fewer distinct values and then put them back together in the front end. The same is true with other fields, perhaps splitting a post code into two fields split by the space.

Dropping any un-used columns is always a good idea, you can use Rob Wunderlich 's Document Analyser to identify fields that are not used.

If performance over that volume of data is a problem then you may want to look into Direct Discovery, and leave some values in the database until a user has selected a more limited set of rows.

Two things which become really important when you get to those kind of volumes is ensuring optimised QVD loads and keeping to as few tables in the data model as possible - maybe even only having one table with all data in.

Hope this has given you some pointers.

Steve

rbecher · ‎2013-11-11

There is a lot of improvement possibility for QVD files: TIQView Blog - Honey, I shrunk the QVDs - A Blog about QlikView Data Integration (however QVD data format could be more sufficient) but you realy need to deep dive into data types and cardinalities. At least there is some help, almost always..

- Ralf

Astrato.io Head of R&D

stevedark · ‎2013-11-11

You may also want to look at my blog posts on optimising data models and QVD optimised loads:

http://www.quickintelligence.co.uk/perfect-your-qlikview-data-model/

and

http://www.quickintelligence.co.uk/qlikview-optimised-qvd-loads/

Regards,

Steve

hectorgarcia · ‎2013-11-11

thanks stephen your tips are being very helpful , i will use them as a best day to day practice

why qvd size is almost the same no matter if you optimize the content of the fields??