Re: LOAD with 2/3 less fields results in a bigger ... - Qlik Community

datanibbler · ‎2018-05-29

Hi,

I have tried to slim down an app by removing all the fields which are not further processed - to that end I removed that "LOAD * " and used the complete fieldlist instead, then commenting out all the fields which I do not want.

<=> After running that, I get a qvd file about 1/3 bigger than the original ...

Why is that?

Thanks a lot!

Best regards,

DataNibbler

marcus_sommer · ‎2018-05-29

Are you sure that the number of records are the same? What is the data-source? From qvd - then optimized or not? Any other changes within the script - order within the script, anything else?

- Marcus

olivierrobin · ‎2018-05-29

hello

are you sure you didn't remove a field used ti join your tables ? (if yes , you will join one row with multiple others, making the resulting table bigger than the original script)

datanibbler · ‎2018-05-30

Hi you two,

thanks for the quick reply! I'm pretty sure I haven't changed anything else in the script but for replacing the * with the complete fieldlist and then commenting out unneeded fields - but I'll have another look, and I will check whether any of those fields are maybe used further down the script ... if I interpret your answers correctly, if everything is as it should be, then the file should not be bigger, but rather smaller since I have removed fields, right? I will check.

Thanks a lot!

Best regards,

DataNibbler

marcus_sommer · ‎2018-05-30

Yes, lesser fields and/or records within a load should lead to a smaller table/file-size than before. Related to the special way Qlik stores the data into data- and symbol-tables it might not really noticable because it isn't nearly linear like it would be by a text-file.

But there are exceptions from this rule for example if qvd's are loaded unoptimized the result might have a bigger file-size as the source. The same is true if any of these fields are already loaded before in the script and a numerical field in the loaded source might now be handled as dual- or as text-value.

- Marcus

datanibbler · ‎2018-05-30

Hmm ... strange.

I have experimented a bit more with this: This transformation uses two large SAP tables and I've removed about 4/5 of all fields from both of them and made sure that all the necessary fields remain, both for that script (further down) and for the following script that uses the qvd coming out of this one.

Also, there is nothing additional running in the code.

Still, the resulting file is quite a bit bigger than it was with all the fields in - it's about 84 MB, used to be about 67. Well, then I guess I'll just leave this be ...

marcus_sommer · ‎2018-05-30

I think a excluding-strategy would be useful. This meant to isolate these loadings into separate qvw's to exclude any side-effects from anywhere. Personally I could imagine that some formattings did change like hinted above - for example by a timestamp-field the needed storage-space could grow rapidly and easily counteract the lesser number of fields.

- Marcus

datanibbler · ‎2018-06-04

Thanks Marcus!

Well, this is on my ToDo-list.

There are no new fields I introduced to get it to run, I just removed fields right at the beginning of the script, making sure that all the fields that are required further down in the script remain. But isolating the LOAD seems a good idea.

I haven't got time to look at it in detail right now, but I'll create a note for myself and I'll have a look at it whenever I can.

Best regards,

DataNibbler

Peter_Cammaert · ‎2018-06-04

All reasons for smaller or larger qvds can be found in the QVD header. Can you post the headers from the two QVD files (the smaller one with more fields, and the larger one with fewer fields)? It's not that difficult to analyse QVD's...

Peter_Cammaert · ‎2018-06-04

And (I read the entire dicussion allright, but I still feel the need to ask 😉 you're not doing a merge-two-fields-into-one trick, are you? SAP usually stores time and date values in different fields. If you merge those into a single QlikView timestamp field, you'll get both a larger symbol table and a larger index for that field.

Also, recent SAP connectors may dump all fields as string values by default. If you convert those into dual values (by using the Date or Date# functions), you'll turn the symbol table entries from pure string values into duals that still contain the string part. Result: larger symbol tables (but the index stays about the same size).

LOAD with 2/3 less fields results in a bigger file!?