QlikView handles repeated values within a column efficiently: e.g. if there are a million rows but only 100 distinct values, the column will not occupy much space. This does not seem to apply across columns, though. Is there any way to have QlikView recognise the same data values in different columns? E.g.:
T1:
Load rand() as A1 autogenerate(100000);
generates a 1 MB QVW. Whereas:
T1:
Load rand() as A1 autogenerate(100000);
Load A1 resident T1;
generates a 1.3 MB QVW (there are twice as many rows but the same number of distinct values, so it's not twice as big).
Now this one:
T1:
Load rand() as A1 autogenerate(100000);
T2:
load A1, A1 as A2 resident T1;
drop table T1;
generates a 2 MB file, even though it's the same 100,000 distinct values stored in 2 columns.
Hmmm, that DOES seem strange on the surface. I can duplicate your results, and I've also verified they apply to the amount of RAM used, not just the memory required for the stored file. Perhaps the problem is that it STILL has to treat the fields as distinct in some cases. If you selected 10 values from A1, although you have the same data set as if you selected them from A2, it is NOT the same thing entirely. List boxes for the two fields will display differently. The current selections box will display differently. Some functions will return different results for the two fields. It was probably most efficient from a processing standpoint to keep each field with a separate list of values, even if it would have been more efficient from a memory-usage standpoint to have combined the lists in some way.
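To make the "separate list of values per field" idea concrete, here is a toy model in Python. It is only a sketch of the guess above, not QlikView's actual storage format: the function name `encode_column` and the list-based layout are made up for illustration. Each field gets its own list of distinct values plus one index per row, so doubling the rows of one field leaves its value list unchanged, while copying the same values into a second field creates a second value list.

```python
import random

def encode_column(values):
    """Toy dictionary encoding (hypothetical helper, not a QlikView API).

    Returns (symbols, indexes): the distinct values in first-seen order,
    and one integer per row pointing into that list.
    """
    symbols = []      # distinct values for this field only
    position = {}     # value -> its index in symbols
    indexes = []      # one entry per row
    for v in values:
        if v not in position:
            position[v] = len(symbols)
            symbols.append(v)
        indexes.append(position[v])
    return symbols, indexes

random.seed(0)
a1 = [random.random() for _ in range(1000)]

# Same field, twice as many rows: more indexes, same number of symbols.
sym_single, idx_single = encode_column(a1)
sym_doubled, idx_doubled = encode_column(a1 + a1)

# Same values under two field names: each field keeps its own symbol list.
sym_a1, idx_a1 = encode_column(a1)
sym_a2, idx_a2 = encode_column(a1)  # stands in for "A1 as A2"

per_field_symbols = len(sym_a1) + len(sym_a2)
shared_symbols = len(set(sym_a1) | set(sym_a2))

print("symbols, single vs doubled rows:", len(sym_single), len(sym_doubled))
print("symbols, per-field vs shared:", per_field_symbols, shared_symbols)
```

In this model the doubled-row case only adds indexes, which would match the modest growth from 1 MB to 1.3 MB, and the two-field case stores every value twice, which would match the 2 MB file.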
It's disappointing that it works that way, especially with numbers. We have 20 numeric columns with 25 million rows in our QVW, so surely it would improve performance if QlikView just stored the numbers once, with each column having pointers to that master list of values.
I wouldn't conclude that it would improve performance. It might improve memory usage, assuming the memory for the pointer was less than the memory for the number itself. But that's not performance. From a performance standpoint, using an extra level of indirection via a pointer to a value instead of a value would likely only decrease performance.
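The "assuming the memory for the pointer was less than the memory for the number itself" condition can be checked with some rough arithmetic. The sketch below is a back-of-the-envelope estimate only: it assumes 8-byte numbers and tightly packed pointers just wide enough to address the distinct values, and none of the sizes or function names come from QlikView itself. The row count is the 25 million from the post above.

```python
ROWS = 25_000_000   # rows per column, from the post above
VALUE_BYTES = 8     # assumption: each number is stored as an 8-byte double

def pointer_bits(distinct_values):
    """Smallest number of bits that can address `distinct_values` entries."""
    return max(1, (distinct_values - 1).bit_length())

def column_bytes_direct(rows):
    """Bytes needed if every row stores the number itself."""
    return rows * VALUE_BYTES

def column_bytes_indexed(rows, distinct_values):
    """Bytes needed for packed per-row pointers plus one list of distinct values."""
    pointer_bytes = (rows * pointer_bits(distinct_values) + 7) // 8
    return pointer_bytes + distinct_values * VALUE_BYTES

for distinct in (100, 1_000_000, ROWS):
    print(
        f"{distinct:>10} distinct: "
        f"direct={column_bytes_direct(ROWS):>11,} bytes, "
        f"indexed={column_bytes_indexed(ROWS, distinct):>11,} bytes"
    )
```

Under these assumptions, pointers save a lot of memory when there are few distinct values, and cost more than storing the numbers directly when nearly every row is distinct. None of this measures lookup speed, which is the separate performance question raised above.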