I have a blank QVW canvas loaded with only one table's data:
Here are the columns:
Here are the memory statistics:
Class | Type | SubType | Id | Bytes | Count | Size | CalcTime | AvgCalcTime |
Variable | Variable | Internal | CD | |||||
Variable | Variable | Internal | QvPath | |||||
Variable | Variable | Internal | QvRoot | |||||
Variable | Variable | Internal | QvWorkPath | |||||
Variable | Variable | Internal | QvWorkRoot | |||||
Variable | Variable | Internal | WinPath | |||||
Variable | Variable | Internal | WinRoot | |||||
Variable | Variable | Internal | ErrorMode | |||||
Variable | Variable | Internal | StripComments | |||||
Variable | Variable | Internal | OpenUrlTimeout | |||||
Variable | Variable | Internal | ScriptErrorCount | |||||
Variable | Variable | Internal | ScriptErrorList | |||||
Variable | Variable | Internal | ScriptError | |||||
Database | Table | Records | Data | 179206446 | 29867741 | 6 | ||
Database | Table | Records | $$SysTable 1 | 3 | 3 | 1 | ||
Database | Table | Records | $$SysTable 2 | 1 | 1 | 1 | ||
Database | Table | Records | $$SysTable 3 | 3 | 3 | 1 | ||
Database | Field | Symbols | $Field | 34 | 3 | 11.33 | ||
Database | Field | Symbols | $Table | 10 | 1 | 10 | ||
Database | Field | Symbols | $Rows | 18 | 1 | 18 | ||
Database | Field | Symbols | $Fields | 11 | 1 | 11 | ||
Database | Field | Symbols | $FieldNo | 33 | 3 | 11 | ||
Database | Field | Symbols | $Info | 6 | 1 | 6 | ||
Database | Field | Symbols | amount | 254824 | 31853 | 8 | ||
Database | Field | Symbols | owner | 426355655 | 20214495 | 21.09 | ||
Database | Field | Symbols | State | 72 | 9 | 8 | ||
State Space | Table State | Internal | Data | 29867741 | ||||
State Space | Table State | Internal | $$SysTable 1 | 3 | ||||
State Space | Table State | Internal | $$SysTable 2 | 1 | ||||
State Space | Table State | Internal | $$SysTable 3 | 3 | ||||
State Space | Field State | State | $Field | 3 | 1 | 3 | ||
State Space | Field State | Selection | $Field | 0 | ||||
State Space | Field State | State | $Table | 1 | 1 | 1 | ||
State Space | Field State | Selection | $Table | 0 | ||||
State Space | Field State | State | $Rows | 1 | 1 | 1 | ||
State Space | Field State | Selection | $Rows | 0 | ||||
State Space | Field State | State | $Fields | 1 | 1 | 1 | ||
State Space | Field State | Selection | $Fields | 0 | ||||
State Space | Field State | State | $FieldNo | 3 | 1 | 3 | ||
State Space | Field State | Selection | $FieldNo | 0 | ||||
State Space | Field State | State | $Info | 1 | 1 | 1 | ||
State Space | Field State | Selection | $Info | 0 | ||||
State Space | Field State | State | amount | 31853 | 1 | 31853 | ||
State Space | Field State | Selection | amount | 0 | ||||
State Space | Field State | State | owner | 20214495 | 1 | 20214495 | ||
State Space | Field State | Selection | owner | 0 | ||||
State Space | Field State | State | State | 9 | 1 | 9 | ||
State Space | Field State | Selection | State | 0 |
My machine has 16GB of RAM.
The file is ~ 200,000 KB
The data is loaded by a simple load script that takes a reasonable amount of time.
So I have the data and a blank Qlikview canvas.... lovely:
I add a Chart:
Dimension : owner
Expression : SUM(Amount)
Dimension Limit : Top 10, or maybe Top 20... Very small subset
When I press "OK" on the chart, it may render, or it may crash ("Memory Allocation Exceeded" or "failed to draw chart").
If it does render, there is a lengthy delay of between 2 and 8 minutes.
Once it renders, re-sorting from descending to ascending causes another lengthy delay, or a crash.
I assume any other Qlikview activities would also be unbearably slow (filtering/making a selection, exporting to Excel, etc)
While it tries to render, Task Manager shows memory usage varying between 11.5 GB and the 16 GB max (with QlikView closed, it hovers around 8 GB),
and CPU usage varying between 50% and 100% (with QlikView closed, it hovers around 40%).
I can accept "well, it's just too much data for your simple/weak personal computer!" if that is the answer. I thought 16GB of RAM was quite powerful, but maybe I'm wrong...
But I'm frustrated because I've previously had a bigger table, like:
40M rows
10 columns
Which results in a bigger QVW file, like:
900,000 KB (almost 1GB)
And many more Sheet Objects on the canvas, including a chart, a search object.
... and it seems the bigger file was both faster and more stable than the smaller file ...
Any advice appreciated...
That's because your owner field has very high cardinality: 20,214,495 distinct values.
A straight table with "Top 10" must calculate SUM(Amount) for all 20,214,495 values, sort them, and then take the top 10.
QlikView/Sense performance is heavily affected by the number of distinct values.
Related reading:
Symbol Tables and Bit-Stuffed Pointers
http://qlikdevgroup.com/wp-content/uploads/Hamburg-Event01-Session01-QIX-Engine-Under-the-hood.pdf
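To make the point above concrete, here is a small Python stand-in (not QlikView's actual engine code; names and scale are hypothetical) showing why a "Top 10" chart cannot short-circuit: every distinct dimension value must be fully aggregated before the top 10 can be known.

```python
import heapq
import random
import string
from collections import defaultdict

# Small-scale stand-in for the high-cardinality "owner" field:
# 1,000 distinct owners across 10,000 rows (the real case is 20M / 30M).
random.seed(0)
owners = ["".join(random.choices(string.ascii_lowercase, k=8)) for _ in range(1000)]
rows = [(random.choice(owners), random.uniform(1, 100)) for _ in range(10_000)]

# Step 1: aggregate over ALL distinct dimension values -- this full pass
# is the unavoidable cost, regardless of how small the "Top N" limit is.
totals = defaultdict(float)
for owner, amount in rows:
    totals[owner] += amount

# Step 2: only after the full aggregation can the top 10 be selected.
top10 = heapq.nlargest(10, totals.items(), key=lambda kv: kv[1])
print(len(totals), "groups aggregated to produce", len(top10), "rows")
```

At 20M distinct values the aggregation hash table alone is enormous, which matches the memory pressure described above.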
Nathaniel,
Since you said the data model is simple, check whether you are loading all the fields from the source or only the required ones. You can also find the unused fields with Document Analyzer and remove them; see Document Analyzer V2 -- a major update
Second, Document Analyzer doesn't just list unused fields: it also shows which objects consume the most memory, which expressions take the most calc time, and much more. You can find the details and instructions in the application itself.
If your data model keys are concatenated fields, try using the AutoNumber function when creating the key; it will reduce the memory footprint.
If a field has many distinct values and you find it isn't necessary, work out a way to remove or modify it.
For example, if you have a transaction date with a timestamp and you don't actually need the timestamp, trimming it will greatly reduce memory use and calc time.
Try aggregating the data in the script itself, and move complex expressions from the chart into the script.
Avoid calculated dimensions; instead, create the calculated column in the script and bring it into the chart as a single field.
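The AutoNumber suggestion above can be sketched in Python (this is an illustration of the idea, not QlikView's implementation; the key values are hypothetical): long concatenated key strings are replaced by small sequential integers, so the fact table stores compact values instead of wide strings.

```python
# Minimal sketch of AutoNumber-style key compression: assign each distinct
# composite key the next integer the first time it is seen, and reuse that
# integer on every later occurrence.
def make_autonumber():
    seen = {}
    def autonumber(key):
        if key not in seen:
            seen[key] = len(seen) + 1  # sequential ids starting at 1
        return seen[key]
    return autonumber

autonumber = make_autonumber()
keys = ["2024-01-01|CUST42", "2024-01-02|CUST7", "2024-01-01|CUST42"]
print([autonumber(k) for k in keys])  # repeated keys map to the same integer
```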
Siva
Thanks, Clever! I'm glad you could point me at the main problem, rather than suggest general optimizations.
I also suspected the high cardinality of the owner field is the main (only?) problem.
I wanted to prove this by creating a fake dataset that has a string column with 29M total values and 20M distinct values --
I should expect similarly slow performance from even the fake dataset, right?
In fact, to generalize: anyone with comparable hardware should experience comparably slow performance with 20M cardinality on any string field they have, regardless of dataset, right?
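A fake dataset along these lines could be generated with something like the following sketch (scaled down here; the row and pool counts are placeholders you would raise toward the real 29M/20M figures before loading it into QlikView):

```python
import random
import string

# Hypothetical generator for the proposed experiment: a single string column
# with many total values drawn from a large pool of distinct ~20-byte strings.
N_ROWS = 100_000      # scale toward 29_867_741 for the real test
N_DISTINCT = 70_000   # scale toward ~20M for the real test

random.seed(1)
pool = ["".join(random.choices(string.ascii_letters + string.digits, k=20))
        for _ in range(N_DISTINCT)]
column = [random.choice(pool) for _ in range(N_ROWS)]

print(len(column), "values,", len(set(column)), "distinct")
```

Writing `column` out as a CSV and loading it would give a dataset with the same high-cardinality character as the real one, without any of the real data.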
----
(On a side note, let's assume each string value is, on average, about 21 bytes long. I think that's how I can interpret the "Size" column in my memory statistics, right? "Bytes" divided by "Count"?)
Database | Field | Symbols | owner | 426355655 | 20214495 | 21.09 |
This assumption is important because an average length of 30 bytes or 100 bytes would probably behave differently -- it's more expensive to compute uniqueness over longer strings. Well, I guess that depends on the string algorithm...
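The "Size = Bytes / Count" interpretation checks out against the owner row quoted above:

```python
# Verifying the "Size" column as Bytes / Count for the owner symbol-table
# row in the memory statistics (Bytes = 426355655, Count = 20214495).
bytes_total = 426_355_655
count = 20_214_495
print(round(bytes_total / count, 2))  # 21.09, matching the reported Size
```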
"I should expect similar slow performance from even the fake dataset right?"
Yes, I'm almost sure of it. I made a similar fake database myself, and my i7 laptop with 8 GB of RAM hung while calculating that Top 10 straight table.
"In fact, to generalize: anyone with comparable hardware should experience comparable slow performance with 20M cardinality on any string field they have, regardless of dataset, right?"
Yes, similar hardware should perform similarly.
"This assumption is important because an average length of 30 bytes or 100 bytes probably behaves differently "
I agree with you.
Every string (whether 30 or 100 bytes) is stored only once in the symbol table, which generates a pointer. The behaviour would differ, but not by much.
Alternatively Create a table with aggregated values/ Call it a summarized table for all your possible evaluations, that will take off the load from your dashboard
In this case, pre-aggregating would not speed things up much. We would only reduce ~30M records to ~20M, and QlikView would still have to sort them to build the Top 10.
We would also face another concern: a LOAD with a GROUP BY is really CPU-expensive (it usually runs on a single core) and takes a very long time to calculate.
Yes; well, the reduction in the number of rows would depend on the dimension values, the period range, and the level of granularity desired...
I prefer putting the load on the data reload rather than making business users wait for chart objects to render.
We were discussing the particular case where there are 20 million distinct values to be aggregated.
Oh that's right, the symbol table can reduce/eliminate the effect of string lengths. Cool.
Specifically, if I have 1,000 unique values, it doesn't matter whether the average string is length 10 or length 1,000: 1,000 unique values means the symbol pointers (during "bit-stuffing") use the minimum number of bits needed to address 1,000 unique values.
So the symbol pointers are the same regardless of string length, which means my data table (which contains only pointers) will be the same size.
However, as Henric said, the symbol table itself will naturally be bigger if strings are bigger:
"The length of the symbols will affect the size of the symbol table."
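The pointer-width point above can be checked with a one-line calculation (a sketch of the bit-stuffing idea from the linked QIX-engine material, not the engine's actual layout):

```python
import math

# The data table stores one pointer per row; its width depends only on the
# number of distinct symbols, never on how long each string value is.
def pointer_bits(distinct_values: int) -> int:
    # Minimum bits needed to address every distinct symbol.
    return max(1, math.ceil(math.log2(distinct_values)))

print(pointer_bits(1000))        # 10 bits, whether strings are 10 or 1000 chars
print(pointer_bits(20_214_495))  # 25 bits for the real "owner" cardinality
```

The symbol table itself still grows with string length, as Henric notes, but the per-row pointers do not.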