FYI: Example attached as an Excel workbook to better illustrate my question.
Hello experts. Thank you in advance!
I just finished publishing an application to our server, and performance is not ideal (it was unusable before I removed my list boxes with calculated dimensions). It's not a massive data set... about 60 million rows across 50 fields, roughly 3 GB of data. It doesn't feel like memory is an issue, but rather processing power.
Almost every expression I use has at least one bit of set analysis in it, and often 2 or 3. There are only two "measures" I care about in the raw data (Premium and Commission), but the end user rarely cares about all premium and all commission. This means almost every expression needs set analysis.
The way I understand how Qlik works, I didn't expect this to be a big issue, or even an issue at all, but is it possible that it's slowing things down compared to an alternative? Would it make sense to add columns so Qlik is only calculating Sum(OneField) instead of Sum({$<Type={'One'}>} Field), or Sum(OneTwoThreeField) instead of Sum({$<Type={'One','Two','Three'}>} Field)?
Essentially I would be taking the one column that tracks premium per transaction and duplicating it for each type (adding 5 fields), removing the need to perform set analysis on that column. Each row would have 0s across the board except for two columns (see the Excel sheet). So it's something like a generic load, but there's no reduction in length unless I actually go ahead with the generic load. Would that put more strain on the processing?
Real example of a common expression I use:
1:
Sum({$<[Transaction Code]={04,040,05},[Insurance Type Desc]={'Ceded'}>} [Written Premium])
2:
-Sum({$<[Transaction Code]={04,040,05},[Insurance Type Desc]={'Direct','Assumed','Coinsurance'}>} [Written Premium]) /
Count({$<[Cancel Flag]={1},[Insurance Type Desc]={'Direct','Assumed','Coinsurance'}>} DISTINCT [Policy Key])
New example of the expression:
1:
Sum({$<[Transaction Code]={04,040,05}>} [Ceded Premium])
2:
-Sum({$<[Transaction Code]={04,040,05}>} [Gross Premium]) /
Count({$<[Cancel Flag]={1}, [Gross Premium]-={0}>} DISTINCT [Policy Key])
If this is helpful: considering I need to track these on a transaction basis and they don't necessarily aggregate nicely, is this a candidate for a generic load, or is it better to transform the old QVD into a new QVD with the additional fields by manually coding them (e.g. If(Type='One',Premium,0) as OnePremium)... similar to how it is done in the Excel sheet?
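To make it concrete, here's roughly what that widening would look like in the load script (just a sketch -- the table, file, and field names are placeholders based on my example sheet):

// Sketch only: widen the fact table so each premium type gets its own column
NewFacts:
LOAD *,
     If([Insurance Type Desc] = 'Ceded', [Written Premium], 0) as [Ceded Premium],
     If(Match([Insurance Type Desc], 'Direct', 'Assumed', 'Coinsurance') > 0, [Written Premium], 0) as [Gross Premium]
FROM OldFacts.qvd (qvd);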
.. unusable before I removed my list boxes with calculated dimensions..
That's odd - usually calculated dimensions are more calculation heavy than a simple list box. Often much more so.
The way that I understand how Qlik works, I didn't expect this to be a big issue.
Your understanding is correct and it should not be a big issue.
Replacing a single column with multiples will increase the memory footprint of your app, complicate aggregations, and could slow the application down. I would look at set expression fragments like [Insurance Type Desc]={'Direct','Assumed','Coinsurance'} and consider adding flag fields to the load script to simplify the set expressions. That has a low memory footprint and should increase performance.

But there could be more at play. The memory footprint when loading a 3 GB QV file will be anywhere between 6 GB and 20 GB. Are you sure that is not an issue?
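For example, something like this in the load script (a rough sketch only; adjust the table, file, and field names to your model):

// Flag the rows once at load time instead of matching text values in every chart expression
Facts:
LOAD *,
     If(Match([Insurance Type Desc], 'Direct', 'Assumed', 'Coinsurance') > 0, 1, 0) as GrossFlag,
     If([Insurance Type Desc] = 'Ceded', 1, 0) as CededFlag
FROM Facts.qvd (qvd);

The chart expression then reduces to something like:

Sum({$<[Transaction Code]={04,040,05}, GrossFlag={1}>} [Written Premium])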
I don't know why you feel that a generic load will help you, unless you are using the term loosely. A QV generic load is for a specific use case where the data contains a lot of attribute/value pairs and the generic load transforms that into regular tables.
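For illustration only (hypothetical table and field names), a generic load takes a tall attribute/value table and produces a separate small table for each attribute:

// Hypothetical attribute/value source -- not what you are describing
AttributeData:
Generic LOAD [Policy Key], Attribute, Value
FROM Attributes.qvd (qvd);

What you are describing is simply widening the fact table, which is a different thing.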
First off, thanks for the reply! This is my first major project, and I think I got a bit ambitious when I heard people talking about billions of rows while I'm only sitting around 60MM. To you and others, the time spent looking at a newbie's work is appreciated. I included a bunch more information here, so feel free to pick and choose or ignore any of it.
Sorry for the confusion on the list boxes. They were list boxes that I used to create a mini horizontal bar chart with labels (calculated dimension, "Fill to Value" display of the expression, text value of the expression). I'm replacing the calculated dimension with a cyclic dimension. Only a minor trade-off in usability for what I assume is a huge performance jump (I haven't pushed the new version out yet).
I think we're running much closer to, if not at, the 6 GB of RAM side of that. I'm fairly confident it's not an issue, because a sheet with a simple pivot table with one dimension (~500 values) by Month, calculating Sum({$<[Type]={'Direct','Assumed','Ceded'}>} Premium), is calculating pretty well -- in the 1-4 second range depending on the selections I'm making. So that's up to 6,000 cells it's populating fairly quickly.
I'm glad my assumptions were correct about widening my table. The generic load just came into play when I was looking to see if QlikView had an automatic Rows to Columns function. It did seem like it wasn't really the proper use case.
As far as your suggestion about set analysis, I wasn't sure about changing text values to numeric values because I thought everything was being compressed and optimized with pointers anyway. For example, I have a field with just two values that I could easily convert to a 0/1 flag, but currently I'm just using NewRenew={'Newline'} or NewRenew={'Renewal'}. My first thought was to at least code the text options as numerics (Direct = 1, Coinsurance = 2, Assumed = 3, Ceded = 4) to change the set expression to TypeCode={1,2,3}. Would you expect that to yield a small performance gain, or would you really only expect to see a bump if every option is converted to a flag, e.g. DirectFlag, CoinsuranceFlag, AssumedFlag, CededFlag, GrossFlag? That's an additional 5 flags, but as you mentioned, I wouldn't expect that to be a big memory bump. I'm also using a good amount of set analysis on a numeric field with a decent number of options. I could pull out common combinations to flag (see below), but I thought Qlik should already be pretty efficient performing set analysis on a numeric field with ~20 options.
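Just to make the TypeCode idea concrete, this is the kind of recode I'm describing (placeholder table/file names), which would let the set expression become TypeCode={1,2,3}:

// Sketch only: 1=Direct, 2=Coinsurance, 3=Assumed, 4=Ceded (0 if no match)
LOAD *,
     Match([Insurance Type Desc], 'Direct', 'Coinsurance', 'Assumed', 'Ceded') as TypeCode
FROM Facts.qvd (qvd);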
There are two other areas I feel could be small performance hits adding up:
Would you expect the first two to be adding noticeable lag when there are 2-4 of those dynamic titles/sorts working on every selection?
Would you expect a performance boost if I changed the expressions from updating dynamically on selection to updating on the push of a button instead? Instead of the object pulling from the 'Expansion' column, it would pull a variable that is updated when the user presses the "Update" button.
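Roughly what I have in mind (a sketch only -- vExpansion and the exact button wiring are hypothetical names of mine):

// Chart expression: evaluate whatever expression text is currently stored in the variable
$(vExpansion)

// "Update" button: a Set Variable action
//   Variable: vExpansion
//   Value:    =Only([Expansion])   (the currently selected value of the Expansion field)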
I attached a scrambled version if anyone wants to look at what else they see. Note that it still has the dynamic list boxes instead of the cyclic list boxes. I'm planning on changing ALL the dynamic dimension options to cycle groups already. I defaulted the view to what I think is my most important sheet, but the "Dashboard" and "Visuals" tabs were giving me the most problems.
Again, all time and help is appreciated.
>>Would you expect that to yield a small performance gain...
Probably not much performance difference, but it's easier to enter and maintain the expressions.
Dynamic titles do not normally take much calculation, as it is just one item, not millions -- unless, of course, you are aggregating data from a larger data set.
On that topic, 60M rows is a large, but not very large, data set -- depending on how wide the tables are -- so some care needs to be taken over the design. Move logic that is not dependent on user interaction into the load script and remove unused fields. It's quite a large data set to cut your teeth on.
Dynamic expressions make cache hits less likely than fixed expressions, so they may be slower, as the results will need to be calculated rather than fetched from the cache -- but that is very app dependent, and may or may not be an issue in your case.
Yeah, that all makes sense. I was kind of assuming all these little calculations that are occurring, and that aren't referencing my main data set, shouldn't be a problem. But now that it's published and it feels slower than it should be, I'm questioning everything. I probably should just find out exactly what kind of resources I'm working with from a hardware standpoint before blaming performance on simple things.
The main takeaway seems to be not to use calculated dimensions on large data sets.
Nicholas, here is a Design Blog post on Calculated Dimensions that might be of some use to you:
https://community.qlik.com/t5/Qlik-Design-Blog/Calculated-Dimensions/ba-p/1472813
Regards,
Brett