Jagsfan82
Partner - Contributor III

Generic Load (or similar) performance

FYI: An example is attached as an Excel workbook to better illustrate my question.

Hello experts.  Thank you in advance!

I just finished publishing an application to our server, and performance is not ideal (it was unusable before I removed my list boxes with calculated dimensions).  It's not a massive data set: about 60 million rows across 50 fields, roughly 3 GB of data.  It doesn't feel like memory is the issue, but rather processing power.

Almost every expression I use has at least one bit of set analysis in it, and often 2 or 3.  There are only two "measures" I care about in the raw data (Premium and Commission), but the end user rarely cares about all premium and all commission.  This means almost every expression needs set analysis.

From the way I understand how Qlik works, I didn't expect this to be a big issue, or even an issue at all, but is it possible that all the set analysis is slowing things down compared to an alternative?  Would it make sense to add columns so Qlik only calculates Sum(OneField) instead of Sum({$<Type={'One'}>} Field), or Sum(OneTwoThreeField) instead of Sum({$<Type={'One','Two','Three'}>} Field)?

Essentially I would be taking the one column that tracks premium per transaction and duplicating it for each type (adding 5 fields), removing the need to perform set analysis on that column.  Each row would have 0s across the board except for two columns (see the Excel sheet).  So something like a generic load, except there's no reduction in row count unless I go ahead with an actual generic load.  But would that be more strain on the processing?

Real example of a common expression I use:

1:
Sum({$<[Transaction Code]={04,040,05},[Insurance Type Desc]={'Ceded'}>} [Written Premium])

2:
-Sum( {$<[Transaction Code]={04,040,05},[Insurance Type Desc]={'Direct','Assumed','Coinsurance'}>} [Written Premium]) /
Count({$< [Cancel Flag]={1},[Insurance Type Desc]={'Direct','Assumed','Coinsurance'}>}DISTINCT [Policy Key])

New example of the expression:

1:
Sum({$<[Transaction Code]={04,040,05}>} [Ceded Premium])

2:
-Sum( {$<[Transaction Code]={04,040,05}>} [Gross Premium]) /
Count({$< [Cancel Flag]={1}, [Gross Premium]-={0}>}DISTINCT [Policy Key])

If this is helpful: considering I need to track these on a transaction basis and they don't necessarily aggregate nicely, is this a candidate for a generic load, or is it better to transform the old QVD into a new QVD with the additional fields coded manually (e.g. If(Type='One',Premium,0) as OnePremium), similar to how it's done in the Excel sheet?
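If it helps to see it in script terms, the widened-table version I'm describing would look roughly like this (untested, with simplified QVD/table names and only two of the new fields shown):

Transactions:
LOAD *,
     If([Insurance Type Desc] = 'Ceded', [Written Premium], 0)                                  AS [Ceded Premium],
     If(Match([Insurance Type Desc], 'Direct', 'Assumed', 'Coinsurance'), [Written Premium], 0) AS [Gross Premium]
FROM Transactions.qvd (qvd);   // the If() fields make this a non-optimized QVD load

STORE Transactions INTO Transactions_Wide.qvd (qvd);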

5 Replies
jonathandienst
Partner - Champion III

>> ...unusable before I removed my list boxes with calculated dimensions...

That's odd - usually calculated dimensions are more calculation-heavy than a simple list box.  Often much more so.

>> The way that I understand how Qlik works, I didn't expect this to be a big issue.

Your understanding is correct and it should not be a big issue.

Replacing a single column with multiples will increase the memory footprint of your app, complicate aggregations, and could slow the application down.  I would look at set expression fragments like [Insurance Type Desc]={'Direct','Assumed','Coinsurance'} and consider adding flag fields in the load script to simplify the set expressions.  That has a low memory footprint and should increase performance.  But there could be more at play: the memory footprint when loading a 3 GB QV file will be anything between 6 GB and 20 GB.  Are you sure that is not an issue?
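For example, something like this in the load script (a rough sketch only - the QVD name is a placeholder and the flag names are just suggestions):

FactTable:
LOAD *,
     If(Match([Insurance Type Desc], 'Direct', 'Assumed', 'Coinsurance'), 1, 0) AS GrossFlag,
     If([Insurance Type Desc] = 'Ceded', 1, 0)                                  AS CededFlag
FROM YourFact.qvd (qvd);   // adding calculated fields means this is no longer an optimized QVD load

which lets you write e.g. Sum({$<[Transaction Code]={04,040,05}, GrossFlag={1}>} [Written Premium]) instead of listing the three type values in every expression.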

I don't know why you feel that a generic load will help you, unless you are using the term loosely.  A QV generic load is for a specific use case where the data contains a lot of attribute/value pairs and the generic load transforms that into regular tables.
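For reference, the syntax looks like this (table and field names are placeholders):

AttributeTables:
Generic LOAD Key, Attribute, Value
FROM AttributeValuePairs.qvd (qvd);   // creates one table per distinct Attribute value, each keyed on Key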

Logic will get you from a to b. Imagination will take you everywhere. - A Einstein
Jagsfan82
Partner - Contributor III
Author

First off, thanks for the reply!  This is my first major project, and I think I got a bit ambitious when I heard people talking about billions of rows and I'm only sitting at around 60MM.  To you and others, the time spent looking at a newbie's work is appreciated.  I included a bunch more information here, so feel free to pick and choose or ignore any of this.

Sorry for the confusion on the list boxes.  They were list boxes that I used to create a mini horizontal bar chart with labels (calculated dimension, "Fill to Value" display of the expression, text value of the expression).  I'm replacing the calculated dimension with a cyclic dimension - only a minor trade-off in usability for what I assume is a huge performance jump (haven't pushed the new version out yet).

I think we're running much closer to, if not at, the 6 GB of RAM side of that range.  I'm fairly confident it's not an issue because a sheet with a simple pivot table - one dimension (~500 values) by Month, calculating Sum({$<[Type]={'Direct','Assumed','Ceded'}>} Premium) - is calculating pretty well, in the 1-4 second range depending on the selections I'm making.  So that's up to 6,000 cells it's populating fairly quickly.

I'm glad my assumptions were correct about widening my table.  The generic load just came into play when I was looking to see if QlikView had an automatic Rows to Columns function.  It did seem like it wasn't really the proper use case. 

As far as your suggestion about set analysis, I wasn't sure about changing text values to numeric values because I thought everything was being compressed and optimized with pointers anyway.  For example, I have a field with just two values that I could easily convert to a 0/1 flag, but currently I'm just using NewRenew={'Newline'} or NewRenew={'Renewal'}.  My first thought was to at least code the text options as numerics (Direct = 1, Coinsurance = 2, Assumed = 3, Ceded = 4) to change the set expression to TypeCode={1,2,3}.  Would you expect that to yield a small performance gain, or would you really only expect to see a bump if every option is converted to a flag, e.g. DirectFlag, CoinsuranceFlag, AssumedFlag, CededFlag, GrossFlag?  That's an additional 5 flags, but as you mentioned, I wouldn't expect that to be a big memory bump.  I'm also using a good amount of set analysis on a numeric field with a decent number of options.  I could pull out common combinations to flag (see below), but I thought Qlik should already be pretty efficient performing set analysis on a numeric field with ~20 options.

  • NewRenew[Newline,Renewal] --> NewFlag, RenewFlag
  • TransactionCode[04,05,040]--> CancelFlag
  • TransactionCode[01,02,010,020] --> InitialFlag
  • TransactionCode[03,030]--> EndorsementFlag
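In script terms those flags would be something like this (untested, and assuming the transaction codes compare as text):

LOAD *,
     If(NewRenew = 'Newline', 1, 0)                                AS NewFlag,
     If(NewRenew = 'Renewal', 1, 0)                                AS RenewFlag,
     If(Match([Transaction Code], '04', '05', '040'), 1, 0)        AS CancelFlag,
     If(Match([Transaction Code], '01', '02', '010', '020'), 1, 0) AS InitialFlag,
     If(Match([Transaction Code], '03', '030'), 1, 0)              AS EndorsementFlag
FROM Transactions.qvd (qvd);   // QVD name is a placeholder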

 

There are a few other areas I feel could be small performance hits adding up:

  • Dynamic titles.  I'm allowing the user to essentially modify both the dimensions and the expression in the visuals, and those selections dictate what the titles or captions display for most of my objects.  Some of them have slightly more complex If() logic, but below is a common example:
    • =' Summary by  '&If(GetSelectedCount(%dbMainDimension)=1,%dbMainDimension,'Source System')&' | Sorted by '& Pick($(db_iChartSort),'Gross Premium','Policy Count','Net Premium','Ceded Premium')
  • Custom Sorting.  Example of the sort expression for the above tables:
    • Pick(db_iChartSort,$(mmSumGrsPrem),$(mPolicyCount),$(mmSumNetPrem),$(mmSumCedPrem))
  • Dynamic Expressions.  For many of my objects I'm allowing the user to select a measure from a list box, and the object will update accordingly.  Essentially I have the expression variables stored in a table, and once a user makes a selection the object uses the only value available in the 'Expansion' column to calculate the expression (rough sketch after this list).
  • Number of Objects.  I have quite a few objects on each sheet, 40-100 depending on the sheet, maybe even a bit more on one.  There's just a whole lot of Show/Hide functionality.
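The setup for that third bullet looks roughly like this (simplified and untested - the real table has more measures, and the field names here are made up):

MeasurePicker:
LOAD * INLINE [
Measure,Expansion
Policy Count,Count(DISTINCT PolicyKey)
Ceded Premium,Sum({$<CededFlag={1}>} WrittenPremium)
];

// Chart expression, which works once exactly one Measure is selected:
// =$(=Only(Expansion))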

Would you expect the first two to add noticeable lag when there are 2-4 of those dynamic titles/sorts recalculating on every selection?

Would you expect a performance boost if I changed the expressions from updating dynamically on selection to updating on the push of a button instead?  Instead of the object pulling from the 'Expansion' column, it would pull from a variable that is updated when the user presses an "Update" button.
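In other words, something like this (just the idea, not the exact app code):

// The "Update" button runs a Set Variable action that assigns =Only(Expansion)
// to a variable, say vMeasureExpr, and the chart expression becomes:
=$(vMeasureExpr)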

I attached a scrambled version if anyone wants to take a look and see what else they notice.  Note that it still has the dynamic list boxes instead of the cyclic list boxes - I'm already planning on changing ALL the dynamic dimension options to cycle groups.  I defaulted the view to what I think is my most important sheet, but the "Dashboard" and "Visuals" tabs were giving me the most problems.

Again, all time and help is appreciated.

jonathandienst
Partner - Champion III

>>Would you expect that to yield a small performance gain...

Probably not much performance difference, but it's easier to enter and maintain the expressions.

Dynamic titles do not normally take much calculation, as it is just one item, not millions.  Unless, of course, you are aggregating data from a larger data set.

On that topic, 60m rows is a large, but not very large, data set -- depending on how wide the tables are -- so some care needs to be taken over the design.  Move logic that is not dependent on user interaction into the load script and remove unused fields.  It's quite a large data set to cut your teeth on.

Dynamic expressions make cache hits less likely than fixed expressions, so they may be slower, as the results will need to be calculated rather than fetched from the cache -- but that is very app dependent, and may or may not be an issue in your case.

Logic will get you from a to b. Imagination will take you everywhere. - A Einstein
Jagsfan82
Partner - Contributor III
Author

Ya, that all makes sense.  I was kind of assuming all these little calculations that are occurring that aren't referencing my main data set shouldn't be a problem.  But now that it's published and it feels slower than it should be, I'm questioning everything.  I probably should just find out exactly what kind of resources I'm working with from a hardware standpoint before blaming performance on simple things.

The main takeaway seems to be not to use calculated dimensions on large data sets. 

Brett_Bleess
Former Employee

Nicholas, here is a Design Blog post on Calculated Dimensions that might be of some use to you:

https://community.qlik.com/t5/Qlik-Design-Blog/Calculated-Dimensions/ba-p/1472813

Regards,
Brett

To help users find verified answers, please do not forget to use the "Accept as Solution" button on any post(s) that helped you resolve your problem or question.
I now work a compressed schedule, Tuesday, Wednesday and Thursday, so those will be the days I will reply to any follow-up posts.