Skip to main content
Announcements
Have questions about Qlik Connect? Join us live on April 10th, at 11 AM ET: SIGN UP NOW
cancel
Showing results for 
Search instead for 
Did you mean: 
Anonymous
Not applicable

pick rating on the fly using calculated score

hi guys, looking for some directions here.

I have large QVD which represents financial transactions - 90 million rows and growing. based on some KPIs and complex calculations, I determine overall score for a specific entity. The score is a number from 0 to 100 and it CANNOT be calculated in a load script - depending on user selections, the scores would be different.

with that said, here is my challenge. I am a given a little table that determines a rate for an entity, based on its score - basically that table has low score and high score and a rate from 1 to 5. I need to calculate the score (again on the fly based on user selections) and using that table identify its rate.

Here is a few options I considered:

1) create a data island table for scores/rates and use if thens to pick a rate from there. While it works great on by 10,000 records sample, it is taking forever in my real app.

2) use aggr, set analysis and advanced search expression in it - did not work because aggr is calculated not per row but per chart.

Any other ideas?

20 Replies
Not applicable
Author

Borys,

i prepared small example in assumption, that you have to get your rating regarding to some dimension value ....

For example purposes i assumed, that i have some data and my score function is simple avg.

I added field Dim_1 to simulate some user selections.

In my opinion this aproach should not impact a lot your applicationperformance

Let me know if it helped you

If you like this direction, i have some additional idea, how to optimize it

Regards

Darek

Anonymous
Not applicable
Author

dziękuję, Darek! you loved my little challenge, did not you?

I think this is a great solution - if i got it right, you build 7 different expressions in the load script, then use pick function to pick one of the 7 for the corresponding dimension.

I tried and it is very fast - much better than mine firstsortedvalue / aggr solution. I think it is definitely a great way to address a problem - but unfortunately not going to work for me.

My requirements have just changed and there will be 112 different dimension values to rate - not 7 as i thought initially. And every one of these 112 will be using different from/to scores and  ratings 1..100, so we are looking at 112 expressions in that pick function. Every one of those 112 expressions will have approximately 7000 characters in length (all if/then-s for every rate 1 to 100), so the total length of my pick expression would be around 784,000 characters. I've read somewhere that QV will start crashing on you once you approach expression length in 20,000-50,000.

My solution with firstsortedvalue / aggr actually worked but it was super slow - like 60 seconds to recalculate my chart.

But after much thinking I looked at it from another perspective - I changed my source data for that rate table.

Basically the idea is to load scores and rates in a horizontal table like that (like a cross table):

targetdim, score_from_rate1, score_from_rate2, ... score_from_rate100

this table will have 112 rows.

Then in a chart, use expression like this one:

IF ( $(mymetric) < ONLY ( score_from_rate1), 'rate 1'

,IF ( $(mymetric) < ONLY ( score_from_rate2), 'rate 2'

...

,IF ( $(mymetric) < ONLY ( score_from_rate100), 'rate 99'

,IF ( $(mymetric) < ONLY ( score_from_over_rate100), 'rate 100'

)))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))) //close all 101 if-s

this works pretty fast - takes 4-5 seconds to recalculate my chart.

I am still not 100% happy as it was such a hassle and still looks ugly but it works.

I am going to accept your answer as correct one though - thanks for trying solve this!

Not applicable
Author

Borys,

proszę

i think you may find something interesting in v1

So, aggregate formula for each dimension value as a row in formula table

Then using dimension number, with pick you may ask $(=fieldvalue('field',nbr))

Remember, that it will work nice while field has distinct values. So, it will be better to check it during reload. If 2 dimension values have the same score ranges and rating, you will have to have the same formula and dim number).

Your solution with 100 columns and 112 rows (associated with dimension) looks nice. But i think, that there will be better if you preaggregate data stronger. This may be also more complex script if you would like to prepare it generic (assuming, that number of your ratings /columns in your solution/ will grow up). There is of course generic load, but ....

We may also go into table with 112 columns: formula_1.formula_2, .... formula_112 and 1 row. Then pick(nbr, $(=only(formula_1)), ...,$(=only(formula_112))). It is also "generic" and may be easy prepared in the script.

poka

Darek

Not applicable
Author

Borys,

1 more idea with your 100+1 columns and 112 rows table.

Columns are:

Dim, r1,r2,r3, ... , r100

In rn column you have n'th range from value.

Check how fast is expression:

-RangeSum(score>=r1,score>=r2,score>=r3,.......,score>=r100)

Anonymous
Not applicable
Author

got it, very cool

Anonymous
Not applicable
Author

sorry, Darek, not sure I understand this. RangeSum will give a sum of all boolean expressions and then what can i do with that?

Not applicable
Author

Look,

in QV true=-1, false=0

simple example:

DIM,range,score_from,score_to

A,1,    0,     0.25

A,2,     0.26,     2.55

A,3,     2.56.     7.80

A,4,     7.81

our table:

DIM, r1, r2, r3, r4

A,     0, 0.26 , 2.56, 7.81

lets test:

score = 3

-rangesum(3>0, 3>0.26, 3>2.56, 3>7.81) = - rangesum(-1,-1,-1,0)=3

another test:

score=2.3

range=-rangesum(2.3>0,2.3>0.26,2.3>2.56, 2.3>7.81)=-rangesum(-1,-1,0,0)=2

Anonymous
Not applicable
Author

this is a very interesting idea - I will think about it more tonight. Thanks for your help again!

Not applicable
Author

Your range should be equal to number of score_from, which are less than calculated score.

Anyway, i'am not sure if this will be good for performence. It is always 11200 comparisions.

I think, that it may be better to prepare something like tree.

Let's asume range table (for one Dim value):

range, score_from, score_to

1,      0     ,     5.05

2,     5.06     ,     10.01

3,     10.02     ,     25.04

4,     25.05     ,     30.20

5,     30.21     ,     50,54

.....

10,     101,33     ,     110,43

11,     110,44     ,     137,01

.....

20,     189.01     ,     205,45

......

100,     567,99     ,

We may prepare 2 levels if (like tree)

if(score<=110.43,     if(score>=0,1,if(score>=5.06,2,if(score>=10.02,3,if(score>=25.05,4,.......))))))))),if(score<=189.01,if(score>=110,44,11,........)))))))))),..........)))))))))))))))))))))))))))

So, first level looks in which tenth is result. 2'nd level checks only values in tenth found in 1'st level.

This way, if you have 100 ranges and 112 dim values, max number of comparisions will be 10x112 (to find tenth for each dim value)+10x112(to find range for each dim value) = 2240, and no summation needed.

With rangesum we have always 100x112 logical comparisions and next 112 summations.

regards

Darek

Anonymous
Not applicable
Author

unfortunately performance is left to be desired so I will be open to any other suggestions.

It is one of these things which are very easy to do in Excel using vlookup function but QV cannot handle this well. I have hard time to explain this to my project stakeholder who happens to be an Excel guru