Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
hi guys, looking for some directions here.
I have large QVD which represents financial transactions - 90 million rows and growing. based on some KPIs and complex calculations, I determine overall score for a specific entity. The score is a number from 0 to 100 and it CANNOT be calculated in a load script - depending on user selections, the scores would be different.
with that said, here is my challenge. I am a given a little table that determines a rate for an entity, based on its score - basically that table has low score and high score and a rate from 1 to 5. I need to calculate the score (again on the fly based on user selections) and using that table identify its rate.
Here is a few options I considered:
1) create a data island table for scores/rates and use if thens to pick a rate from there. While it works great on by 10,000 records sample, it is taking forever in my real app.
2) use aggr, set analysis and advanced search expression in it - did not work because aggr is calculated not per row but per chart.
Any other ideas?
Borys,
i prepared small example in assumption, that you have to get your rating regarding to some dimension value ....
For example purposes i assumed, that i have some data and my score function is simple avg.
I added field Dim_1 to simulate some user selections.
In my opinion this aproach should not impact a lot your applicationperformance
Let me know if it helped you
If you like this direction, i have some additional idea, how to optimize it
Regards
Darek
dziękuję, Darek! you loved my little challenge, did not you?
I think this is a great solution - if i got it right, you build 7 different expressions in the load script, then use pick function to pick one of the 7 for the corresponding dimension.
I tried and it is very fast - much better than mine firstsortedvalue / aggr solution. I think it is definitely a great way to address a problem - but unfortunately not going to work for me.
My requirements have just changed and there will be 112 different dimension values to rate - not 7 as i thought initially. And every one of these 112 will be using different from/to scores and ratings 1..100, so we are looking at 112 expressions in that pick function. Every one of those 112 expressions will have approximately 7000 characters in length (all if/then-s for every rate 1 to 100), so the total length of my pick expression would be around 784,000 characters. I've read somewhere that QV will start crashing on you once you approach expression length in 20,000-50,000.
My solution with firstsortedvalue / aggr actually worked but it was super slow - like 60 seconds to recalculate my chart.
But after much thinking I looked at it from another perspective - I changed my source data for that rate table.
Basically the idea is to load scores and rates in a horizontal table like that (like a cross table):
targetdim, score_from_rate1, score_from_rate2, ... score_from_rate100
this table will have 112 rows.
Then in a chart, use expression like this one:
IF ( $(mymetric) < ONLY ( score_from_rate1), 'rate 1'
,IF ( $(mymetric) < ONLY ( score_from_rate2), 'rate 2'
...
,IF ( $(mymetric) < ONLY ( score_from_rate100), 'rate 99'
,IF ( $(mymetric) < ONLY ( score_from_over_rate100), 'rate 100'
)))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))) //close all 101 if-s
this works pretty fast - takes 4-5 seconds to recalculate my chart.
I am still not 100% happy as it was such a hassle and still looks ugly but it works.
I am going to accept your answer as correct one though - thanks for trying solve this!
Borys,
proszę
i think you may find something interesting in v1
So, aggregate formula for each dimension value as a row in formula table
Then using dimension number, with pick you may ask $(=fieldvalue('field',nbr))
Remember, that it will work nice while field has distinct values. So, it will be better to check it during reload. If 2 dimension values have the same score ranges and rating, you will have to have the same formula and dim number).
Your solution with 100 columns and 112 rows (associated with dimension) looks nice. But i think, that there will be better if you preaggregate data stronger. This may be also more complex script if you would like to prepare it generic (assuming, that number of your ratings /columns in your solution/ will grow up). There is of course generic load, but ....
We may also go into table with 112 columns: formula_1.formula_2, .... formula_112 and 1 row. Then pick(nbr, $(=only(formula_1)), ...,$(=only(formula_112))). It is also "generic" and may be easy prepared in the script.
poka
Darek
Borys,
1 more idea with your 100+1 columns and 112 rows table.
Columns are:
Dim, r1,r2,r3, ... , r100
In rn column you have n'th range from value.
Check how fast is expression:
-RangeSum(score>=r1,score>=r2,score>=r3,.......,score>=r100)
got it, very cool
sorry, Darek, not sure I understand this. RangeSum will give a sum of all boolean expressions and then what can i do with that?
Look,
in QV true=-1, false=0
simple example:
DIM,range,score_from,score_to
A,1, 0, 0.25
A,2, 0.26, 2.55
A,3, 2.56. 7.80
A,4, 7.81
our table:
DIM, r1, r2, r3, r4
A, 0, 0.26 , 2.56, 7.81
lets test:
score = 3
-rangesum(3>0, 3>0.26, 3>2.56, 3>7.81) = - rangesum(-1,-1,-1,0)=3
another test:
score=2.3
range=-rangesum(2.3>0,2.3>0.26,2.3>2.56, 2.3>7.81)=-rangesum(-1,-1,0,0)=2
this is a very interesting idea - I will think about it more tonight. Thanks for your help again!
Your range should be equal to number of score_from, which are less than calculated score.
Anyway, i'am not sure if this will be good for performence. It is always 11200 comparisions.
I think, that it may be better to prepare something like tree.
Let's asume range table (for one Dim value):
range, score_from, score_to
1, 0 , 5.05
2, 5.06 , 10.01
3, 10.02 , 25.04
4, 25.05 , 30.20
5, 30.21 , 50,54
.....
10, 101,33 , 110,43
11, 110,44 , 137,01
.....
20, 189.01 , 205,45
......
100, 567,99 ,
We may prepare 2 levels if (like tree)
if(score<=110.43, if(score>=0,1,if(score>=5.06,2,if(score>=10.02,3,if(score>=25.05,4,.......))))))))),if(score<=189.01,if(score>=110,44,11,........)))))))))),..........)))))))))))))))))))))))))))
So, first level looks in which tenth is result. 2'nd level checks only values in tenth found in 1'st level.
This way, if you have 100 ranges and 112 dim values, max number of comparisions will be 10x112 (to find tenth for each dim value)+10x112(to find range for each dim value) = 2240, and no summation needed.
With rangesum we have always 100x112 logical comparisions and next 112 summations.
regards
Darek
unfortunately performance is left to be desired so I will be open to any other suggestions.
It is one of these things which are very easy to do in Excel using vlookup function but QV cannot handle this well. I have hard time to explain this to my project stakeholder who happens to be an Excel guru