I'm currently working with a large dataset (about 500 million records).
Basically, my data structure is as follows. I have one table that is like this:
ContractID | Dimension1 | Dimension2 | Year | Loss |
1 | A | A | 1 | 2,791 |
1 | A | B | 1 | 8,489 |
1 | A | C | 1 | 7,619 |
1 | B | A | 1 | 4,370 |
1 | B | B | 1 | 6,024 |
1 | B | C | 1 | 4,551 |
2 | A | A | 1 | 3,119 |
2 | A | B | 1 | 1,828 |
2 | A | C | 1 | 6,752 |
2 | B | A | 1 | 9,462 |
2 | B | B | 1 | 2,202 |
2 | B | C | 1 | 5,117 |
The only difference is that instead of only 1 year, there are 10,000 years for each combination of dimensions.
I need to rank the losses by year and pull out certain ranks. I also need to allow filters to be applied on the dimensions and recalculate the ranks. I need the end result to look like this (2 charts below):
ContractID | Rank1 | Rank10 | Rank100 | Rank250 |
1 | 10,000 | 8,000 | 6,000 | 4,000 |
2 | 15,000 | 12,000 | 5,000 | 100 |
3 | 13,000 | 9,000 | 7,500 | 5,000 |
| Rank1 | Rank10 | Rank100 | Rank250 |
Total | 15,000 | 12,000 | 11,000 | 10,000 |
Currently, I’m using different expressions for each Rank column, that look something like this (for the total chart):
sum(aggr(if(rank(sum(Loss),4)=[Insert appropriate rank here],sum(Loss)),Year))
I expand the aggr for other dimensions if I want it split out, and add those as dimensions in my chart, obviously.
I know that using aggr functions and rank functions are not the best way to go with this. Any ideas on a better way to do it?