I am doing some analysis of some search logs, where the number of unique terms is very large (i.e. approx 150,000 over a single day). I'd like to be able to display a table containing the term, the number of times it was entered (sorted in descending order), and the running total of the % of the total queries. e.g.
term total running %
cat 23,122 1.21
dog 20,000 2.45
fish 10,000 2.54
rabbit 10,000 3.32
hippo 345 99.43
monkey 12 100.00
This is so we can see that say the top 20 queries (although individually high) only account for say 10% of the total queries.
I have no problem with the expression to get the running % (using rangesum() and above()), and it works fine on a small set of data (50 terms). However, when it is ramped up to a significant number of terms (e.g. 150,000), qlikview just displays the 'calculating' view for the table and never finishes (I have to kill qlikview).
I am being unreasonable to expect it to do a running total for this volume of data (in reality the total number of unique terms for a month will be a lot more than 150,000), and is there an alternative way to establish how many queries make up the top 10 / 20 / 30 % of the total?