Anonymous
Not applicable

Top 10 Clients Per Product based on 130 million unique combinations

Hello All,

I hope you are having a good day.

So, the time has come to deal with large data sets in QlikView (at least for me). The requirement is to show, in a straight table, the top 10 clients per product. Since there are 130M unique product-client combinations in the data model (although only 30 products), I'm running into poor performance. The expression is a simple Sum(Amount) calculation. The machine the QlikView Server runs on has 40 CPU cores (4 processors) and 512GB RAM. There is no way to move the calculations into the load script, as the functionality has to react to selections.

Would you advise the best method to handle situations like that, please?

Best regards,

Janusz

6 Replies
swuehl
MVP

Have you tried Client as dimension, with this expression?

=Sum({<Client = {"=Rank(Sum(Amount))<=10"}>} Amount)

Anonymous
Not applicable
Author

Hi swuehl

Thanks for the prompt reply.

The expression you suggested returns the top clients within the current selections, whereas I'm looking for the top clients per product. Please see the attached example app.

Many thanks in advance for your help.

Best regards,

Janusz

Anonymous
Not applicable
Author

Apologies for the direct approach. rwunderlich, troyansky, BarryHarmsen, would you have a minute to help, please?

Many thanks in advance for your help.

Best regards,

Janusz

Oleg_Troyansky
Partner Ambassador/MVP

Anonymous
Not applicable
Author

Hello Oleg,

Thanks a mill for getting back to me.

I've read this article before, but after reading this section:

"Notice that this solution cannot be used for the previous problem, where we wanted to show Top 5 Items per Brand. The reason for that is the well-known limitation of Set Analysis – it only gets calculated once per chart, not once per cell, and hence it cannot be sensitive to individual Dimension values. Hence, we can’t use Set Analysis to calculate the Top Items per Brand (i.e. per Dimension value)."

I thought I was stuck with the Dimension Limits or Expression Limits solutions, which do not perform well.

So, I decided to give it one more chance and I believe I found a solution (please see attached file).

I created an additional field that combines ClientName and ProductName and came up with the following expression:

Sum({<
    Client_Product = {"=
        Max(
            Aggr(
                If(Rank(Sum(Amount), 4) <= $(vTopN), 1),
                ClientName, ProductName
            )
        )
    "}
>} Amount)
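For reference, the combined field used in this set expression could be built in the load script along these lines (table, field, and file names here are illustrative assumptions, not taken from the actual app):

```
// Hypothetical load script sketch: build the combined key field.
// Table name and QVD source are assumptions for illustration.
Facts:
LOAD
    ClientName,
    ProductName,
    Amount,
    ClientName & '|' & ProductName AS Client_Product
FROM [Transactions.qvd] (qvd);
```

Note that with 130M unique combinations this field will have as many distinct values, which is the memory overhead Stefan discusses below.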

It seems to work, but since it appears to contradict what is written in the article, I would much appreciate it if you could test this solution.

Best regards,

Janusz

swuehl
MVP
MVP

Janusz,

Your solution does not contradict what is written in the article. You create a set based on a key field, Client_Product, where each value is by definition already unique for any combination of your chart dimensions, ClientName and ProductName. Thus the evaluation does not need to be sensitive to the individual chart dimensions, in the sense of being aware of the dimension values at the moment the set is evaluated.

What you've done is indeed a possible solution.

It's more a question of the performance of this solution. Basically, you transferred the work done in a calculated dimension to the advanced search in the set expression, with the potential additional overhead of creating a key field with millions of distinct values in the script, and the overhead of a set analysis over such a large key field.
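For comparison, the calculated-dimension approach being contrasted here could look something like this (a sketch only; vTopN and the field names are assumed from the thread, and performance on 130M combinations would need testing):

```
// Calculated dimension (sketch): keep only the top-N clients per product.
// Rank mode 4 gives each row a unique rank even on ties; rows returning
// NULL drop out when "Suppress When Value Is Null" is enabled for the
// dimension.
=Aggr(
    If(Rank(Sum(Amount), 4) <= $(vTopN), ClientName),
    ProductName, ClientName
)
```

Used as a chart dimension together with ProductName and a plain Sum(Amount) expression, this ranks clients within each product without needing the combined key field.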

Looking at your sample, the QVW almost doubled in size, the chart with the set expression is a lot larger than the chart with the dimension limits, and the calculation time also seems quite a bit longer.

Nevertheless, this might depend on the specifics of your sample data to some extent, so it's probably a good idea to test with your real data.

I would assume, though, that with your real data the performance will also not be as good as with the dimension limits.

Regards,

Stefan