Skip to main content
Announcements
Happy New Year! Cheers to another year of collaboration, connections and success.
cancel
Showing results for 
Search instead for 
Did you mean: 
Not applicable

Filtering with a complex cpu-eating condition

Hi,

I think this one is a bit tricky... I got a table with company "locations" inlcuding GPS coordinates. And another (joint) table with "departments". And another (joint) table with "sales_amonunt" for "product_id". I have an input box where you can specify the current location (GPS coords) and another one to specifiy a radius.

I want to display a result list with ...

"locations" inside the specified radius only (I do know you to calculate it but it is very time consuming becuase it contains a lot of trigonometry)
with a llist/table with the best sold products (by sorted decending by amount - top 10 only!)
with the sum for each product at that location

Main dimension is the product_id, second is location, third is sales_amount. distance is a formula, as is sum (sales_amount). The task is to filter by distance, sort and top10 by amount -- and do all that quickly. (see further down)

Product A -- Location 3 -- 350 $ -- 90 miles -- 600 $
Product A -- Location 1 -- 150 $ -- 30 miles -- 600 $
Product A -- Location 6 -- 100 $ -- 10 miles -- 600 $
Product F -- Location 5 -- 300 $ -- 75 miles -- 550 $
Product F -- Location 3 -- 150 $ -- 90 miles -- 550 $
Product F -- Location 2 -- 100 $ -- 50 miles -- 550 $
Product D -- Location 1 -- 200 $ -- 30 miles -- 400 $
Product D-- Location 2 -- 100 $ -- 50 miles -- 400 $
Product D -- Location 4 -- 100 $ -- 60 miles -- 400 $
...

Of course a pivot would be more elegant but I noticed that it takes a huge amount of mem in my case.

I have tried doing an =if (cpu_eating_distance_formula < Radius, Location) as a dimension, but it takes ages because I have 6000+ locations and about 20.000.000 of "product-sales_amount" records.

I feel I need to do this in two or more steps rather than doing it with sophisticated dimensions or formulas: Create a table with the locations programmatically, create a top10prodcut list and the create the results lists. Any kind of help is very welcome.

Best regards,
Georg

4 Replies
johnw
Champion III
Champion III

Solved it, I think. Or at least it's running in less than a second on my machine with 10,000 locations and 10,000,000 sales records. I used similar to what you were thinking, a table with just the locations and their coordinates. Then you use set analysis to "select" only those locations within the specified radius. As long as your radius is reasonable, that DRASTICALLY cuts down on the number of sales records it needs to look through to find the highest sales amounts. It first scans the 10,000 locations, applying the distance formula to find, say, the four locations that are in range. At that point, there are probably only 4,000 or so matching sales records. It then scans through these to determine the sales amounts, and displays the top ten. Hopefully it then only reapplies the distance formula to the top ten for the final display, but even if it applies it to the 4,000, it won't be that bad.

See attached. I've set it to 1000 sales records so that it's not a giant file. Just bump it up to 10,000,000 or so to see more realistic performance.

Not applicable
Author

John,

many thanks for that superb example. I never thought that an alaysis filter would be faster tha restricting the dimension. Do you know why that is?

First I thought the distance filter was not working -- but then I noticed, that the table simply isn't updated if you enter a new radius. You need to Move the result table (drag it) to see the result. Is there anything we can do to remedy this? Maybe with a triggered action?

And can we change the result so that the top 10 sales with its locations are listed? I cannot see how to add a "maximum results" restriction on a dimension.

Best regards,

Georg

johnw
Champion III
Champion III


gbrandt wrote:I never thought that an alaysis filter would be faster tha restricting the dimension. Do you know why that is?


I have a good guess why that is - lazy coding by QlikTech. Smile A calculated dimension can be almost anything. It doesn't HAVE to be like your example, where you are simply trying to restrict the values of an existing dimension. Since the calculated dimension can be anything, they don't check for this situation, and so go through a very different path internally than making selections. A very slow path of some sort, as calculated dimensions are notoriously slow. Set analysis, on the other hand, must ALWAYS be specifying restrictions on allowed values. So set analysis can always be implemented internally by going through their normal associative logic (same as manually making selections), which makes it VERY fast at doing what, in your case, is the exact same thing. Now, if QlikTech ever gets around to it, they could analyze the expression you used for the calculated dimension. Probably the majority of times when I'm using one, I'm just restricting what values are allowed. In a case like that, they could internally process it exactly the same as selections, exactly the same as set analysis, and it should be very fast. But they haven't done that. Perhaps I should suggest it more formally. Hmmm.


gbrandt wrote:First I thought the distance filter was not working -- but then I noticed, that the table simply isn't updated if you enter a new radius. You need to Move the result table (drag it) to see the result. Is there anything we can do to remedy this? Maybe with a triggered action?


Ugh, yeah. I noticed that while I was working on it, then totally forgot about it by the time I posted. It's a bug, and I ought to see if it's been reported and report it if not. I've seen similar bugs before - QlikView sometimes just doesn't realize that one thing can affect another thing, and doesn't update charts as a result. As far as remedies go, I guess you'd have to have some way of making the variable change then change something ELSE that was used in the chart. For instance, I bet the chart would update if it were a field instead of a variable. So you could have them select from a specific list of radii in a list, which would also make sure they don't crash out the system by choosing 50,000 km as the radius. Or if you want to let them enter it as a variable, trigger an action to update a hidden field with all of the radii in it. I'm guessing either would force the chart to update.


gbrandt wrote:And can we change the result so that the top 10 sales with its locations are listed? I cannot see how to add a "maximum results" restriction on a dimension.


It's already there. Presentation tab, max number 10. At least if I understood what you want. If you sort by the Amount column, you'll see only the top 10 amounts. If you sort by some other column, you'll see the top 10 of that column. If you want to make sure it's always by Amount, you can uncheck "allow interactive sort" on the sort tab.

Not applicable
Author

Thanks for all that valuable information!

Regarding the top 10 feature. My final setup differs from your suggested solution because I need a pivot table. And I cannot find a way to make a top 10 selection for the pivot table.
I can only sort it ...

Dimension "product_type" sort function
=



sum({< Location = selectedLocation >} sales_amount)

How can I apply a top 10 based on this calulation?

Best regards,
Georg