Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 
Not applicable

Calculating ratio from two data sets

I have two tables with sample data:

Table1: 8 million records
Product Number
YearMonth
Revenue
Product Type

Table2: 600,000 records
Product Number
YearMonth
Warranty Cost
Warranty Code

Calculation Required: Warranty Cost / Revenue ratio for Product by Month/Year (in graph) with Product Type/Warranty Code as user selections.

Issues:
- The Chart calculation is slow - 15 second chart refresh on new Blade Server with 8 cores and 64Gb RAM
- There are Product Number entries in Table 2 that don't exist in Table 1 - result of system migration and updates.
- More Product and Warranty attributes to be added to above
- Potentially more records to both tables

At present, the solution utilises a Calendar Island whereby the ratio calculation is:
=sum(if(Table2.YearMonth = Calendar.YearMonth, [Warranty Cost])) / sum(if(Table1.YearMonth = Calendar.YearMonth, [Revenue]))

Questions:
1) Is the use of the Calendar Island best practice?
2) Are there ideas out there to increase efficiency of calculation, therefore users' experience of QV?

9 Replies
matt_crowther
Luminary Alumni

8 Million rows should be no issue for a server of that spec so you should be able to sort this with not too much fuss.

The best way is to replace those 'IF' functions with Set Analysis - have alook in the help file as it will explain it far better than I can - Set Analysis is a much more efficient way of running calculations when looking at larger datasets (amoungst other uses); I've had examples where using IF statements charts were taking 3mins+ but with SA were instant.

IF functions require the whole dataset to be checked as to whether they meet the condition and as such are very slow when dealing with larger volumes of data.

As for you bad data; I'd look to use a Where clause in your load script in conjunction with the exists() function to only load associated data.

Hope that helps,

Matt - Visual Analytics Ltd

Not applicable
Author

As suggested, I have played around with the SetAnalysis and utilised the following to calculate Annual Sales in a pivot table:

sum({$ <Cal_YYYY = Trans_YYYY>} Sales)

The expression returns correct values where CAL_YYYY is being filtered, otherwise, it returns blank when no selection is made on CAL_YYYY.

Have I missed something?

Oleg_Troyansky
Partner Ambassador/MVP

In your case, the best solution would be to avoid Date Island, if possible, and to link the calendar directly, in order to leverage QlikView associative logic.

To fix your Set Analysis expression above, you can use something like this:

sum({$ <Cal_YYYY = P(Trans_YYYY)>} Sales)

Another problem is the fact that you need fields from 2 tables for a single calculation. QlikView needs to match all possible values from one table with all possible values from the other table, prior to calculating the ratio.

As irrational as it sounds, I believe that loading both tables into one (concatenating) could improve performance, because the two parts of the formula would now come from the same table, therefore eliminating the need in resolving the link (or two?) in the run-time. By the way, concatenating those tables into one can also help you eliminate the Date Island, because now you can name the dates with the same Field Name.

Not applicable
Author

Taking on board the suggestion of concatenating into one table, the fields would look as follows:

Product Year Sales WarrantyCost WarrantyCause
XXX 2000 100
XXX 2001 200
XXX 2000 20 AAA
XXX 2000 40 BBB

If AAA was selected from WarrantyCause, WarrantyCost would be 20 for Product/Year Combination. What is the expression to return the Sales of 100 for that Product/Year Combination to calculate WarrantyCost/Sales ratio of 20%?

In addition, if both AAA and BBB was selected, I only need to return 100 for Sales, therefore the WarrantyCost/Sales ratio should be 60%

Oleg_Troyansky
Partner Ambassador/MVP

well, this is a bit more complex, and in this case, concatenation might not be the best solution...

Or, if it's still the "least of all evils", you might be able to circumvent the problem by a Set Analysis formula that would disregard the selection of Warranty Cause for the sum(Sales) - something like this:

sum(WarrantyCost) / sum( {<WarrantyCause=>} Sales)

cheers,

Not applicable
Author

If I was to revert to using 2 tables without a calendar island - 1 for sales and the other for Warranty Cost, such that:

Table 1:
Product
Year
Sales

Table 2
Product
Year
Warranty Cost
Warranty Cause

The link between the 2 tables would be Product (can't use Year as there are multiple definitions of Year).

How would I be able to relate the Year fields above, without including them in a join? The purpose is to calculate the WarrantyCost / Sales ratio for each Year, with the ability to filter by WarrantyCause.

Thanks for your assistance so far.

Oleg_Troyansky
Partner Ambassador/MVP

I'm not sure what do you mean by "multiple definitions of Year"...

If you need to compare warranty costs to Sales per year, you need to link by Product and Year. This can be done in 2 ways:

1. Let QlikView generate a Synthetic Key that will combine Product and Year together. I'm not a big fan of those, but there is an opinion that it's OK to have a simple synthetic key for a couple of fields.

2. You can generate a composite key by yourself and create what's called a LinkTable (look it up in the forum, there are many descriptions of how to do it). In the case, you control the process of creating the composite key, instead of QlikView.

Not applicable
Author

Multiple definitions: In the warranty fact table, there exist date definitions for Date(Accident) and Date(ClaimMade) and others....

Therefore creating one link with Date dimenision between Sales and Warranty fact tables would inhibit different calculations of dates, limiting effectiveness of Calendar Islands, link tables etc.

In essence, if the same calculation was to be done in Access for example, one would use nested queries where the first query would return WarrantCost by (Year,Month) based on selection of filters, then using this set of ProductNumbers to interrogate the Sales Fact table to return the sales for those Products by (Year/Month). I am just trying to replicate this nested query within QV.

Not applicable
Author

Just reflecting, the Calendar Island option would the answer with the exception that my fact tables are huge (8+m rows), therefore diminishes user experience with snappiness of QV. Have gone down this path and this solution is not acceptable.

Any other ideas?