I have been asked to calculate two percentages from two different, but related groups of data.
Group 1 is Actual Values - this is the actual population of units sold and has the following structure:
Date
Product
Group_Count - this is the number of units for a given product for a given date
Population_Count - this is the total number of units (of all products)
With this group they want to calculate what percentage of the actual install base is using a specific product or group of products.
This is easily done on its own, the problem comes in when we add group 2.
Group 2 is Sample Values: we get a periodic sample of units and their attributes. It has the following structure:
Date
Product
Attributes 1-4
Sample_Count
Population_Sample
They want to use the sample percentages to calculate what percentage of the total population has a particular configuration (i.e., based on the four attributes).
The problem is that when I pull in the second set of data, my original data gets exploded: for a single product there may be 10-50 (or more) configurations, since each attribute has multiple options - the smallest has four options and the largest has about 20 values.
How can I calculate my summary values (actual values) without the interference caused by the secondary group?
For Population_Count this can mostly be solved with Sum(Distinct value). I say mostly because nearly all the values are different, but we do have a few instances where there has been no change in population between dates. The actual Group_Count field, however, still gets exploded by all the permutations of the attributes.
Any thoughts or ideas?
Thanks!
Follow on question as a way to perhaps address the issue.
Is there a way to add a dynamic flag to the data set to indicate that this is the first time a given date / product family / product combination has been encountered? Then I could set an inline filter to tally only the flagged values. It would have to be dynamic, as the users will have multiple filter options to set.
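For what it's worth, the kind of flag I'm imagining might be built in the load script with something like the following (table and field names are just placeholders for my real ones):

```
// Sketch only - assumes a resident table 'Facts' with Date, ProductFamily, Product.
// Sort the rows, then flag the first row of each combination using Previous().
FlaggedFacts:
LOAD *,
    If(Date & '|' & ProductFamily & '|' & Product
         = Previous(Date & '|' & ProductFamily & '|' & Product),
       0, 1) as IsFirstOccurrence
RESIDENT Facts
ORDER BY Date, ProductFamily, Product;

DROP TABLE Facts;
```

A chart expression could then use set analysis such as Sum({&lt;IsFirstOccurrence={1}&gt;} Group_Count) so that only the first row per combination is tallied, while other selections still apply.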
Just a thought.
Thanks.
You could use some aggregation functions to focus the computation, or add a summary table to your data model so that the model reports numeric values at different levels of granularity.
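For example (field names assumed from your description), wrapping the sums in Aggr() with Only() can collapse the repeated values back to one per date or date/product before summing, which may avoid the explosion:

```
// Population_Count repeats per date once the sample rows join in,
// so take it Only() once per Date before summing
Sum(Aggr(Only(Population_Count), Date))

// Group_Count repeats per attribute permutation,
// so take it once per Date/Product before summing
Sum(Aggr(Only(Group_Count), Date, Product))
```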
If you are able to share one, a sample QVW would help fine-tune the approach.
Thank you Jonathan,
I would love to submit a sample QVW but would need to do some serious data masking first.
Can you explain the summary table concept? I tried having two queries - one that was a summary table and one that had the more detailed information. However, when I had a pivot that did the calculations and pulled the data, it still got exploded and I had the same problem.
How do I use a summary table in the data model and force the calculation to use that data with no reference to the other data?
Thanks.
So this MIGHT work: take the two common dimensions and build a lookup table with a new key for each unique pairing. You can do this by loading the two fields plus a calculated key from the first source, then concatenating a second load of the same fields from the second source.
Then load the key again with the remaining fields into two 'fact tables'. Dates can be finicky to keep consistent, so some manipulation may be needed to ensure you have a standard date dimension with a key.
But once you have this, you shouldn't get 'blown up' values for each dimension.
If this doesn't make sense or doesn't work, consider sharing two Excel files with about 10 records each, plus a QVW on top, that is enough to illustrate the issue.
CommonDimensions:
    Date
    Product
    Date & Product as DateProductKey

Dataset1:
    DateProductKey
    Group_Count
    Population_Count

Dataset2:
    DateProductKey
    Attributes 1-4
    Sample_Count
    Population_Sample
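In script form, the sketch above might look something like this (the source file names here are assumed; substitute your own):

```
CommonDimensions:
LOAD DISTINCT
    Date,
    Product,
    Date & '|' & Product as DateProductKey
FROM Actuals.xlsx (ooxml, embedded labels);

// Add any date/product pairs that exist only in the sample source
CONCATENATE (CommonDimensions)
LOAD DISTINCT
    Date,
    Product,
    Date & '|' & Product as DateProductKey
FROM Samples.xlsx (ooxml, embedded labels)
WHERE NOT Exists(DateProductKey, Date & '|' & Product);

Dataset1:
LOAD
    Date & '|' & Product as DateProductKey,
    Group_Count,
    Population_Count
FROM Actuals.xlsx (ooxml, embedded labels);

Dataset2:
LOAD
    Date & '|' & Product as DateProductKey,
    [Attr 1], [Attr 2], [Attr 3], [Attr 4],
    Sample_Count,
    Population_Sample
FROM Samples.xlsx (ooxml, embedded labels);
```

Dataset1 and Dataset2 now link only through the CommonDimensions key table, so the actual counts are no longer multiplied by the attribute permutations.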
Thank you Jonathan. I put some work this morning into creating two spreadsheets as you suggested as well as a sample QVW.
There is a summary and a detail spreadsheet and then the sample QVW.
In the QVW, I show two options for creating the population total and the percentage of total. While the second method works in this limited data set, it does not work in the full data set, because some of the values are identical (i.e., I'm using a Distinct).
Again, we have the actual population of products for a given week, based on product family and actual product. For the same weeks we have sample data that includes three attributes (features selected with the product).
There are two calculations that need to occur:
1. The percentage of the total actual based on the current selections
2. The percentage of the total sample based on the current selections
They want to extrapolate: if 15% of the sample population for product A has a specific set of attributes, then the number in the actual population can be estimated as the actual group value * the % of the sample population.
For example, for 2014-07-25, Product Family 7, Product 7A is 69% of the actual population. If I also select 75 for Attr 3, I see that it is 9% of the 7A for 2014-07-25 sample population. Therefore the actual number of 7A, attr3=75 for that week should be 21,099 * .09443 = 1,992
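As a chart expression, one way I could imagine sketching that extrapolation (field names from my sample files; the attribute fields in the set analysis are placeholders) is:

```
// Estimated actual units for the selected configuration:
// actual group value (with attribute selections ignored, taken once per Date/Product)
// multiplied by the sample share of that configuration
Sum({<[Attr 1]=, [Attr 2]=, [Attr 3]=>} Aggr(Only(Group_Count), Date, Product))
  * Sum(Sample_Count)
  / Sum({<[Attr 1]=, [Attr 2]=, [Attr 3]=>} Sample_Count)
```

The empty set modifiers clear any attribute selections from the denominator and the actual count, so the ratio is the configuration's share of the whole sample for the selected week and product.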
I'm sure there is a better way to go about this than what I am doing and, as a relatively new QV user, welcome the opportunity to learn.
Thanks.