Skip to main content
Announcements
Have questions about Qlik Connect? Join us live on April 10th, at 11 AM ET: SIGN UP NOW
cancel
Showing results for 
Search instead for 
Did you mean: 
Not applicable

Data Set Question

I have been asked to calculate two percentages from two different, but related groups of data.

Group 1 is Actual Values - this is the actual population of units sold and has the following structure:

Date

Product

Group_Count - this is the number of units for a given product for a given date

Population_Count - this is the total number of units (of all products)

With this group they want to calculate what percentage of the actual install base is using a specific product or group of products.

This is easily done on its own, the problem comes in when we add group 2.

Group 2 is Sample values we get a periodic sample of units and their attributes.  It has the following structure:

Date

Product

Attributes 1-4

Sample_Count

Population_Sample

They want to use the sample percentages to calculate what percentage of the total populate have a particular configuration (i.e., based on the four attributes).

The problem is that when I pull in the second set of data, my original set of data gets exploded as for a single product, there may be 10-50 (or more) configurations as each of the attributes has multiple options - the smallest has four options and the largest attribute has about 20 values.

How can I calculate my summary values (actual values) without the interference caused by the secondary group?

It can mostly be solved for the Population_Count by Sum(Distinct value).  I say works mostly because nearly all the values are different but we do have a few instances where there has been no change in population between dates.  However, for the Actual Group_Count field, that gets exploded by all the permutations of the attributes.

Any thoughts or ideas?

Thanks!

5 Replies
Not applicable
Author

Follow on question as a way to perhaps address the issue.

Is there a way to add a dynamic variable to the data set to indicate this is the first time a given date / product family / product combination has been encountered and then I can set an inline filter to only tally those values that are so flagged?  It would have to be dynamic as they will have multiple filter options to set.

Just a thought.

Thanks.

JonnyPoole
Employee
Employee

You could use some aggregation functions to focus the computation. Or add a summary table to your data model so that you can have a data model that reports  numeric values at different levels of granularity.

A sample QVW would help fine tune the approach if you are able to.

Not applicable
Author

Thank you Jonathan,

I would love to submit a sample QVW but would need to do some serious data masking first.

Can you explain the summary table concept?  I tried having two queries - one that was a summary table and one that had the more detailed information.  However, when I had a pivot that did the calculations and pulled the data, it still got exploded and I had the same problem.

How do I use a summary table in the data model and force the calculation to use that data with no reference to the other data?

Thanks.

JonnyPoole
Employee
Employee

So this MIGHT work:  take the 2 common dimensions and build a lookup table with a new key for each unique pairing. You can do this by loading the 2 fields + calculated key from 1 source and then concatenating a 2nd load from the 2nd source with the same fields.

Then load key again with the remaining fields into 2 'fact tables' .    Dates can be finicky to get consistent so there may be some manipulation needed to ensure you have a standard date dimension with key.

but once you have this you shouldn't get 'blown up' values for each dimension.

If this doesn't make sense or doens't work. consider sharing two excel files with like 10 records each + a qvw on top that is enough to illustrate the issue.

CommonDimensions:

Date

Product

Date&Product as DateProductKey

Dataset1:

DateProductKey

Group_Count

Population_Count

Dataset2:

DateProductKey

Attributes 1-4

Sample_Count

Population_Sample

Not applicable
Author

Thank you Jonathan.  I put some work this morning into creating two spreadsheets as you suggested as well as a sample QVW.

There is a summary and a detail spreadsheet and then the sample QVW.

In the QVW, I show two options for creating the total population total and pct total and while the second method works in this limited data set, in the full data set it does not as some of the values are identical (i.e., I'm using a Distinct).

Again, we have actual population for products for a given week based on the product family and actual product.  Also for a given week we have sample data that include three attributes (features selected with the product).

There are two calculations that need to occur:

1. The percentage of the total actual based on the current selections

2. The percentage of the total sample based on the current selections

They want to be able to extrapolate that if 15% of the sample population for product A have a specific set of attributes, then they can extrapolate the number in the actual population by taking the actual group valuen * the % of the sample population.

For example, for 2014-07-25, Product Family 7, Product 7A is 69% of the actual population.  If I also select 75 for Attr 3, I see that it is 9% of the 7A for 2014-07-25 sample population.  Therefore the actual number of 7A, attr3=75 for that week should be 21,099 * .09443 = 1,992

I'm sure there is a better way to go about this than what I am doing and, as a relatively new QV user, welcome the opportunity to learn.

Thanks.