Skip to main content
Announcements
See what Drew Clarke has to say about the Qlik Talend Cloud launch! READ THE BLOG
cancel
Showing results for 
Search instead for 
Did you mean: 
newuser
Creator II
Creator II

Sankey Diagram Count Distinct > 100%

I've seen a few posts mentioning Sankey diagrams resulting in more than 100%. Has anyone encountered this, and if so, what is the reason it happens? We've found depending on the dimensions used, the %'s change, e.g., using two vs. three dimensions in the Sankey can vary the parent %'s shown. We are counting distinct and then dividing by the "total".

Labels (1)
3 Replies
stevedark
Partner Ambassador/MVP
Partner Ambassador/MVP

Hi @newuser 

This would happen if the values that you are getting a distinct count of appear in more than one of the dimension values. If you picture having the following table:

Dim1 Dim2
A Z
B Z

 

Then if you count distinct Dim2 within Dim1 you would have:

Dim1 Count
A 1
B 1
TOTAL 1

 

And if you then view the dimension values as percentages of the total:

Dim1 Count
A 100%
B 100%
SUM 200%

 

That's obviously an extreme example, but if you have many Dim2 values across Dim1 and a few of them appear in more than one Dim1 values you could end up with a sankey total over 100%.

If you count distinct a composite value of Dim1 & '-' & Dim2 then that would remove that risk of going over 100%, as a duplicate value would double count in the total, as well as against each dimension.

Hope that makes sense.

Cheers,
Steve

 

newuser
Creator II
Creator II
Author

Thank you @stevedark ! When using the composite field, would that preclude me from using multiple separate dimensions in the Sankey? In my case, I do have instances where multiple Dim2's reside in a Dim1.

I'd still like to use Dim2, Dim1 as separate dimensions in the Sankey, so I am assuming your composite key suggestion would still allow for that?

Would the composite field be in the highest-level parent field only (left-most field) or any parent field needing child fields grouped (if that makes sense)?

stevedark
Partner Ambassador/MVP
Partner Ambassador/MVP

Hi @newuser 

You don't need to create it as a field, or add it to a dimension, just reference the composite value in the Measure of the sankey chart:

count(DISTINCT Dim1 & '-' & Dim2)

If both Dim1 and Dim2 exist in the same source table then you can make your app more efficient by concatenating on load to a new field that you can reference in the count.