Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
I am creating a pie chart to show the share of a population that is either Male or Female.
The breakdown is 23.03% female, 42.16% male and 34.81% null.
When I create the pie chart, I use ' Gender' as the dimension and =count(acctid) as the measure. When I do this, the values above show correctly, however I do not want to include nulls. When I un-select 'Include Null Values' under the dimension, the Null values go away, however the percentages are then inflated.
How can I correct this?
It's an "under the bonnet" consideration. If the engine does a count it has to look at every value in a virtual table containing every acctid, looking to see if it is null or not and then counting them. If you have the number 1 on every row it will create a virtual table with only the number 1 in and it will keep tabs of how many there are. Adding up those 1s is then much quicker.
The code to do this will be:
LOAD
1 as PopCount,
acctid,
if(IsNull(Gender), 'Not Known', Gender) as Gender,
if(IsNull(Gender), 0, 1) as GenderKnown,
... rest of load ...
As you can see, you can also have other binary flags, which can be used for set analysis or other purposes.
For example if you wanted to know the proportion of Male or Female when you have a value you could do sum(GenderKnown) or for the whole cohort you could use sum(PopCount).
Hope that makes sense?
On a pie chart the wedges must always total to 100%, so a pie chart with two wedges that total 65% is not a valid visualisation.
Your best bet is to replace the null values with 'Not specified' or similar. You can do this in the load script:
LOAD
if(IsNull(Gender), 'Not Specified', Gender) as Gender,
etc.
It's always a good idea to replace nulls in the load script, as they can throw out calculations.
Also, it's a good idea to add a counter to your load (1 as PopCount,) so you can do sum type calculations (sum(PopCount)) rather than count expressions, as these are more efficient.
Thank you,understood on the pie chart. Can you please elaborate on the PopCount expression?
It's an "under the bonnet" consideration. If the engine does a count it has to look at every value in a virtual table containing every acctid, looking to see if it is null or not and then counting them. If you have the number 1 on every row it will create a virtual table with only the number 1 in and it will keep tabs of how many there are. Adding up those 1s is then much quicker.
The code to do this will be:
LOAD
1 as PopCount,
acctid,
if(IsNull(Gender), 'Not Known', Gender) as Gender,
if(IsNull(Gender), 0, 1) as GenderKnown,
... rest of load ...
As you can see, you can also have other binary flags, which can be used for set analysis or other purposes.
For example if you wanted to know the proportion of Male or Female when you have a value you could do sum(GenderKnown) or for the whole cohort you could use sum(PopCount).
Hope that makes sense?