Several years ago, I blogged about how creating a synthetic dimension using ValueList allowed us to color dimensions in a chart. ValueList is commonly used where there is not a dimension in the data model to use, thus creating a synthetic one with ValueList. You can read more about ValueList in myprevious blog post. In this blog post, I am going to share how I used ValueList to handle omitted dimension values in a chart.
I recently ran into a scenario when creating visualizations based on survey data. In the survey, the participant was asked for their age as well as their age group. The ages were grouped into the following buckets:
Under 18
18-24
25-34
35-44
45-54
55-69
70+
Once I loaded the data, I realized that there were not participants for all the age groups, so my chart looked like the bar chart below. There was a bar and value for only the age groups that the participants fit in.
While I could leave the chart like this, I wanted to show all the age group buckets in the chart so that it was evident that there were no participants (0%) in the other age group buckets. In this example, the four age groups were consecutive, so it did not look odd to leave the chart as is but imagine if there were no participants in the 45-54 age bucket. The chart may look odd with the gap between 44 and 55.
I explored various ways to handle this. One way was to add rows to the respective table for the missing age group. This worked fine but I was not a fan of adding rows to the survey table that were not related to a specific participant. The option that I settled on was using ValueList to add the omitted age groups. While this option works well, it can lead to lengthy expressions for the measures. In this example, there were only seven age group buckets so it was manageable but if you had many dimensions values then it may not be the best option.
To update the bar chart using ValueList, I changed the dimension from
To
Then I changed the measure from
To
Using ValueList in the dimension created a synthetic dimension with each age group option that was included in the survey. Now I will see all the age buckets in the chart even if there were no participants that fell in the age group bucket. Since I am using ValueList for the dimension, I need to update the measure to use it as well. This is where a single line measure can become a lengthier measure because I need to create a measure for every value in the synthetic dimension, thus the nested if statement above. The result looks like this:
There are no gaps in the age buckets, and we can see all the age bucket options that were presented in the survey. I prefer this chart over the first bar chart I shared because I have a better understanding of the survey responses presented to the participants as well as the response they provided. I would be interested in hearing how others have handled similar scenarios.
I have found that certain problems are best solved with a synthetic dimension like value list that are not easily done in the data model or many cases, where it should not be done in the data.
In other case, like age grouping, it is typically best to shift it into the data model using a function like. intervalMatch(). The benefit in this situation is that you get better app performance and a dimension and measure expression that are easier to develop and to maintain.
That said, being able to implement this type of functionality using a synthetic dimension can help you prototype the functionality and it's benefits then after exploring and tuning it, you can prove it out and move it into the data model. So, synthetic dimensions can be an effective tool for prototyping and validating before implementing it in the data model.
Personally I don't use valuelist() and valueloop() for such scenarios - else only then if lists/ranges are depending on the selection state.
If it are fixed values it's easier and more powerful to use an island-table instead of the valuelist() - maybe just created with a small inline-load or loaded from an external source and especially by larger and/or more complex lists the values could be created with all sophisticated load-features instead of hard-coding them.
Another benefit by using an island-table will be that they isn't restricted to a single field else several ones might be included, for example to apply further categorizations or just a sorting-information.
In the end it exists a native dimension with options to apply also aggr() or TOTAL statements, the expressions will be more readable and the use and overview of the selection state is better.
Beside of all the above I suggest not to discard the consideration of populating the missing data too fast. The efforts are often not very expensive and avoids all the trouble with the island-table / synthetic dimension stuff - and could provide further respectively easier insights which data exists and which not - just by adding the information of 'native' or 'populated' within an extra field.
I'm on team fill-in-the-dimension. If you want all of the values to be available (even if no data exists), it's easy enough to Load DimName from DimTable. If there is no DimTable, you can use an Inline Load.
The use case I usually have for ValueList() are different types of measures in an expression (in a P&L for example, with percentages and currencies combined in one column).
Instead of using an IF(), you should probably use PICK(MATCH()). I would probably go for Rob's suggestion in your use case.
This topic appears to generate a lot of interest, so I'll add my two cents here.
The consensus among the experts is to solve this problem in the data model, and I certainly agree with that. One important distinction I'd make is to try and avoid a data island solution, especially if the data volume might be big (more than a few millions of rows.) Data islands can slow down your app's performance substantially for large data volumes. (Refer to my Performance Tuning lecture at the Masters Summit for Qlik). It wouldn't be too hard to create a dimensional table of Age Groups and associate those with PIDs without using an island.
In rare occasions when an issue like this cannot be resolved in the data model, I'd probably opt for a calculated dimension that would associate a calculated "age group" with each Age (or Customer, or Person, etc...) Something along these lines:
AGGR( IF( Age<18, Dual('Under 18', 18),
IF(Age < 25, Dual('18-24', 24), ... ))))), Age)
This calculated dimension associates the calculated age group with each Age (assuming that there is a data field that contains the age for each PID). Alternatively, ages could be calculated and age groups could be associated directly with each PID, however that would probably be heavier from the performance standpoint.
I use the Dual function to ensure that age groups are presented in their numerical order rather than in alphabetical order.
With a dimension like this, the measure can remain simple, just as it was in the initial example:
To conclude, I think it's awesome that Qlik allows so many different ways to "skin a cat(fish)", and we can all pick our favorite ways and discuss them in this forum.
Tough to say anything against so grounded comments as the ones from Rob, Oleg or Marcus... Still, if no initial post , no opportunity for such relevant comments.