In a previous article (Qlik AutoML: Overview of SHAP values), I described Shapley values and what they represent for AutoML models. Because these values are calculated for each record, they can enrich data visualizations and help you understand a model in more depth.
A helpful trick for using Shapley values in Qlik Sense sheets is to generate the Coordinate SHAP table. This table turns each feature into a record-level measure: every row holds one record's SHAP value for one feature. This layout gives you additional ways to aggregate and explore the values.
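To make the structural difference concrete, here is a minimal plain-Python sketch that unpivots a wide SHAP table into the coordinate layout, using the employee_id, automl_feature and SHAP_value columns that appear in the steps of this article. It assumes the wide table has one column per feature plus employee_id. It only illustrates the shape change and is not how Qlik AutoML builds the table internally. The function name, feature subset and values are made up for illustration.

```python
def to_coordinate_shap(wide_rows, index_col="employee_id"):
    """Unpivot wide SHAP rows (one column per feature, an assumed layout)
    into rows with keys index_col, 'automl_feature' and 'SHAP_value'."""
    coordinate_rows = []
    for row in wide_rows:
        record_id = row[index_col]
        for feature, value in row.items():
            if feature == index_col:
                continue  # the index column identifies the record; it is not a feature
            coordinate_rows.append(
                {index_col: record_id, "automl_feature": feature, "SHAP_value": value}
            )
    return coordinate_rows


# Hypothetical, made-up SHAP values for two employees and three features.
wide_shap = [
    {"employee_id": 1001, "department": 0.12, "avg_training_score": -0.30, "age": 0.05},
    {"employee_id": 1002, "department": -0.08, "avg_training_score": 0.41, "age": -0.02},
]

coordinate_shap = to_coordinate_shap(wide_shap)
for r in coordinate_shap:
    print(r)
```

Each wide row with three features becomes three coordinate rows. That is what lets a single chart dimension (automl_feature) range over every feature at once.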
In the steps below, I'll outline how to generate this table. I'll be using the HR Analytics: Promotion Data dataset, which contains a series of features used to predict whether an employee will be promoted.
Steps
- Upload train.csv and test.csv from the HR Analytics dataset to Qlik Catalog.
- Create a new ML Experiment and select train.csv as the training dataset. Set "is_promoted" as the target variable and deselect "employee_id" as a feature. Run the ML experiment.
- Click 'Deploy', then follow the link at the top of the page. It takes you to your deployed model in the "ML Model Management" interface.
- Within "Deployment Overview", select "Create Prediction".
- For the apply dataset, select test.csv from Qlik Catalog.
- Under Prediction Options, include the "Coordinate SHAP" table and set "employee_id" as your index column. Then click 'Save and Predict Now.'
- In Catalog, the prediction tables and the Coordinate SHAP table are now available.
If you open the Coordinate_SHAP table in Catalog, you can see that its structure differs from that of the SHAP table.
- Open a new Qlik Sense app and add the prediction tables, including the Coordinate SHAP table, from Catalog.
I applied all recommended associations, so the tables are linked by "employee_id."
- Add a horizontal bar chart to a sheet, with 'automl_feature' as the dimension and the average of 'SHAP_value' as the measure. This lets you review whether each feature influences the predicted value positively or negatively. You can also add filter dimensions to compare different groups within the dataset. In the example below, we compare the 'R&D' and 'Finance' departments.
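The chart in the last step averages SHAP_value per automl_feature (in Qlik Sense, a measure such as Avg(SHAP_value)). The employee_id association lets fields from the apply dataset, such as department, slice that average. The following plain-Python sketch reproduces that aggregation on a few made-up rows so the logic can be checked outside of Qlik. The function name average_shap and its parameters are hypothetical, and all IDs and values are invented for illustration.

```python
from collections import defaultdict


def average_shap(coordinate_rows, apply_rows, group_col, keep=None, index_col="employee_id"):
    """Average SHAP_value per (group, automl_feature) pair.

    coordinate_rows: rows shaped like the Coordinate SHAP table.
    apply_rows: rows from the apply dataset (e.g. test.csv), linked to the
        coordinate rows through index_col, like the employee_id association.
    keep: optional set of group values to retain, similar to a filter selection.
    """
    # Look up each record's group (e.g. its department) by its index value.
    group_of = {row[index_col]: row[group_col] for row in apply_rows}

    sums = defaultdict(float)
    counts = defaultdict(int)
    for row in coordinate_rows:
        group = group_of.get(row[index_col])
        if keep is not None and group not in keep:
            continue  # record is outside the selected groups
        key = (group, row["automl_feature"])
        sums[key] += row["SHAP_value"]
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


# Made-up rows for illustration only.
coordinate_shap = [
    {"employee_id": 1001, "automl_feature": "avg_training_score", "SHAP_value": 0.5},
    {"employee_id": 1002, "automl_feature": "avg_training_score", "SHAP_value": 0.25},
    {"employee_id": 1003, "automl_feature": "avg_training_score", "SHAP_value": -0.125},
    {"employee_id": 1004, "automl_feature": "avg_training_score", "SHAP_value": 1.0},
    {"employee_id": 1001, "automl_feature": "age", "SHAP_value": -0.25},
    {"employee_id": 1003, "automl_feature": "age", "SHAP_value": 0.75},
]
apply_rows = [
    {"employee_id": 1001, "department": "R&D"},
    {"employee_id": 1002, "department": "R&D"},
    {"employee_id": 1003, "department": "Finance"},
    {"employee_id": 1004, "department": "Sales & Marketing"},
]

result = average_shap(coordinate_shap, apply_rows, "department", keep={"R&D", "Finance"})
for (department, feature), avg in sorted(result.items()):
    print(f"{department:8} {feature:20} {avg:+.3f}")
```

A single filter selection covering both departments would pool them into one average per feature. This sketch therefore keys the result by department to keep the two groups apart, which is roughly what comparing them side by side shows.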
Environment
Qlik AutoML
The information in this article is provided as-is and is to be used at your own discretion. Depending on the tool(s) used, customization(s), and/or other factors, ongoing support for the solution below may not be provided by Qlik Support.