Solved: Where Are My AutoML Model Coefficients? - Qlik Community

shansen · ‎2024-08-08

Is there a way to get the direct model coefficients out of AutoML, just like typing model.coef_[i] as shown here: https://shap.readthedocs.io/en/latest/example_notebooks/overviews/An%20introduction%20to%20explainab... ?

What about for categorical data? How do we know how much more/less a certain value in a categorical field will influence the outcome?

I see the Shapley value output files (Errors, SHAP, Coordinate SHAP), and I know that the Shapley values for a row sum to the difference between the baseline model output and that row's model output. But what if the effect of a feature's values seems very inconsistent? For example, a numerical value of 59 for feature A has a high positive Shapley value on one row, whereas the next row has a feature A value of 61 and a negative Shapley value. Is there a way to glean more insight as to why?
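The additivity property described above can be checked by hand for a linear model. The following is only a sketch, not Qlik AutoML output: the coefficients, intercept, and rows are made up, and it assumes the standard closed form for a linear model with independent features, where the SHAP value of feature j is coef[j] * (x[j] - mean(x[j])).

```python
# Hypothetical linear model: f(x) = intercept + sum(coef[j] * x[j]).
# All numbers are invented for illustration; none come from Qlik AutoML.
coef = [0.5, -2.0]
intercept = 1.0

rows = [
    [59.0, 3.0],
    [61.0, 1.0],
    [40.0, 2.0],
]


def predict(x):
    return intercept + sum(c * v for c, v in zip(coef, x))


# Feature means over the background data (here, the same three rows).
means = [sum(col) / len(col) for col in zip(*rows)]

# Baseline = average model output over the background data.
baseline = sum(predict(x) for x in rows) / len(rows)


def linear_shap(x):
    # Closed form for a linear model with independent features (assumption).
    return [c * (v - m) for c, v, m in zip(coef, x, means)]


for x in rows:
    phi = linear_shap(x)
    # Additivity: the SHAP values sum to (prediction - baseline).
    assert abs(sum(phi) - (predict(x) - baseline)) < 1e-9
    print(x, phi, predict(x) - baseline)
```

In this purely linear case the SHAP value of a feature depends only on that feature's own value, so a sign flip between 59 and 61 would only happen if the feature mean fell between them. Non-linear models do not have this guarantee.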

Kyle_Jourdan · ‎2024-08-26

The SHAP values in Qlik AutoML are based on estimated Shapley values (learn more here: https://christophm.github.io/interpretable-ml-book/shap.html).

Keep in mind that, as part of this calculation, estimated contributions are calculated row by row, so another factor (feature) on a given row may cause the SHAP value for the same feature to differ in magnitude and direction (negative or positive impact). For example, a certain price point may have a positive impact for customers in one region but a negative impact in another region.
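The price and region example can be made concrete with a toy calculation. This sketch assumes a hypothetical two-feature model in which region flips the sign of the price effect, and it computes exact Shapley values by enumerating feature subsets over a tiny made-up background dataset. Qlik AutoML uses estimated values, so this shows the idea and not its implementation.

```python
from itertools import combinations
from math import factorial


def model(price, region):
    # Hypothetical model: region flips the direction of the price effect.
    sign = 1.0 if region == "A" else -1.0
    return sign * 0.1 * (price - 50.0)


# Made-up background data used to average out "missing" features.
background = [(40.0, "A"), (60.0, "A"), (40.0, "B"), (60.0, "B")]


def value(x, subset):
    # Expected output when features in `subset` are fixed to x's values
    # and the remaining features are drawn from the background rows.
    total = 0.0
    for b in background:
        mixed = [x[i] if i in subset else b[i] for i in range(len(x))]
        total += model(*mixed)
    return total / len(background)


def shapley(x):
    n = len(x)
    phi = []
    for j in range(n):
        others = [i for i in range(n) if i != j]
        contrib = 0.0
        for k in range(len(others) + 1):
            for s in combinations(others, k):
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                contrib += weight * (value(x, set(s) | {j}) - value(x, set(s)))
        phi.append(contrib)
    return phi


# Same price, different region: the price SHAP value changes sign.
print(shapley((60.0, "A")))  # price contribution is positive
print(shapley((60.0, "B")))  # price contribution is negative
```

The price is 60 in both rows, yet its contribution is positive in region A and negative in region B, because the model's response to price depends on region.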

SHAP values are not related to model coefficients and are calculated after the model has made its predictions, so I would not suggest using them to estimate any sort of model coefficient. In a good model, ranking the features by the average absolute value of their SHAP values should roughly match ranking them by permutation importance.
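The ranking comparison can be sketched on synthetic data. Everything below is an assumption for illustration: a hypothetical linear model with made-up coefficients, the closed-form linear SHAP value coef[j] * (x[j] - mean(x[j])), and permutation importance measured as the increase in mean squared error after shuffling one column. It does not reproduce how Qlik AutoML computes either quantity.

```python
import random

random.seed(0)

# Hypothetical linear model with made-up coefficients.
coef = [3.0, 1.0, 0.2]


def predict(x):
    return sum(c * v for c, v in zip(coef, x))


# Synthetic data: independent standard-normal features plus small noise.
X = [[random.gauss(0, 1) for _ in coef] for _ in range(500)]
y = [predict(x) + random.gauss(0, 0.1) for x in X]

means = [sum(col) / len(col) for col in zip(*X)]

# Mean absolute SHAP value per feature (closed form for a linear model).
mean_abs_shap = [
    sum(abs(c * (x[j] - means[j])) for x in X) / len(X)
    for j, c in enumerate(coef)
]


def mse(rows):
    return sum((predict(x) - t) ** 2 for x, t in zip(rows, y)) / len(y)


base_error = mse(X)


def permutation_importance(j):
    # Shuffle column j and measure how much the error increases.
    shuffled = [x[j] for x in X]
    random.shuffle(shuffled)
    permuted = [x[:j] + [v] + x[j + 1:] for x, v in zip(X, shuffled)]
    return mse(permuted) - base_error


perm_imp = [permutation_importance(j) for j in range(len(coef))]


def ranking(scores):
    # Feature indices sorted from most to least important.
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)


print(ranking(mean_abs_shap), ranking(perm_imp))
```

If the two rankings disagree badly on real data, that is a hint to look more closely at the model or at how the SHAP rows were matched to the source rows.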

Finally, if you do not have a unique identifier for Qlik AutoML to use as the index column, you must select the “Apply dataset” option in the prediction configuration to get the correct index for matching records back to their predictions and SHAP values. Using the RowNo() function is probably not matching these records correctly.
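To see why a positional row number is a fragile join key, consider this small sketch. The row labels and prediction values are hypothetical; the only detail taken from this thread is that the AutoML row index starts at 0.

```python
# Hypothetical source rows, in file order.
source_rows = ["cust_a", "cust_b", "cust_c"]

# Hypothetical prediction output keyed by a 0-based row index.
predictions = {0: 0.91, 1: 0.12, 2: 0.55}


def join_by_index(rows, preds, start):
    """Attach predictions to rows, numbering the rows from `start`."""
    return {row: preds.get(i) for i, row in enumerate(rows, start=start)}


zero_based = join_by_index(source_rows, predictions, start=0)
one_based = join_by_index(source_rows, predictions, start=1)

print(zero_based)  # every row gets its own prediction
print(one_based)   # every row gets its neighbour's prediction; the last gets None
```

An off-by-one numbering silently shifts every prediction to the wrong record, and any reordering or filtering of the source rows breaks a positional join in the same way. A real unique identifier avoids both problems.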

igoralcantara · ‎2024-08-08

Hi @shansen, do you simply mean the row-level SHAP values or the hyperparameters? If you are talking about the hyperparameters, there is no way to retrieve them as data. My recommendation is to suggest that feature in Ideation:

Ideation | Qlik Community

IPC Global: ipc-global.com
Check out my latest posts at datavoyagers.net

shansen · ‎2024-08-08

I am talking about the row-level SHAP values and their correlation with the feature values. I am trying to determine which values of a feature contribute to or detract from a positive outcome. I would expect the average SHAP value for a given numerical feature value to be a little more orderly than this. Have you ever seen SHAP values vary so much across the values of a numeric feature?

igoralcantara · ‎2024-08-08

The coordinate SHAP values file gives you exactly that. It gives you the SHAP value for each feature and each row. If you don’t see it, it is because of the way your chart presents the data. Use a distribution chart with two dimensions, one of which is the feature name.

shansen · ‎2024-08-26

Igor,

Thanks for your help. When I used a linear model, the SHAP values of numerical features did give me a linear relationship that I could back out a coefficient from (see attached Feature Values vs SHAP Values.png). One issue I had to resolve first was making sure I matched rows starting at index 0 instead of 1, to match the AutoML output:

LOAD
...
RowNo() - 1 AS my_row_index // The automl_row_index starts at 0.
FROM [lib://MySpace:DataFiles/outcome_prediction.csv];

Then things made much more sense.
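For a linear model, backing out a coefficient from the plot amounts to fitting a straight line through (feature value, SHAP value) pairs. The sketch below uses made-up numbers and assumes the closed-form linear SHAP value coef * (x - mean(x)); `true_coef` and the feature values are hypothetical and are not taken from the attached chart.

```python
# Hypothetical feature values and a made-up coefficient for one numeric feature.
feature_values = [40.0, 55.0, 59.0, 61.0, 70.0]
true_coef = 0.5

mean_value = sum(feature_values) / len(feature_values)

# SHAP values as a linear model would produce them (assumed closed form).
shap_values = [true_coef * (v - mean_value) for v in feature_values]


def fit_line(xs, ys):
    """Ordinary least squares fit of ys = slope * xs + intercept."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


slope, intercept = fit_line(feature_values, shap_values)

# The slope recovers the coefficient; the line crosses zero at the feature mean.
zero_crossing = -intercept / slope
print(slope, zero_crossing)
```

This only works because the model is linear. For tree-based or other non-linear models the points will not fall on a line, and the fitted slope should not be read as a coefficient.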
