Is there a way to get the direct model coefficients out of AutoML, just like typing model.coef_[i] as shown here: https://shap.readthedocs.io/en/latest/example_notebooks/overviews/An%20introduction%20to%20explainab... ?
What about for categorical data? How do we know how much more/less a certain value in a categorical field will influence the outcome?
I see the output Shapley value files (Errors, SHAP, Coordinate SHAP), and I know that the Shapley values for a row sum to the difference between the baseline model output and the model output for that row. But what if the effect of the feature values seems very inconsistent? For example, a numerical value of 59 for feature A on one row has a high positive Shapley value, whereas the next row has a feature A value of 61 and a negative Shapley value. Is there a way to glean more insight as to why?
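One common reason for this pattern is an interaction between features: a SHAP value for feature A depends on the rest of the row, not only on the value of A. The sketch below is not AutoML output. It uses a made-up two-feature model, `model(a, b)`, and a made-up background dataset, and computes exact Shapley values by averaging over both feature orderings. All names and numbers are assumptions for illustration.

```python
from itertools import product

def model(a, b):
    # Hypothetical model with an interaction: the effect of A flips sign with B.
    return (a - 50) * b

# Hypothetical background data: every combination of two A values and two B values.
background = list(product([40, 60], [-1, 1]))

def value(row, known):
    """Average model output when only the features in `known` are fixed to the row's values."""
    total = 0.0
    for bg_a, bg_b in background:
        a = row[0] if "A" in known else bg_a
        b = row[1] if "B" in known else bg_b
        total += model(a, b)
    return total / len(background)

def shap_two_features(row):
    """Exact Shapley values for a two-feature model, plus the baseline (mean output)."""
    empty = value(row, ())
    only_a = value(row, ("A",))
    only_b = value(row, ("B",))
    both = value(row, ("A", "B"))
    phi_a = 0.5 * ((only_a - empty) + (both - only_b))
    phi_b = 0.5 * ((only_b - empty) + (both - only_a))
    return phi_a, phi_b, empty

for row in [(59, 1), (61, -1)]:
    phi_a, phi_b, baseline = shap_two_features(row)
    # Additivity: baseline plus the SHAP values equals the model output for the row.
    assert abs(baseline + phi_a + phi_b - model(*row)) < 1e-9
    print(f"A={row[0]}, B={row[1]}: SHAP(A)={phi_a:+.1f}, SHAP(B)={phi_b:+.1f}")
```

In this toy setup, A=59 gets a SHAP value of +4.5 and A=61 gets -5.5, purely because B differs between the two rows. If something similar is happening in real data, plotting the SHAP values of A against A and coloring by another feature may reveal which feature is interacting.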
Hi @shansen, do you simply mean the row-level SHAP values or the hyperparameters? If you are talking about the hyperparameters, there is no way to retrieve them as data. My recommendation is to suggest that feature in Ideation:
I am talking about the row-level SHAP values, and their correlation with the feature values. I have a situation where I am trying to determine which values of a feature contribute to or detract from a positive outcome. I would expect the average SHAP value for a given numerical feature value to be a little more orderly than this. Have you ever seen numeric SHAP values vary so much along a numeric feature value?
The coordinate SHAP values file gives you exactly that. It gives you the SHAP value for each feature and each row. If you don’t see it, it is because of the way your chart presents the data. Use a distribution chart with 2 dimensions where one of them is the feature name.
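Outside of a chart, the same per-feature view can be approximated by grouping the SHAP values by feature value, which also gives a rough answer for categorical fields. This is a minimal sketch that assumes the SHAP values have already been joined to the feature values in a long format. The tuple layout, feature names, and numbers below are hypothetical, and the real file's columns may differ.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical long-format rows: (row index, feature name, feature value, SHAP value).
coordinate_shap = [
    (0, "region", "North", 0.12),
    (1, "region", "South", -0.08),
    (2, "region", "North", 0.10),
    (3, "region", "South", -0.04),
    (0, "tenure", 59, 0.30),
    (1, "tenure", 61, -0.20),
]

def mean_shap_by_value(rows, feature):
    """Average SHAP value for each distinct value of one feature."""
    groups = defaultdict(list)
    for _row_index, name, value, shap_value in rows:
        if name == feature:
            groups[value].append(shap_value)
    return {value: mean(values) for value, values in groups.items()}

for category, avg in mean_shap_by_value(coordinate_shap, "region").items():
    print(f"region={category}: mean SHAP {avg:+.3f}")
```

A category with a positive mean pushes predictions above the baseline on average, and a negative mean pushes them below. The spread within each group is worth checking too, since a wide spread suggests the effect depends on other features.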
Igor,
Thanks for your help. When I used a linear model, the SHAP values of the numerical features did give me a linear relationship that I could back a coefficient out of (see the attached Feature Values vs SHAP Values.png). One issue I had to resolve first was making sure my row index started at 0 instead of 1, to match the AutoML output:
LOAD
...
RowNo() - 1 AS my_row_index // The automl_row_index starts at 0.
FROM [lib://MySpace:DataFiles/outcome_prediction.csv]
Then things made much more sense.
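For anyone repeating this, here is a rough sketch of the back-out step in Python rather than Qlik script. It assumes a linear model, where the SHAP value of a numeric feature is approximately the coefficient times the feature's deviation from its mean, so the slope of SHAP value against feature value recovers the coefficient. The data is made up (generated with a coefficient of 0.5), and only `automl_row_index` comes from the thread. The other names are hypothetical.

```python
from statistics import mean

# Hypothetical feature A values in source-file order.
source_feature_a = [55.0, 59.0, 61.0, 70.0]

# Hypothetical SHAP values for feature A, keyed by automl_row_index (0-based).
shap_by_row_index = {0: -3.125, 1: -1.125, 2: -0.125, 3: 4.375}

# enumerate() starts at 0, matching automl_row_index. Starting at 1 would
# shift every pair by one row and scramble the relationship.
pairs = [(x, shap_by_row_index[i]) for i, x in enumerate(source_feature_a)]

def least_squares_slope(points):
    """Slope of the ordinary least-squares line through (x, y) points."""
    mean_x = mean(x for x, _ in points)
    mean_y = mean(y for _, y in points)
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in points)
    denominator = sum((x - mean_x) ** 2 for x, _ in points)
    return numerator / denominator

coefficient = least_squares_slope(pairs)
print(f"Estimated coefficient for feature A: {coefficient:.3f}")
```

This only works cleanly for linear models. For tree-based or other non-linear models, the SHAP values will not fall on a straight line, and a single slope is at best a rough summary.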