<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>article Qlik AutoML: Overview of SHAP values in Official Support Articles</title>
    <link>https://community.qlik.com/t5/Official-Support-Articles/Qlik-AutoML-Overview-of-SHAP-values/ta-p/1959508</link>
    <description>&lt;P&gt;The goal of this article is to give an overview of SHAP values which are generated from Qlik AutoML model predictions. SHAP values serve as a way to measure variable importance and how much they influence the predicted value of the model.&lt;/P&gt;
&lt;P&gt;&lt;BR /&gt;&lt;FONT color="#339966"&gt;&lt;STRONG&gt;SHAP Importance explained&lt;/STRONG&gt;&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;SHAP Importance represents how a feature influences the prediction of a single row relative to the other features in that row and to the average outcome in the dataset.&lt;/P&gt;
&lt;P&gt;The goal of SHAP is to explain the prediction of an instance x by computing the contribution of each feature to the prediction. The SHAP explanation method computes Shapley values from coalitional game theory. The feature values of a data instance act as players in a coalition. Shapley values tell us how to fairly distribute the "payout" (the prediction) among the features. A player can be an individual feature value or a group of feature values.&lt;/P&gt;
&lt;P&gt;For more information and mathy fun please reference this chapter from Interpretable Machine Learning:&lt;/P&gt;
&lt;P class="lia-indent-padding-left-30px"&gt;&lt;A href="https://christophm.github.io/interpretable-ml-book/shap.html" target="_blank" rel="noopener"&gt;https://christophm.github.io/interpretable-ml-book/shap.html&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;&lt;BR /&gt;&lt;FONT color="#339966"&gt;&lt;STRONG&gt;Example&lt;/STRONG&gt;&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;Medical Cost Personal dataset:&amp;nbsp;&lt;A href="https://www.kaggle.com/datasets/mirichoi0218/insurance" target="_blank" rel="noopener"&gt;https://www.kaggle.com/datasets/mirichoi0218/insurance&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;Note: I added an ID column, but did not include it as a feature.&lt;/P&gt;
&lt;P&gt;Features:&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;age, sex, bmi, children (number of), smoker, region&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;Target:&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;charges&lt;/FONT&gt;&lt;/P&gt;
&lt;P class="lia-indent-padding-left-30px"&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="1.png" style="width: 897px;"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/84890i7331DA665F2446DD/image-size/large?v=v2&amp;amp;px=999" role="button" title="1.png" alt="1.png" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;P style="margin: 0in; font-family: Calibri; font-size: 11.0pt;"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I uploaded this dataset into Qlik Cloud and generated 4 models. Random Forest Regression was the champion model.&lt;/P&gt;
&lt;P&gt;In the UI, the SHAP Importance visualization shows that smoker, age, and bmi are the top 3 prediction influencers, meaning their values have the most effect on the predicted charges.&lt;/P&gt;
&lt;P class="lia-indent-padding-left-30px"&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="2.png" style="width: 475px;"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/84891i64E73A32DFD68292/image-size/large?v=v2&amp;amp;px=999" role="button" title="2.png" alt="2.png" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;FONT color="#339966"&gt;&lt;STRONG&gt;Understanding how the values are calculated&lt;/STRONG&gt;&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I deployed the model and generated predictions from the Qlik Cloud interface.&amp;nbsp; At this point you can open the data as a Qlik Sense app and combine the predicted output table with the original dataset (see&amp;nbsp;&lt;A href="https://community.qlik.com/t5/Knowledge/Qlik-AutoML-How-to-join-predicted-output-to-original-trained/ta-p/1960794" target="_blank" rel="noopener"&gt;Qlik AutoML: How to join predicted output to original trained dataset&lt;/A&gt;).&lt;/P&gt;
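&lt;P&gt;If you export both tables instead of combining them in a Qlik Sense app, the same key-based join can be sketched in plain Python. This is a hypothetical illustration, not the method from the linked article: it assumes both tables carry the added ID column, keeps only IDs present in both tables (an inner join), and uses made-up rows.&lt;/P&gt;

```python
def join_predictions(original_rows, prediction_rows, key="ID"):
    """Inner-join prediction rows (charges_predicted, *_SHAP columns)
    onto the original rows using a shared key column."""
    predictions_by_key = {p[key]: p for p in prediction_rows}
    joined = []
    for row in original_rows:
        prediction = predictions_by_key.get(row[key])
        if prediction is None:
            continue  # no prediction for this ID: drop it (inner join)
        merged = dict(row)         # copy so the input rows are not modified
        merged.update(prediction)  # adds charges_predicted and _SHAP columns
        joined.append(merged)
    return joined


# Made-up rows for illustration only.
original = [
    {"ID": 1, "age": 30, "smoker": "no", "charges": 4000.0},
    {"ID": 2, "age": 55, "smoker": "yes", "charges": 40000.0},
]
predictions = [
    {"ID": 2, "charges_predicted": 38000.0, "smoker_SHAP": 20000.0},
]

print(join_predictions(original, predictions))
```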
&lt;P&gt;The image below shows the original table combined with the SHAP values for each record.&lt;/P&gt;
&lt;P class="lia-indent-padding-left-30px"&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="3.png" style="width: 999px;"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/84892iCBB585C65F42E27F/image-size/large?v=v2&amp;amp;px=999" role="button" title="3.png" alt="3.png" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;P&gt;Example interpretation of &lt;FONT face="courier new,courier"&gt;record 1001&lt;/FONT&gt;:&lt;/P&gt;
&lt;P&gt;The &lt;FONT face="courier new,courier"&gt;Smoker_SHAP&lt;/FONT&gt; value of &lt;FONT face="courier new,courier"&gt;19315&lt;/FONT&gt; represents how much &lt;FONT face="courier new,courier"&gt;Smoker=Yes&lt;/FONT&gt; adds to this record's predicted charges, relative to the average prediction, given that the account holder is &lt;FONT face="courier new,courier"&gt;Female&lt;/FONT&gt;, &lt;FONT face="courier new,courier"&gt;19 years old&lt;/FONT&gt;, has a &lt;FONT face="courier new,courier"&gt;bmi&lt;/FONT&gt; of &lt;FONT face="courier new,courier"&gt;27.9&lt;/FONT&gt;, has &lt;FONT face="courier new,courier"&gt;no children&lt;/FONT&gt;, and is in the &lt;FONT face="courier new,courier"&gt;Southwest&lt;/FONT&gt; region.&lt;/P&gt;
&lt;P&gt;&lt;BR /&gt;The sum of the SHAP (Shapley) values in a row is how much that row's prediction differs from the average prediction.&lt;/P&gt;
&lt;P&gt;Average Predicted Charges (across all records) = &lt;FONT face="courier new,courier"&gt;13511.5&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;Sum of SHAP values = &lt;FONT face="courier new,courier"&gt;3396&lt;/FONT&gt;&lt;/P&gt;
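&lt;P&gt;Predicted charges from the manual SHAP calculation = &lt;FONT face="courier new,courier"&gt;13511.5 + 3396 = 16907.5&lt;/FONT&gt;, shown rounded as &lt;FONT face="courier new,courier"&gt;16908&lt;/FONT&gt;&lt;/P&gt;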
&lt;P&gt;Where:&lt;/P&gt;
&lt;P&gt;&lt;FONT face="courier new,courier"&gt;sumSHAPS&lt;/FONT&gt; is a calculated column of the sum of the SHAP values in the record.&lt;/P&gt;
&lt;P&gt;&lt;FONT face="courier new,courier"&gt;f(x) = age_SHAP+sex_SHAP+bmi_SHAP+smoker_SHAP+children_SHAP+region_SHAP&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face="courier new,courier"&gt;shaps_avgpredcharges&lt;/FONT&gt; is &lt;FONT face="courier new,courier"&gt;sumSHAPS&lt;/FONT&gt; plus the average of &lt;FONT face="courier new,courier"&gt;charges_predicted&lt;/FONT&gt;:&lt;/P&gt;
&lt;P&gt;&lt;FONT face="courier new,courier"&gt;f(x) = sumSHAPS+13511.5&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face="courier new,courier"&gt;charges&lt;/FONT&gt; is the actual value from the original dataset.&lt;/P&gt;
&lt;P&gt;&lt;FONT face="courier new,courier"&gt;charges_predicted&lt;/FONT&gt; is the model's predicted value.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;FONT color="#339966"&gt;&lt;STRONG&gt;Value of generated SHAP values&lt;/STRONG&gt;&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;The &lt;FONT face="courier new,courier"&gt;_SHAP&lt;/FONT&gt; values can be used in visualizations and further analysis to understand which features are driving the model's predictions. For record 1001, smoking increased the predicted charges, while for non-smokers the smoker feature reduced the predicted charges.&lt;/P&gt;
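&lt;P&gt;To check whether a feature pushes predictions up or down across all records, you can average its SHAP value separately for each value of that feature. The Python sketch below does this for smoker. The rows are hypothetical placeholders, and the lowercase "smoker"/"smoker_SHAP" column names and "yes"/"no" labels are assumptions about how the combined table is laid out.&lt;/P&gt;

```python
from collections import defaultdict


def mean_shap_by_value(rows, feature):
    """Average a feature's SHAP value separately for each of its values."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for r in rows:
        value = r[feature]
        totals[value] += r[feature + "_SHAP"]
        counts[value] += 1
    return {v: totals[v] / counts[v] for v in totals}


# Hypothetical rows for illustration only.
rows = [
    {"smoker": "yes", "smoker_SHAP": 15000},
    {"smoker": "yes", "smoker_SHAP": 17000},
    {"smoker": "no", "smoker_SHAP": -5000},
    {"smoker": "no", "smoker_SHAP": -6000},
]
print(mean_shap_by_value(rows, "smoker"))  # {'yes': 16000.0, 'no': -5500.0}
```

&lt;P&gt;A positive average for "yes" and a negative one for "no" is the pattern described above: the same feature raises predictions for smokers and lowers them for non-smokers.&lt;/P&gt;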
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;FONT color="#339966"&gt;&lt;STRONG&gt;Notes&lt;/STRONG&gt;&lt;/FONT&gt;&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;I rounded the numeric values in the combined table to the nearest whole number for readability.&lt;BR /&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;Ex: f(x) = round(bmi_SHAP,1)&lt;/FONT&gt;&lt;BR /&gt;&lt;BR /&gt;In Qlik, the second argument of Round() is the step size, so a step of 1 rounds to the nearest whole number rather than to one decimal place.&lt;BR /&gt;&lt;BR /&gt;&lt;/LI&gt;
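&lt;LI&gt;&lt;SPAN&gt;Qlik AutoML uses approximate Shapley values for Random Forest models. This is why, in our example, shaps_avgpredcharges is close to, but not exactly equal to, charges_predicted. The SHAP base value (the expected model output over the background data) may also differ slightly from the average of charges_predicted over the scored records.&lt;/SPAN&gt;&lt;/LI&gt;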
&lt;LI&gt;Average Predicted Charges: &lt;FONT face="courier new,courier"&gt;f(x) = average(charges_predicted)&lt;/FONT&gt;&lt;/LI&gt;
&lt;/OL&gt;
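&lt;P&gt;One widely used way to approximate Shapley values is permutation sampling: average each feature's marginal contribution over random feature orderings, starting each time from a randomly drawn reference row. The Python sketch below illustrates that general technique. It is not Qlik AutoML's implementation, which this article does not describe, and the toy model and rows are hypothetical. Because the reference rows are sampled, the estimates add up to f(x) minus the average prediction over the sampled references, which need not equal the average prediction over all reference rows.&lt;/P&gt;

```python
import random


def toy_model(row):
    # Hypothetical pricing rule, NOT the Qlik AutoML model.
    return 2000 + 250 * row["age"] + row["smoker"] * 500 * row["bmi"]


def sampled_shapley(predict, x, background_rows, n_samples=500, seed=0):
    """Permutation-sampling estimate of Shapley values for one row."""
    rng = random.Random(seed)
    features = list(x)
    totals = {f: 0.0 for f in features}
    for _ in range(n_samples):
        order = features[:]
        rng.shuffle(order)                       # random feature ordering
        row = dict(rng.choice(background_rows))  # random reference row
        previous = predict(row)
        for f in order:
            row[f] = x[f]                        # switch f to the explained row
            current = predict(row)
            totals[f] += current - previous      # marginal contribution of f
            previous = current
    return {f: t / n_samples for f, t in totals.items()}


x = {"age": 19, "bmi": 27.9, "smoker": 1}
background_rows = [
    {"age": 39, "bmi": 30.0, "smoker": 0},
    {"age": 50, "bmi": 33.0, "smoker": 1},
]
phi = sampled_shapley(toy_model, x, background_rows)
base = sum(toy_model(r) for r in background_rows) / len(background_rows)
print(phi)
print(sum(phi.values()) + base, toy_model(x))  # close, but usually not equal
```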
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;H4&gt;&lt;FONT color="#339966"&gt;&lt;STRONG&gt;Environment&lt;/STRONG&gt;&lt;/FONT&gt;&lt;/H4&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;LI-PRODUCT title="Qlik AutoML" id="qlikAutoML"&gt;&lt;/LI-PRODUCT&gt;&amp;nbsp;&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P style="font-style: italic;"&gt;The information in this article is provided as-is and to be used at own discretion. Depending on tool(s) used, customization(s), and/or other factors ongoing support on the solution below may not be provided by Qlik Support.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
    <pubDate>Wed, 27 Jul 2022 10:34:38 GMT</pubDate>
    <dc:creator>KellyHobson</dc:creator>
    <dc:date>2022-07-27T10:34:38Z</dc:date>
    <item>
      <title>Qlik AutoML: Overview of SHAP values</title>
      <link>https://community.qlik.com/t5/Official-Support-Articles/Qlik-AutoML-Overview-of-SHAP-values/ta-p/1959508</link>
      <pubDate>Wed, 27 Jul 2022 10:34:38 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Official-Support-Articles/Qlik-AutoML-Overview-of-SHAP-values/ta-p/1959508</guid>
      <dc:creator>KellyHobson</dc:creator>
      <dc:date>2022-07-27T10:34:38Z</dc:date>
    </item>
  </channel>
</rss>

