<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>article Using linear regression to enrich dimensions in Member Articles</title>
    <link>https://community.qlik.com/t5/Member-Articles/Using-linear-regression-to-enrich-dimensions/ta-p/1496097</link>
    <description>&lt;P&gt;&lt;SPAN&gt;Native statistical functions are used very little by Qlik developers as far as I know. Those with data science background stick to standard tools like R and Phyton. Using these along with Qlik was made easier by server-side extensions and previously impossible use cases are now simple and straightforward. &lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;You can learn more about SSE here: &lt;/SPAN&gt;&lt;A style="font-size: 10pt;" title="https://github.com/qlik-oss/server-side-extension" href="https://github.com/qlik-oss/server-side-extension" target="_blank"&gt;https://github.com/qlik-oss/server-side-extension&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;Nevertheless, it would be a shame to completely disregard native Qlik functions. With little effort (and without SSE link to external statistical engine) they can help us to enrich loaded data and help to find unexpected insights.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;For example, one use case, that I would like to examine here, is to segment customers with a help of linear regression. The simplest approach is to focus on sales by customers in time.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;Using linear regression in a chart expression can give us following information.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt; &lt;IMG style="height: 269px; width: 620px;" class="jive-image image-1" src="https://community.qlik.com/legacyfs/online/210227_lr_1.PNG" border="0" alt="lr_1.PNG" /&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;The expression for the regression line is the following:&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;IMG style="height: 13px; width: 620px;" class="jive-image image-2" src="https://community.qlik.com/legacyfs/online/210228_lr_2.PNG" border="0" alt="lr_2.PNG" /&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;LINEST_B function calculates the intersect – initial value when the independent variable is zero&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;LINEST_M function calculates the slope of the regression line&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;(field “MonthSeq“ is just a sequential number for months starting from 1)&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;Having a dynamic expression for any combination of selected dimensions is an awesome way to familiarize oneself with data and underlying linear trends in them.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;When our focus is on data discovery, it’s often better to pre-calculate regression in the script. For example, we could seek only those customers whose general trend is negative and for that we need some handle (field) for selection.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;Obviously, first we must have some transactional data prepared. I created a testing script for this exercise, which will create a dataset (it uses random numbers so after each reaload the results will change).&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;Our goal is to associate a new table with this data, which will store the regression results on defined level. For the sake of simplicity, we will calculate on a customer level only.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;&lt;IMG style="height: auto;" class="jive-image image-3" src="https://community.qlik.com/legacyfs/online/210230_lr_3.PNG" border="0" alt="lr_3.PNG" /&gt;&lt;BR /&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;Next, we choose two numerical fields which will be used in linear regression algorithm (Qlik’s LR functions support only two fields, for multi-variate regression one must use SSE to a statistical engine).&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;Since we want to calculate a linear trendline for sales quantity, one value will represent the order position of a particular month of the year and a second value will be a sum of quantity sold in that month. Therefore, we create a resident aggregate table which will sum the quantity by customer and month sequence number.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;&lt;IMG style="height: auto;" class="image-4 jive-image" src="https://community.qlik.com/legacyfs/online/210231_lr_8.PNG" border="0" alt="lr_8.PNG" /&gt;&lt;BR /&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;The month sequence number is calculated in the script using autonumber function on top of month-year ID.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;&lt;IMG style="height: auto;" class="image-5 jive-image" src="https://community.qlik.com/legacyfs/online/210232_lr_4.PNG" border="0" alt="lr_4.PNG" /&gt;&lt;BR /&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;Now we are ready to use regression functions. We’ll make calculations in another resident table. The resulting dataset will store regression coefficients, intersect and slope, and I usually add also a standard deviation, which gives a good information about how spread out the underlying values are.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;&lt;IMG style="height: auto;" class="jive-image image-6" src="https://community.qlik.com/legacyfs/online/210234_lr_5.PNG" border="0" alt="lr_5.PNG" /&gt;&lt;BR /&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;Numbers are nice, but I like to also make some basic categorizations for quicker filtering . Thus, we will add trend orientation (positive or negative), trend force (significant or negligible) and trend classes. This can be done by preceding load on top of the last resident table.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;&lt;IMG style="height: 77px; width: 620px;" class="jive-image image-7" src="https://community.qlik.com/legacyfs/online/210236_lr_6.PNG" border="0" alt="lr_6.PNG" /&gt;&lt;BR /&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;We used suitable constants for the trend force and class in this example, but I recommend to use quartiles in a real-world scenario.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;Resulting data model includes fields that will allow us e.g. to quickly filter customers with significant negative trend.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;&lt;IMG style="height: 165px; width: 620px;" class="image-8 jive-image" src="https://community.qlik.com/legacyfs/online/210239_lr_9.PNG" border="0" alt="lr_9.PNG" /&gt;&lt;BR /&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;Real-world data allow us to go much further. First, there can be multiple levels of calculation, like country, site, customer, item, operation, service, etc. and even their combinations. Then, regression can be calculated in multiple time frames – 12 months, 12 weeks, 30 days... And, there usually is a great number of different measures, so the regression can be calculated not only against time but also in between them.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;The most fun (and insightful) is then to cross-analyze all these regressions against each other using Qlik’s associative experience.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;&lt;IMG style="height: 191px; width: 620px;" class="image-9 jive-image" src="https://community.qlik.com/legacyfs/online/210240_lr_7.PNG" border="0" alt="lr_7.PNG" /&gt;&lt;BR /&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;Linear regression is not suitable for predictions but I found that it can serve as a very effective navigator for business users who do not have statistical background and have larger detailed datasets.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;Check the attached file to see expressions as well as described script. I’m looking forward to hearing any comments and suggestions.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;Radovan&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;PS: If you'd like to use power or polynomial trendline, check &lt;A href="https://community.qlik.com/docs/DOC-19059" target="_blank"&gt;Calculating trend lines, values and formulas on charts and tables in Qlik Sense&lt;/A&gt; by &lt;A href="https://community.qlik.com/qlik-users/189257" target="_blank"&gt;richbyard&lt;/A&gt;‌.&lt;BR /&gt;&lt;/SPAN&gt;&lt;/P&gt;</description>
    <pubDate>Tue, 28 Jun 2022 11:33:56 GMT</pubDate>
    <dc:creator>RadovanOresky</dc:creator>
    <dc:date>2022-06-28T11:33:56Z</dc:date>
    <item>
      <title>Using linear regression to enrich dimensions</title>
      <link>https://community.qlik.com/t5/Member-Articles/Using-linear-regression-to-enrich-dimensions/ta-p/1496097</link>
      <description>&lt;P&gt;&lt;SPAN&gt;Native statistical functions are used very little by Qlik developers as far as I know. Those with data science background stick to standard tools like R and Phyton. Using these along with Qlik was made easier by server-side extensions and previously impossible use cases are now simple and straightforward. &lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;You can learn more about SSE here: &lt;/SPAN&gt;&lt;A style="font-size: 10pt;" title="https://github.com/qlik-oss/server-side-extension" href="https://github.com/qlik-oss/server-side-extension" target="_blank"&gt;https://github.com/qlik-oss/server-side-extension&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;Nevertheless, it would be a shame to completely disregard native Qlik functions. With little effort (and without SSE link to external statistical engine) they can help us to enrich loaded data and help to find unexpected insights.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;For example, one use case, that I would like to examine here, is to segment customers with a help of linear regression. The simplest approach is to focus on sales by customers in time.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;Using linear regression in a chart expression can give us following information.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt; &lt;IMG style="height: 269px; width: 620px;" class="jive-image image-1" src="https://community.qlik.com/legacyfs/online/210227_lr_1.PNG" border="0" alt="lr_1.PNG" /&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;The expression for the regression line is the following:&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;IMG style="height: 13px; width: 620px;" class="jive-image image-2" src="https://community.qlik.com/legacyfs/online/210228_lr_2.PNG" border="0" alt="lr_2.PNG" /&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;LINEST_B function calculates the intersect – initial value when the independent variable is zero&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;LINEST_M function calculates the slope of the regression line&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;(field “MonthSeq“ is just a sequential number for months starting from 1)&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;Having a dynamic expression for any combination of selected dimensions is an awesome way to familiarize oneself with data and underlying linear trends in them.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;When our focus is on data discovery, it’s often better to pre-calculate regression in the script. For example, we could seek only those customers whose general trend is negative and for that we need some handle (field) for selection.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;Obviously, first we must have some transactional data prepared. I created a testing script for this exercise, which will create a dataset (it uses random numbers so after each reaload the results will change).&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;Our goal is to associate a new table with this data, which will store the regression results on defined level. For the sake of simplicity, we will calculate on a customer level only.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;&lt;IMG style="height: auto;" class="jive-image image-3" src="https://community.qlik.com/legacyfs/online/210230_lr_3.PNG" border="0" alt="lr_3.PNG" /&gt;&lt;BR /&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;Next, we choose two numerical fields which will be used in linear regression algorithm (Qlik’s LR functions support only two fields, for multi-variate regression one must use SSE to a statistical engine).&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;Since we want to calculate a linear trendline for sales quantity, one value will represent the order position of a particular month of the year and a second value will be a sum of quantity sold in that month. Therefore, we create a resident aggregate table which will sum the quantity by customer and month sequence number.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;&lt;IMG style="height: auto;" class="image-4 jive-image" src="https://community.qlik.com/legacyfs/online/210231_lr_8.PNG" border="0" alt="lr_8.PNG" /&gt;&lt;BR /&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;The month sequence number is calculated in the script using autonumber function on top of month-year ID.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;&lt;IMG style="height: auto;" class="image-5 jive-image" src="https://community.qlik.com/legacyfs/online/210232_lr_4.PNG" border="0" alt="lr_4.PNG" /&gt;&lt;BR /&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;Now we are ready to use regression functions. We’ll make calculations in another resident table. The resulting dataset will store regression coefficients, intersect and slope, and I usually add also a standard deviation, which gives a good information about how spread out the underlying values are.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;&lt;IMG style="height: auto;" class="jive-image image-6" src="https://community.qlik.com/legacyfs/online/210234_lr_5.PNG" border="0" alt="lr_5.PNG" /&gt;&lt;BR /&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;Numbers are nice, but I like to also make some basic categorizations for quicker filtering . Thus, we will add trend orientation (positive or negative), trend force (significant or negligible) and trend classes. This can be done by preceding load on top of the last resident table.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;&lt;IMG style="height: 77px; width: 620px;" class="jive-image image-7" src="https://community.qlik.com/legacyfs/online/210236_lr_6.PNG" border="0" alt="lr_6.PNG" /&gt;&lt;BR /&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;We used suitable constants for the trend force and class in this example, but I recommend to use quartiles in a real-world scenario.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;Resulting data model includes fields that will allow us e.g. to quickly filter customers with significant negative trend.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;&lt;IMG style="height: 165px; width: 620px;" class="image-8 jive-image" src="https://community.qlik.com/legacyfs/online/210239_lr_9.PNG" border="0" alt="lr_9.PNG" /&gt;&lt;BR /&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;Real-world data allow us to go much further. First, there can be multiple levels of calculation, like country, site, customer, item, operation, service, etc. and even their combinations. Then, regression can be calculated in multiple time frames – 12 months, 12 weeks, 30 days... And, there usually is a great number of different measures, so the regression can be calculated not only against time but also in between them.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;The most fun (and insightful) is then to cross-analyze all these regressions against each other using Qlik’s associative experience.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;&lt;IMG style="height: 191px; width: 620px;" class="image-9 jive-image" src="https://community.qlik.com/legacyfs/online/210240_lr_7.PNG" border="0" alt="lr_7.PNG" /&gt;&lt;BR /&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;Linear regression is not suitable for predictions but I found that it can serve as a very effective navigator for business users who do not have statistical background and have larger detailed datasets.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;Check the attached file to see expressions as well as described script. I’m looking forward to hearing any comments and suggestions.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;Radovan&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;PS: If you'd like to use power or polynomial trendline, check &lt;A href="https://community.qlik.com/docs/DOC-19059" target="_blank"&gt;Calculating trend lines, values and formulas on charts and tables in Qlik Sense&lt;/A&gt; by &lt;A href="https://community.qlik.com/qlik-users/189257" target="_blank"&gt;richbyard&lt;/A&gt;‌.&lt;BR /&gt;&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 28 Jun 2022 11:33:56 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Member-Articles/Using-linear-regression-to-enrich-dimensions/ta-p/1496097</guid>
      <dc:creator>RadovanOresky</dc:creator>
      <dc:date>2022-06-28T11:33:56Z</dc:date>
    </item>
    <item>
      <title>Re: Using linear regression to enrich dimensions</title>
      <link>https://community.qlik.com/t5/Member-Articles/Using-linear-regression-to-enrich-dimensions/tac-p/1496098#M316</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Awesome post ! Sophisticated analysis, yet quite simple to implement based on this recipe &lt;IMG src="https://community.qlik.com/legacyfs/online/emoticons/happy.png" /&gt; &lt;/P&gt;&lt;P&gt;I used it just once (in UI) and only because a client requested it and actually never thought of using it in script and make the most of it. &lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Thu, 13 Sep 2018 23:00:43 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Member-Articles/Using-linear-regression-to-enrich-dimensions/tac-p/1496098#M316</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2018-09-13T23:00:43Z</dc:date>
    </item>
  </channel>
</rss>

