
How to Design Multiple Linear Regression in Qlik Sense with the help of Microsoft R (R Integration with Qlik Sense)

rohitk1609
Master

Last Update:

Jul 4, 2021 1:22:46 PM

Updated By:

rohitk1609

Created date:

Jul 3, 2021 2:30:32 PM

This document is the second part of, and the next step after, How to Design Simple Linear Regression in Qlik Sense with help of Microsoft R(R Integration with Qli...

That document already covered what Microsoft R is, how it can be integrated with Qlik Sense, and the basics of regression and predictive analysis.

This document shows how to design a Multiple Linear Regression model that is calculated on the R engine and visualized in Qlik Sense.

Let’s discuss what Multiple Linear Regression is with a simple use case: the venture capitalist challenge.

We have a data set with the following fields:

  1. R&D Spend
  2. Administration
  3. Marketing Spend
  4. State
  5. Profit

A few rows from the data:

| R&D Spend | Administration | Marketing Spend | State      | Profit   |
|-----------|----------------|-----------------|------------|----------|
| 165349.2  | 136897.8       | 471784.1        | New York   | 192261.8 |
| 162597.7  | 151377.6       | 443898.5        | California | 191792.1 |
| 153441.5  | 101145.6       | 407934.5        | Florida    | 191050.4 |
| 144372.4  | 118671.9       | 383199.6        | New York   | 182902   |
| 142107.3  | 91391.77       | 366168.4        | Florida    | 166187.9 |
| 131876.9  | 99814.71       | 362861.4        | New York   | 156991.1 |

 

Interpretation of the data: each row represents one company. The data set has 50 records. Each record shows how much the company spent that year on R&D, Marketing and Administration, the state it operates in, and the profit it made.

The goal is to build a model that tells the venture capital fund which type of company is the most attractive investment, judged by profit.

We have to find out which independent variable (R&D Spend, Marketing Spend, Administration) the profit (the dependent variable) depends on most.

The venture capital fund isn’t going to invest in all companies. It wants to know where companies perform better: New York, California or Florida? Do companies that spend more on marketing perform better than those that spend less? Which of the independent variables leads to more profit?

We can’t answer this off the top of our heads; we have to measure how strongly profit depends on each independent variable.

Simple Linear Regression: y = b0 + b1*x1

Multiple Linear Regression: y = b0 + b1*x1 + b2*x2 + ... + bn*xn
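
To make the equation concrete, here is a minimal R sketch that evaluates it for one company from the sample rows. The coefficient values are hypothetical, chosen only to show the arithmetic; they are not the result of any fitted model.

```r
# Hypothetical coefficients, for illustration only (not fitted values)
b0 <- 50000                                   # intercept
b  <- c(R.D.Spend = 0.8,
        Administration = -0.03,
        Marketing.Spend = 0.03)

# Predictor values of the first sample row
x <- c(R.D.Spend = 165349.2,
       Administration = 136897.8,
       Marketing.Spend = 471784.1)

# y = b0 + sum over all predictors of (coefficient * value)
y_hat <- b0 + sum(b * x[names(b)])
print(y_hat)
```

Indexing `x` by `names(b)` pairs each coefficient with its own variable, so the order in which the terms are listed does not matter.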

 

Note:

Assumptions of a Linear Regression:

  1. Linearity
  2. Homoscedasticity
  3. Multivariate Normality
  4. Independence of errors
  5. Lack of multicollinearity

Before creating a model, you have to confirm that these assumptions hold. This document does not cover how to check them, but if you build a model for real-world use, please verify them first.
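
As a rough first screen for the last assumption (lack of multicollinearity), the pairwise correlations between the numeric predictors can be inspected. The sketch below uses only the six sample rows shown earlier, which is far too few for a real check; the data frame name `companies` is an assumption made for this example.

```r
# Six sample rows from the data set (numeric predictors only)
companies <- data.frame(
  R.D.Spend       = c(165349.2, 162597.7, 153441.5, 144372.4, 142107.3, 131876.9),
  Administration  = c(136897.8, 151377.6, 101145.6, 118671.9, 91391.77, 99814.71),
  Marketing.Spend = c(471784.1, 443898.5, 407934.5, 383199.6, 366168.4, 362861.4)
)

# Pairwise Pearson correlations; values close to +1 or -1 between two
# predictors hint that they carry overlapping information
cor_matrix <- cor(companies)
print(round(cor_matrix, 2))
```

A correlation matrix only detects pairwise relationships; the other assumptions (linearity, homoscedasticity, normality and independence of errors) need residual diagnostics on the fitted model.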

 

Back to Multiple Linear Regression:

We want to find exactly how the independent variables (R&D Spend, Marketing Spend, and Administration) relate to profit.

 

 

 

 

 

 

 


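y = b0 + b1*x1 + b2*x2 + b3*x3 + ??

(x1 is the amount spent on Administration; ?? stands for the State column, which is not numeric and cannot be put into the equation as it is)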

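How to handle State: we transform the state values into digits, i.e. 1 or 0, with one column per state. Such a column is called a dummy variable.

D2 = 1 - D1

With two categories, the second dummy D2 is fully determined by the first, so it adds no new information. Whenever you create a model, always omit one dummy variable.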
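
The dummy encoding can be sketched in R with `model.matrix`, using the State values of the six sample rows. This is only an illustration of the idea; `lm` performs the same encoding automatically when it is given a factor or character column.

```r
# State values of the six sample rows
state <- factor(c("New York", "California", "Florida",
                  "New York", "Florida", "New York"))

# Full set of dummies: one 0/1 column per state (no intercept)
all_dummies <- model.matrix(~ state - 1)
print(all_dummies)

# With an intercept, R drops the first level (California) automatically,
# which is exactly the "omit one dummy variable" rule
kept_dummies <- model.matrix(~ state)
print(kept_dummies)

# Every row of the full set sums to 1, so any one column can be
# computed from the others and is redundant next to the intercept
print(rowSums(all_dummies))
```

California becomes the baseline here only because factor levels are sorted alphabetically; any state could serve as the omitted one.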

 

Statistical significance: the conventional threshold is 0.05, or 5%. 5% is 1 out of 20: if a result this extreme would happen by pure chance less than 1 time in 20, we start to feel uneasy about the assumption that there is no real effect (the null hypothesis) and consider the result unlikely to be random.

A statistically significant result therefore means we reject the null hypothesis that the variable has no effect.

So for multiple linear regression, we need to find out which independent variable has the lowest P value, i.e. the highest statistical significance.

How to write the multiple linear regression statement in Microsoft R:

regressor = lm(formula = Profit ~ R.D.Spend + Administration + Marketing.Spend,
               data = training_set)

 

Note: training_set is simply a subset of the full data set that is used to fit the model.

 

Or you can write the statement with a dot, which stands for all other columns in training_set (so State is included as well):

regressor = lm(formula = Profit ~ .,
               data = training_set)
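
For readers who want to run the statement, here is a self-contained sketch that builds `training_set` from the six sample rows shown earlier. Using those six rows as the training set is an assumption made purely for illustration; the results in this document come from the full 50-row data set, so the numbers printed by this sketch will not match them.

```r
# Six sample rows standing in for the real training set (illustration only)
training_set <- data.frame(
  R.D.Spend       = c(165349.2, 162597.7, 153441.5, 144372.4, 142107.3, 131876.9),
  Administration  = c(136897.8, 151377.6, 101145.6, 118671.9, 91391.77, 99814.71),
  Marketing.Spend = c(471784.1, 443898.5, 407934.5, 383199.6, 366168.4, 362861.4),
  Profit          = c(192261.8, 191792.1, 191050.4, 182902, 166187.9, 156991.1)
)

# Fit profit against the three numeric predictors
regressor <- lm(formula = Profit ~ R.D.Spend + Administration + Marketing.Spend,
                data = training_set)

# Coefficient table with estimates, standard errors, t values and P values
print(summary(regressor))
```

State is left out of this sketch on purpose: with only six rows, adding two state dummies would give as many coefficients as observations and leave nothing to estimate the error from.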

 

Result:

rohitk1609_0-1625336853822.png

 

 

The most important values are in the last two columns:

P value: the lower the P value, the stronger the evidence that the variable has an impact on the dependent variable.

Statistical significance: *** marks the highest level of statistical significance, which means the P value is between 0 and 0.001.

rohitk1609_1-1625336854864.png
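
The P values and significance stars can also be pulled out of the model programmatically. The sketch below again fits on the six sample rows only (an assumption for illustration, so the values differ from the result shown above) and applies the same cutpoints that R prints in its "Signif. codes" legend.

```r
# Six sample rows standing in for the real training set (illustration only)
training_set <- data.frame(
  R.D.Spend       = c(165349.2, 162597.7, 153441.5, 144372.4, 142107.3, 131876.9),
  Administration  = c(136897.8, 151377.6, 101145.6, 118671.9, 91391.77, 99814.71),
  Marketing.Spend = c(471784.1, 443898.5, 407934.5, 383199.6, 366168.4, 362861.4),
  Profit          = c(192261.8, 191792.1, 191050.4, 182902, 166187.9, 156991.1)
)
regressor <- lm(Profit ~ R.D.Spend + Administration + Marketing.Spend,
                data = training_set)

# The coefficient table is a matrix; the P values sit in the last column
coef_table <- summary(regressor)$coefficients
p_values   <- coef_table[, "Pr(>|t|)"]

# Map each P value to the star code used in R's summary output
stars <- symnum(p_values, corr = FALSE, na = FALSE,
                cutpoints = c(0, 0.001, 0.01, 0.05, 0.1, 1),
                symbols   = c("***", "**", "*", ".", " "))

# Predictors ordered from lowest to highest P value (intercept dropped)
print(sort(p_values[-1]))
print(stars)
```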

 

 

Let’s do the same analysis in Qlik Sense with the R engine:

Create an app, then drag and drop the Advanced Analytics extension:

rohitk1609_2-1625336854936.png

 

 

 

Select the Multiple Linear Regression analysis:

rohitk1609_3-1625336855055.png

 

 

Select State as the dimension, Profit as the response measure and R&D Spend as the predictor variable:

rohitk1609_4-1625336855085.png

 

The P value is 0.1161.

 

Let’s do the same thing for Administration and Marketing Spend:

Administration as predictor:

rohitk1609_5-1625336855134.png

 

 

 

Marketing Spend as predictor:

rohitk1609_6-1625336855195.png

 

 

We can see that R&D Spend has the lowest P value, which means Profit depends most on R&D Spend.

The next analysis is k-means clustering.

Everything is calculated by the R engine and visualised by Qlik Sense. This is the beauty of R integration with Qlik Sense.
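
The same one-predictor-at-a-time comparison can be sketched directly in R. This is not a description of what the extension runs internally; it only mirrors the steps above on the six sample rows (an assumption for illustration), without any aggregation by State, so the P values will differ from those in the screenshots.

```r
# Six sample rows standing in for the full data set (illustration only)
companies <- data.frame(
  R.D.Spend       = c(165349.2, 162597.7, 153441.5, 144372.4, 142107.3, 131876.9),
  Administration  = c(136897.8, 151377.6, 101145.6, 118671.9, 91391.77, 99814.71),
  Marketing.Spend = c(471784.1, 443898.5, 407934.5, 383199.6, 366168.4, 362861.4),
  Profit          = c(192261.8, 191792.1, 191050.4, 182902, 166187.9, 156991.1)
)

predictors <- c("R.D.Spend", "Administration", "Marketing.Spend")

# Fit one single-predictor model per variable and keep its slope P value
p_values <- sapply(predictors, function(name) {
  fit <- lm(reformulate(name, response = "Profit"), data = companies)
  summary(fit)$coefficients[name, "Pr(>|t|)"]
})

# Lowest P value first
print(sort(p_values))
```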

Rohit's Introduction  

Reach out to me at kumar.rohit1609@gmail.com if you need any clarification or assistance.

Connect with me on LinkedIn  https://in.linkedin.com/pub/rohit-kumar/2b/a15/67b 

To get latest updates and articles, join my Facebook page  https://www.facebook.com/QlikIntellectuals

