How to Design Multiple Linear Regression in Qlik Sense with the Help of Microsoft R (R Integration with Qlik Sense)

rohitk1609 · Jul 4, 2021 1:22:46 PM

This document is the second part (the next step) of How to Design Simple Linear Regression in Qlik Sense with help of Microsoft R(R Integration with Qli...

In that document we already discussed what Microsoft R is, how it can be integrated with Qlik Sense, and the basics of regression (predictive analysis).

This document shows you how to design a Multiple Linear Regression model in Qlik Sense that is calculated by the R engine and visualized in Qlik Sense.

Let’s discuss what Multiple Linear Regression is with a simple use case: Venture capitalist challenge

We have a data set with the following fields:

R&D Spend
Administration
Marketing Spend
State
Profit

A few rows from the data:

R&D Spend	Administration	Marketing Spend	State	Profit
165349.2	136897.8	471784.1	New York	192261.8
162597.7	151377.6	443898.5	California	191792.1
153441.5	101145.6	407934.5	Florida	191050.4
144372.4	118671.9	383199.6	New York	182902
142107.3	91391.77	366168.4	Florida	166187.9
131876.9	99814.71	362861.4	New York	156991.1

Interpretation of the data: each row represents one company, and the data set has 50 records. Each record shows how much the company spent that year on R&D, Marketing and Administration, the state it operates in, and its profit.

The goal of the model is to find out which type of company is most interesting for investment, judged by profit.

We have to find out which independent variable (R&D Spend, Marketing Spend, Administration) profit (the dependent variable) depends on most.

A venture capital fund isn't going to invest in every company. It wants to know where companies perform better: New York, California or Florida? Do companies that spend more on marketing perform better than those that spend less? Which of the independent variables leads to more profit?

We can't answer this off the top of our heads; we have to find out which independent variable profit depends on most.

Simple Linear Regression: y = b0 + b1*x1

Multiple Linear Regression: y = b0 + b1*x1 + b2*x2 + ... + bn*xn
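To see that lm() estimates the coefficients b0, b1, ..., bn of this equation, here is a small sketch on synthetic data (not the venture capital data set). The names x1, x2 and the true coefficients b0 = 5, b1 = 2, b2 = -3 are purely illustrative assumptions: we generate y from them plus noise and check that lm() recovers them.

```r
set.seed(42)  # reproducible synthetic data
n  <- 50
x1 <- runif(n, 0, 100)
x2 <- runif(n, 0, 100)
# Illustrative true model: y = 5 + 2*x1 - 3*x2 + random noise
y  <- 5 + 2 * x1 - 3 * x2 + rnorm(n, sd = 1)

simple   <- lm(y ~ x1)        # simple linear regression: y = b0 + b1*x1
multiple <- lm(y ~ x1 + x2)   # multiple linear regression: y = b0 + b1*x1 + b2*x2

b <- coef(multiple)           # named vector: (Intercept), x1, x2
print(round(b, 2))

# R-squared: share of the variation in y explained by each model
print(c(simple = summary(simple)$r.squared, multiple = summary(multiple)$r.squared))
```

The simple model leaves x2 out, so it explains much less of the variation in y than the multiple model does.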

Note:

Assumptions of a Linear Regression:

Linearity
Homoscedasticity
Multivariate Normality
Independence of errors
Lack of multicollinearity

Before creating the model, you have to confirm that these assumptions hold. In this document we won't focus on them, but if you build a model for real use, do check them first.
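As one quick, partial check of the last assumption (lack of multicollinearity), a sketch like the following computes pairwise correlations between the spend columns. It only uses the six example rows from the table above, not the full 50-row data set, and a correlation matrix is a rough screen rather than a complete multicollinearity test.

```r
# The six example rows from the table above (NOT the full 50-row data set)
spend <- data.frame(
  R.D.Spend       = c(165349.2, 162597.7, 153441.5, 144372.4, 142107.3, 131876.9),
  Administration  = c(136897.8, 151377.6, 101145.6, 118671.9, 91391.77, 99814.71),
  Marketing.Spend = c(471784.1, 443898.5, 407934.5, 383199.6, 366168.4, 362861.4)
)

# Pairwise Pearson correlations between predictors; values close to +1 or -1
# suggest two predictors carry almost the same information (multicollinearity)
cm <- cor(spend)
print(round(cm, 2))
```

On these six rows alone, R&D Spend and Marketing Spend are strongly correlated (above 0.9). Six rows are far too few to draw conclusions, so run the same check on the full data set before trusting a model.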

Back to multiple linear regression:

We want to find out exactly how the independent variables (R&D Spend, Marketing Spend and Administration) relate to profit.


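y = b0 + b1*x1 + b2*x2 + b3*x3 + ??

(Here y is Profit, x1 is the amount spent on Administration, and x2 and x3 are the other two spend columns. The ?? marks the term for State, which holds text rather than numbers and so cannot go into the equation directly.)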

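How to handle State: we transform the State values into digits, i.e. 1 or 0. These 0/1 columns are called dummy variables.

With two categories, the second dummy is fully determined by the first:

D2 = 1 - D1

So whenever you create a model, always omit one dummy variable (use n - 1 dummies for n categories). Including all of them makes the dummies perfectly collinear with the constant b0; this is known as the dummy variable trap.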
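In R you rarely build the dummy columns by hand: when State is a factor, lm() (through model.matrix()) uses treatment contrasts, creating one 0/1 column per level except the first, which becomes the baseline. A small sketch using the State values from the six example rows above:

```r
# State values from the six example rows above
states <- data.frame(State = factor(c("New York", "California", "Florida",
                                      "New York", "Florida", "New York")))

# Default treatment contrasts: levels are sorted alphabetically, the first
# level (California) is the baseline and gets no column of its own,
# so the dummy variable trap is avoided automatically
X <- model.matrix(~ State, data = states)
print(X)
```

A California row has 0 in both dummy columns; its effect is absorbed by the intercept.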

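Statistical significance: a significance level of 0.05 (5%) is the point at which we start feeling uneasy about our assumption (the null hypothesis). 5% is 1 out of 20: if a result at least this extreme would occur by chance less than 1 time in 20 when the null hypothesis is true, we consider it unlikely to be random.

A statistically significant result means we reject the null hypothesis. For a regression coefficient, the null hypothesis is that the coefficient is zero, i.e. the variable has no effect on the dependent variable.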
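In the regression output, each coefficient's P value comes from a t test of that null hypothesis. A minimal sketch of the calculation, assuming a hypothetical t statistic of 2.5 and 46 residual degrees of freedom (for example, 50 rows minus 4 estimated coefficients in a model with three predictors):

```r
t_stat  <- 2.5   # hypothetical t value (coefficient estimate / standard error)
resid_df <- 46   # hypothetical residual degrees of freedom

# Two-sided P value: probability of a |t| at least this large
# if the true coefficient were zero
p_value <- 2 * pt(-abs(t_stat), resid_df)

alpha <- 0.05    # the 5% significance level discussed above
significant <- p_value < alpha
cat("P value:", round(p_value, 4), " significant at 5%:", significant, "\n")
```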

So in multiple linear regression we need to find out which independent variable has the lowest P value, i.e. the highest statistical significance.

How to write the multiple linear regression statement in Microsoft R:

regressor = lm(formula = Profit ~ R.D.Spend + Administration + Marketing.Spend,
               data = training_set)

Note: training_set is simply a subset of the full data set used to fit the model; the remaining rows are typically held back as a test set.
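A minimal base-R sketch of such a split. The data frame name dataset, the 80/20 ratio and the id column are assumptions for illustration; the original document does not state how the split was made.

```r
set.seed(123)  # make the random split reproducible

# Hypothetical stand-in for the 50-row data set: only the row count matters here
dataset <- data.frame(id = 1:50)

# Randomly pick 80% of the row indices for training; the rest form the test set
train_idx    <- sample(seq_len(nrow(dataset)), size = round(0.8 * nrow(dataset)))
training_set <- dataset[train_idx, , drop = FALSE]
test_set     <- dataset[-train_idx, , drop = FALSE]
```

The model is fitted on training_set only; test_set is kept to check how well it predicts data it has not seen.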

Or, to use every other column in training_set (including State) as a predictor, you can write the statement as:

regressor = lm(formula = Profit ~ .,
               data = training_set)
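The sketch below shows two things: why the column is written R.D.Spend (read.csv() applies make.names() to headers by default, turning "R&D Spend" into "R.D.Spend"), and that the "." in the formula expands to all non-response columns, State included. The zero-row template data frame is an illustrative assumption; only its column names matter.

```r
# Header names as read.csv() would convert them with its default check.names = TRUE
cols <- make.names(c("R&D Spend", "Administration", "Marketing Spend", "State", "Profit"))
print(cols)

# A zero-row data frame with those names is enough for terms() to expand "."
template <- data.frame(matrix(numeric(0), nrow = 0, ncol = length(cols),
                              dimnames = list(NULL, cols)))
rhs <- attr(terms(Profit ~ ., data = template), "term.labels")
print(rhs)  # every column except the response Profit
```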

Result:

The most important values are the last two columns of the coefficients table: the P value (Pr(>|t|)) and the significance stars.

P value: the lower the P value, the stronger the evidence that the variable has an effect on the dependent variable (a low P value is not the same thing as a large effect).

Significance codes: *** marks the highest statistical significance, i.e. a P value between 0 and 0.001.
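To read those P values programmatically and rank the predictors, a sketch like the following works on the coefficients matrix returned by summary(). The data here are synthetic (NOT the real 50-company data set): profit is simulated to depend on R&D spend only, with the other two columns as pure noise, so the value ranges and coefficients are assumptions for illustration.

```r
set.seed(1)
n <- 50
# Synthetic stand-in data: Profit depends on R.D.Spend only
training_set <- data.frame(
  R.D.Spend       = runif(n, 0, 165000),
  Administration  = runif(n, 50000, 180000),
  Marketing.Spend = runif(n, 0, 470000)
)
training_set$Profit <- 50000 + 0.8 * training_set$R.D.Spend + rnorm(n, sd = 5000)

regressor <- lm(formula = Profit ~ R.D.Spend + Administration + Marketing.Spend,
                data = training_set)

# Columns: Estimate, Std. Error, t value, Pr(>|t|)
coefs <- summary(regressor)$coefficients

# P values of the predictors only (intercept row dropped), smallest first
p_values <- sort(coefs[-1, "Pr(>|t|)"])
print(p_values)
```

Because the data were simulated with a known structure, R.D.Spend comes out with the smallest P value (***); on the real data, read the ranking from your own output.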

Let's do the same analysis in Qlik Sense with the R engine:

Create an app and drag and drop the Advanced Analytics extension:

Select Multiple Linear Regression analysis:

Select State as the dimension, Profit as the response measure and R&D Spend as the predictor variable:

The P value is 0.1161.

Let's do the same for Administration and Marketing Spend:

Administration as predictor:

Marketing Spend as predictor:

We can see that R&D Spend has the lowest P value, which means Profit depends most on R&D Spend.
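A rough base-R analogue of comparing predictors one at a time, as in the Qlik steps above, fits Profit against each spend column separately and keeps that slope's P value. This is not a reproduction of what the extension computes internally (for example, it ignores the State dimension), and the data are synthetic, generated the same illustrative way as in the earlier sketch.

```r
set.seed(1)
n <- 50
# Synthetic stand-in data (NOT the real 50-company data set)
dat <- data.frame(
  R.D.Spend       = runif(n, 0, 165000),
  Administration  = runif(n, 50000, 180000),
  Marketing.Spend = runif(n, 0, 470000)
)
dat$Profit <- 50000 + 0.8 * dat$R.D.Spend + rnorm(n, sd = 5000)

predictors <- c("R.D.Spend", "Administration", "Marketing.Spend")

# For each predictor, fit Profit ~ <predictor> and extract the slope's P value
p_by_predictor <- sapply(predictors, function(v) {
  fit <- lm(reformulate(v, response = "Profit"), data = dat)
  summary(fit)$coefficients[v, "Pr(>|t|)"]
})
print(p_by_predictor)

best <- names(which.min(p_by_predictor))  # predictor with the lowest P value
```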

The next analysis is k-means clustering.

Everything is calculated by the R engine and visualized by Qlik Sense. This is the beauty of R integration with Qlik Sense.

Rohit's Introduction

Reach out to me at kumar.rohit1609@gmail.com if you need any clarification or assistance.

Connect with me on LinkedIn: https://in.linkedin.com/pub/rohit-kumar/2b/a15/67b

To get the latest updates and articles, join my Facebook page: https://www.facebook.com/QlikIntellectuals

