This document is the second part of, and the next step after, How to Design Simple Linear Regression in Qlik Sense with help of Microsoft R(R Integration with Qli...
The document above already discussed what Microsoft R is, how it can be integrated with Qlik Sense, and the basics of regression and predictive analysis.
This document will guide you through designing a Multiple Linear Regression model in Qlik Sense, where the model is calculated on the R engine and visualized in Qlik Sense.
Let’s discuss what Multiple Linear Regression is with a simple use case: Venture capitalist challenge
We have a data set with the following fields: R&D Spend, Administration, Marketing Spend, State and Profit.
A few rows from the data:
R&D Spend | Administration | Marketing Spend | State | Profit |
165349.2 | 136897.8 | 471784.1 | New York | 192261.8 |
162597.7 | 151377.6 | 443898.5 | California | 191792.1 |
153441.5 | 101145.6 | 407934.5 | Florida | 191050.4 |
144372.4 | 118671.9 | 383199.6 | New York | 182902 |
142107.3 | 91391.77 | 366168.4 | Florida | 166187.9 |
131876.9 | 99814.71 | 362861.4 | New York | 156991.1 |
Interpretation of the data: each row represents one company. The data set has 50 records, showing how much each company spent in that year on R&D, Marketing and Administration, in which state it operates, and what its profit was.
The goal is to create a model that shows which type of company is most worth investing in, judged by profit.
We have to find out which independent variable (R&D Spend, Marketing Spend, Administration) the profit (dependent variable) depends on most.
A venture capital fund isn't going to invest in all companies. It wants to know where companies perform better: New York, California or Florida? Do companies that spend more on marketing perform better than those that spend less? Which of the independent variables leads to more profit?
We can't answer this off the top of our head; we have to find out which independent variable the profit depends on most.
Simple Linear Regression: y = b0 + b1*x1
Multiple Linear Regression: y = b0 + b1*x1 + b2*x2 + ... + bn*xn
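As a small worked example of the multiple linear regression formula, the sketch below evaluates y for a single observation. The helper name predict_linear and all the numbers are made up purely to show the arithmetic; they are not fitted from the data set.

```r
# Evaluate y = b0 + b1*x1 + b2*x2 + ... + bn*xn for one observation.
# predict_linear is a hypothetical helper name, not part of base R.
predict_linear <- function(b0, b, x) {
  stopifnot(length(b) == length(x))  # one coefficient per independent variable
  b0 + sum(b * x)
}

# Made-up coefficients and inputs, only to illustrate the formula
y <- predict_linear(b0 = 10, b = c(2, 0.5, 3), x = c(4, 6, 1))
print(y)  # 10 + 2*4 + 0.5*6 + 3*1 = 24
```

With a single independent variable the same function reduces to the simple linear regression formula.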
Note:
Assumptions of a Linear Regression: the ones commonly listed are linearity, homoscedasticity, multivariate normality, independence of errors and lack of multicollinearity.
Before creating the model, you have to confirm that these assumptions hold. This document won't focus on them, but if you create a model in real life, please check them first.
Back to Multiple Linear Regression:
We want to find out exactly how the independent variables (R&D Spend, Marketing Spend and Administration) relate to profit.
R&D Spend | Administration | Marketing Spend | State | Profit |
165349.2 | 136897.8 | 471784.1 | New York | 192261.8 |
162597.7 | 151377.6 | 443898.5 | California | 191792.1 |
153441.5 | 101145.6 | 407934.5 | Florida | 191050.4 |
144372.4 | 118671.9 | 383199.6 | New York | 182902 |
142107.3 | 91391.77 | 366168.4 | Florida | 166187.9 |
131876.9 | 99814.71 | 362861.4 | New York | 156991.1 |
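Y = b0 + b1*x1 + b2*x2 + b3*x3 + ??
(x1 is the amount spent on Administration; ?? stands for State, which is not numeric and cannot be put into the equation directly)
How to handle State: we transform its values into digits, i.e. 1 or 0, using one column per state. Each such column is called a dummy variable.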
With two states, for example, the second dummy variable is fully determined by the first: D2 = 1 - D1.
So whenever you create a model, always omit one dummy variable.
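The dummy variable idea can be sketched in R using the State values of the six sample rows shown above. With three states, omitting one leaves two dummy variables, so the equation could be completed as Y = b0 + b1*x1 + b2*x2 + b3*x3 + b4*D1 + b5*D2 (the names b4, b5, D1 and D2 are assumptions made here for illustration). The variable names in the code are also illustrative.

```r
# State values from the six sample rows of the data set
state <- factor(c("New York", "California", "Florida",
                  "New York", "Florida", "New York"))

# One 0/1 column per state (no intercept, so no state is dropped)
all_dummies <- model.matrix(~ state - 1)
print(all_dummies)

# Each row contains exactly one 1, so the columns always add up to 1.
# This is why one dummy variable is redundant and must be omitted.
row_sums <- rowSums(all_dummies)

# With an intercept, R omits the first level (California) on its own
kept_dummies <- model.matrix(~ state)
print(colnames(kept_dummies))
```

Because lm() handles factor columns this way, State can usually be passed to it directly without building the dummy columns by hand.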
Statistical significance: the usual threshold is 0.05, or 5%. This is the point from where we start feeling uneasy about our assumption (the null hypothesis, here that a variable has no effect on profit). 5% is 1 out of 20: a result that would occur by chance less often than that is considered unlikely to be random.
A statistically significant result means we reject the null hypothesis.
So for multiple linear regression, we need to find out which independent variable has the lowest P value, i.e. is the most statistically significant.
How to write the multiple linear regression statement in Microsoft R:
regressor = lm(formula = Profit ~ R.D.Spend + Administration + Marketing.Spend,
data = training_set)
Note: the training set is nothing but a subset of the total data.
Or, to use every other column (including State) as a predictor, you can write the statement as:
regressor = lm(formula = Profit ~ .,
data = training_set)
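The pieces around these two statements can be sketched as follows, using only the six sample rows shown earlier as a stand-in for the full 50-record data set. The data frame name startups, the 80/20 split ratio and the seed are assumptions made for this sketch, not something the original setup prescribes.

```r
# The six sample rows shown earlier in this document
startups <- data.frame(
  R.D.Spend       = c(165349.2, 162597.7, 153441.5, 144372.4, 142107.3, 131876.9),
  Administration  = c(136897.8, 151377.6, 101145.6, 118671.9, 91391.77, 99814.71),
  Marketing.Spend = c(471784.1, 443898.5, 407934.5, 383199.6, 366168.4, 362861.4),
  State           = c("New York", "California", "Florida",
                      "New York", "Florida", "New York"),
  Profit          = c(192261.8, 191792.1, 191050.4, 182902, 166187.9, 156991.1)
)

# read.csv applies make.names by default (check.names = TRUE), which is why
# the headers become R.D.Spend and Marketing.Spend in the formula
print(make.names(c("R&D Spend", "Marketing Spend")))

# "Profit ~ ." expands to every other column, including State
dot_terms <- attr(terms(Profit ~ ., data = startups), "term.labels")
print(dot_terms)

# Assumed 80/20 split into a training set and a test set
set.seed(123)  # arbitrary seed, only for reproducibility
n <- nrow(startups)
train_idx <- sample(seq_len(n), size = floor(0.8 * n))
training_set <- startups[train_idx, ]
test_set <- startups[-train_idx, ]
```

Six rows are far too few to fit a meaningful model on the training part; with the full data set the same split would feed training_set into the lm() statements above.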
Result:
The most important values are in the last two columns:
P value: the lower the P value, the stronger the evidence that the variable has an impact on the dependent variable.
Statistical significance: *** marks the highest level of statistical significance, which means the P value is between 0 and 0.001.
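To read these two columns programmatically, a sketch along the following lines may help. It fits the model on the six sample rows only, so its numbers will not match the result for the full 50-record data set. The helper significance_stars is a hypothetical name; its cut points follow the legend R prints under a model summary.

```r
# The six sample rows shown earlier in this document (numeric fields only)
startups <- data.frame(
  R.D.Spend       = c(165349.2, 162597.7, 153441.5, 144372.4, 142107.3, 131876.9),
  Administration  = c(136897.8, 151377.6, 101145.6, 118671.9, 91391.77, 99814.71),
  Marketing.Spend = c(471784.1, 443898.5, 407934.5, 383199.6, 366168.4, 362861.4),
  Profit          = c(192261.8, 191792.1, 191050.4, 182902, 166187.9, 156991.1)
)

regressor <- lm(formula = Profit ~ R.D.Spend + Administration + Marketing.Spend,
                data = startups)

# Coefficient table: Estimate, Std. Error, t value, Pr(>|t|)
coef_table <- summary(regressor)$coefficients
p_values <- coef_table[, "Pr(>|t|)"]

# Hypothetical helper: map P values to R's significance codes
# 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
significance_stars <- function(p) {
  as.character(cut(p,
                   breaks = c(0, 0.001, 0.01, 0.05, 0.1, 1),
                   labels = c("***", "**", "*", ".", " "),
                   include.lowest = TRUE))
}

print(data.frame(p_value = p_values, stars = significance_stars(p_values)))
```

The same extraction works unchanged on a model fitted to training_set.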
Let’s try to do the same analysis in Qlik Sense with the R engine:
Create an app, then drag and drop the Advanced Analytics extension:
Select Multiple Linear Regression analysis:
Select State as the dimension, Profit as the response measure and R&D Spend as the predictor variable:
The P value is 0.1161.
Let’s do the same thing for Administration and Marketing Spend:
Administration as predictor:
Marketing Spend as predictor:
We can notice that R&D Spend has the lowest P value, which means Profit depends on R&D Spend the most.
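The one-predictor-at-a-time comparison above can also be sketched directly in R. This is only an approximation of the steps: it uses the six sample rows and does not aggregate by State, so it is not expected to reproduce the P values shown by the Qlik Sense extension. The variable names are illustrative.

```r
# The six sample rows shown earlier in this document (numeric fields only)
startups <- data.frame(
  R.D.Spend       = c(165349.2, 162597.7, 153441.5, 144372.4, 142107.3, 131876.9),
  Administration  = c(136897.8, 151377.6, 101145.6, 118671.9, 91391.77, 99814.71),
  Marketing.Spend = c(471784.1, 443898.5, 407934.5, 383199.6, 366168.4, 362861.4),
  Profit          = c(192261.8, 191792.1, 191050.4, 182902, 166187.9, 156991.1)
)

predictors <- c("R.D.Spend", "Administration", "Marketing.Spend")

# Fit Profit ~ <predictor> once per predictor and keep that predictor's P value
p_values <- sapply(predictors, function(pred) {
  fit <- lm(reformulate(pred, response = "Profit"), data = startups)
  summary(fit)$coefficients[pred, "Pr(>|t|)"]
})

# Smallest P value first
print(sort(p_values))
```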
The next analysis is k-means clustering.
Everything is calculated by the R engine, and Qlik Sense visualises it. This is the beauty of R integration with Qlik Sense.
Reach out to me at kumar.rohit1609@gmail.com if you need any clarification or assistance.
Connect with me on LinkedIn https://in.linkedin.com/pub/rohit-kumar/2b/a15/67b,
To get latest updates and articles, join my Facebook page https://www.facebook.com/QlikIntellectuals