Skip to main content
Announcements
Qlik Connect 2024! Seize endless possibilities! LEARN MORE
cancel
Showing results for 
Search instead for 
Did you mean: 
Not applicable

plotting results of linest_m and linest_b - newbie

Hi,

For a set of values x, I am trying to show a line plot of y=mx+b, or as per expressions, y=linest_m(y,x)*x + linest_b(y,x).  I want the line to change interactively as I select particular subsets of x from a table.  I use a line chart, and define an expression as above, but it doesn't seem to want to calculate a slope or intercept "on the fly".  If I predefine the slope and intercept in a load statement, and use those predefined values in the expression (rather than the linest_m and linest_b functions), the line displays fine...but of course with pre-defined constants, as it were, the line of course doesn't change as I modify selected values of x from a table.  Any suggestions?

Backstory: I thought originally that I could just use the trend line (simple linear regression) that is supplied in the charting wizard.  However, what I need to do is a weighted regression (I'm plotting proportions vs x and then a best fit line through those proportions, and thus each point is based on differing number of observations and has a different standard error), and there's no good way that I see to do a weighted regression (rather I must manipulate the data structure and use linest_m and linest_b functions whereby I CAN get the appropriate fitted line).

Any help much appreciated.

13 Replies
Not applicable
Author

Thanks very much for your suggestions.  i think my main conclusion is that there is no "canned" or standard way to do this.  "This" being weighted regression, or more specifically, for a subset of records subtracting an aggregate function of those records from each element of the rows (on the fly) - that is subtracting the mean of the subset (or a function of the mean) from each record.  If not possible, I suspect it is because it involves two passes through the data (one to calculate the mean, another to subtract that calculated value from each record).

But your suggestions may be promising (given my limited knowledge of QlikView.  Specifically for each of the 4 suggestions:

1. I don't have billions of rows, but I will have over 100,000.  And they may average 50 or so "counts" per record, resulting in a whole lot of records if expanded out.  It does sound a bit cumbersome...

2. Rolling my own regression coefficients is exactly what I am trying to do...but note in the equations you present, the mean of x must be subtracted from each record in the chosen subset (and those squared terms then summed).  That is the crux of my question; can I subtract a mean of a subset from each record in the subset on the fly (with a user specifying the subset). 

3. I'm not sure I follow how this differs from solution #1 above (unless #1 was all about pre-loaded tables in a script - in which case the possible combinations or HUGE - and this is about expansions in an expression).  But it sounds promising - would you be willing to spell it out a bit more for a novice?

4. I don't think I understand this option.  My goal is to have the user select what values of x he wants to have a line calculated.  Again, aggregating all possible combinations ahead of time seems impossible; I think the aggregation needs to be done on the fly, based on the selection of x values made by the user.

Again, thank you so much for continuing to entertain my newbie questions!

Not applicable
Author

Hello

all 4 are "on the fly" calculation ideas.

From the end:

4.

I can imagne, that for points:

x,y,n (for example 100 points together, but each with some n, so after rollout for example 500 pints x,y, because mean n=5) i can find 100 pints x,y1, where

linest_m(x,y) = linest_m(x,y1).

For example:

x,y,n

1,1,2

1,2,2

2,3,1

i in fact have 5 points:

x,y

1,1

1,1

1,2

1,2

2,3

and i can imagine that i can calculate 2 points:

for example:

1,

x,y1

1,y11

2,y12

where linest_m(x,y) = linest_m(x,y1).

But looking for regresiion equations - it may be difficult  So, lets leave this idea ....

3.

Having my example points:

x,y,n

1,1,2

1,2,2

2,3,1

i can on the fly transform it into 5 points (using aggr function)

and then calculate linest_m with standard qlikview function from those 5 points.

But this idea is stupid, because we need more resources and time for each calculation than in ideas 1 and 2.So, lets leave this idea ....

2.

I think it is of course possible to write own expressions.

Check this one:

sum(n*(x-$(=sum(x*n)/sum(n)))*(y-$(=sum(y*n)/sum(n))))

/

sum( n*pow(x-$(=sum(x*n)/sum(n)),2))

maybe i'am wrong, but it should calculate

Equation

I'am not sure, but it gives results another than linest_m(x,y1) ....

This all formula you still can execute inside $(=), so expression:

$(=sum(n*(x-$(=sum(x*n)/sum(n)))*(y-$(=sum(y*n)/sum(n))))

/

sum( n*pow(x-$(=sum(x*n)/sum(n)),2)))

should work and should be calculated once for entire dataset (regarding selections).

1.

This idea is easiest to implement.

In load script we have to do only something like this:

tab:

load * inline

[x,y,n

1,0,2

2,3,5

2,3,4

2,100,5];

tab1:

LOAD x,

y

Resident tab

while IterNo()<=n;

drop Table tab;

Assuming that in fact QlikView stores distinct values and binary index for each field it is worth to compare it with idea 2.

regards

Darek

Not applicable
Author

Thanks so much!!  Option # 2 (rolling my own equations) in fact worked; I implemented it on a trial set of data and confirmed the results with my statistical software.  The trick is all in knowing how to use the $ and = appropriately in expressions; something I don't really fully understand yet, but managed to bumble my way through based on your prototype suggestions.  I'll need to do a little background reading on those operators to better understand how to correctly utilize them.

Thanks again - with your extensive help I've managed to implement weighted regression in Qlikview.

Not applicable
Author

i would like to ask izzit possible for us to put our own formula into Qlikview and then to get 1 curve on our scatter chart??