13 Replies Latest reply: Jul 3, 2014 8:55 PM by Christopher Lee RSS

    plotting results of linest_m and linest_b - newbie

      Hi,

      For a set of values x, I am trying to show a line plot of y=mx+b, or as per expressions, y=linest_m(y,x)*x + linest_b(y,x).  I want the line to change interactively as I select particular subsets of x from a table.  I use a line chart, and define an expression as above, but it doesn't seem to want to calculate a slope or intercept "on the fly".  If I predefine the slope and intercept in a load statement, and use those predefined values in the expression (rather than the linest_m and linest_b functions), the line displays fine...but of course with pre-defined constants, as it were, the line of course doesn't change as I modify selected values of x from a table.  Any suggestions?

       

      Backstory: I thought originally that I could just use the trend line (simple linear regression) that is supplied in the charting wizard.  However, what I need to do is a weighted regression (I'm plotting proportions vs x and then a best fit line through those proportions, and thus each point is based on differing number of observations and has a different standard error), and there's no good way that I see to do a weighted regression (rather I must manipulate the data structure and use linest_m and linest_b functions whereby I CAN get the appropriate fitted line).

       

      Any help much appreciated.

        • Re: plotting results of linest_m and linest_b - newbie
          Josh Good

          Would you be able to post a sample data set and perhaps an visual of what you are trying to achieve?

            • Re: plotting results of linest_m and linest_b - newbie

              Thanks for your reply.

               

              Suppose I have the following x and y values:

               

              x   y

              1   2

              2   4

              3   6

              4   10

               

              A linear regression fit to these points will yield the line y=2.6x - 1.  I'd like to plot that line.  Then if I choose only the first three points (de-selecting the point (4,10)), the regression line will be y=2x.  I'd like the plot then to change to THAT line.  That is, the regression line plotted is modified on the fly as the user selects or de-selects subsets of the x,y pairs in the table.

               

              I can easily get the equation for the line to change "on the fly" (meaning the values returned by the functions linest_m and linest_b  change depending on the subsets of x,y pairs selected), but I can't figure out how to get the line displayed to change accordingly (that is, to depend on the x,y pairs selected and the corresponding linest_b and linest_m values returned).

               

              Thanks for any thoughts.

            • Re: plotting results of linest_m and linest_b - newbie
              Dariusz Mielczarek

              Hello,

               

              you may like something like in my example.

              Please let me know if it helps.

               

              regards

              Darek

                • Re: plotting results of linest_m and linest_b - newbie

                  Thanks, but unfortunately I only have a trial version available until I determine if it will suffice for the purposes that I need (correct me if I'm wrong, but I assume therefore I can't view your .qvw file).

                    • Re: plotting results of linest_m and linest_b - newbie
                      Dariusz Mielczarek

                      Hello,

                       

                      I'am not sure if i clear understand your needs. May you show (for example in xls) how should chart look like after few steps of user selections? If it is always 1 line, but with factors b and m calculated from selected points, you may use expression like this:

                       

                      only({1}$(=LINEST_M(y,x))*x)+$(=LINEST_B(y,x))

                       

                      regards

                      Darek

                        • Re: plotting results of linest_m and linest_b - newbie

                          Thanks for your suggestions.  I'll try to figure out how to apply it...where do I put the formula only({1}$(=LINEST_M(y,x))*x)+$(=LINEST_B(y,x)) for plotting the line?  Does that just go in the usual line graph somewhere?  Sorry to be so ignorant.

                           

                          I've attached an image of an Excel worksheet that shows a dataset, a potential subset of that dataset that could be selected, and how I would like to see a plot change (full data displays one plot, selecting a subset of the data changes that plot to look like the second plot below (where the slope of the line is smaller).

                           

                          Thanks for any further help!

                          2014-05-10_113907.jpg

                            • Re: plotting results of linest_m and linest_b - newbie
                              Dariusz Mielczarek

                              Ok,

                               

                              $(=LINEST_M(y,x)) give you M value for y,x pairs regardsing selection

                              $(=LINEST_B(y,x)) does the same with B value.


                              You should use this:

                              only({1}$(=LINEST_M(y,x))*x)+$(=LINEST_B(y,x))

                              as expression

                              in line chart having x as dimension.


                              regards

                              Darek

                                • Re: plotting results of linest_m and linest_b - newbie

                                  Thanks!  That worked perfectly; I hadn't fully understood the need for $ to evaluate the expression.  I also appreciate now the utility of the "only" and {1} uses which I hadn't realized before.  Thank you, very helpful.

                                   

                                  At the risk of pushing the limit here (and branching into a slightly different topic), as I'd originally mentioned I wanted to do a *weighted* regression, which I don't think is possible in QlikView (please correct me if wrong).  Thus instead of plotting x and y as above, I want to plot a line fit to something like this (I shortened to only 3 possible values for the x axis here):

                                  x     y     count

                                  0     0     15

                                  0     1     20

                                  1     0     28

                                  1     1     12

                                  2     0     43

                                  2     1    18

                                   

                                  The count here represents how many records of x and y are represented (12 records having points (1,1)  for example); thus the counts represent "weights" for each of the 6 records displayed above.  Of course the 6 records above could be expanded out to be 15+20+28+12+43+18 total records, and a line fit to that using the LINEST_M and LINEST_B shown above, but it proves extremely inconvenient. 

                                   

                                  Alternatively I can "roll my own" weighted regression (calculate the weighted equivalent of LINEST_M and LINEST_B results).  That requires (as one example step) that for each of the records above I be able to do something like:

                                   

                                  sum(( x*y - sum(total x*y) )^2). 

                                   

                                  That is, I sum a calculation over 6 records, but each of those 6 calculations summed uses a single calculation obtained from the entire set selected,...the overall mean.  In effect I'm doing sum( (xi - mean(x)) * (xi-mean(x)) ).  The result of that calculation (and others like it) will give me my slope and intercept.

                                   

                                  However, I don't know how/if I can do that calculation in an expression (in my chart).  Basically, can I do a sum across records, where each value summed includes subtracting the overall mean from that individual value (and squaring)?

                                   

                                  Sorry for the long-winded and complicated explanation. But I'm hoping for some suggestions on utilizing expressions in expressions, so to speak!  Thank you.

                                    • Re: plotting results of linest_m and linest_b - newbie
                                      Dariusz Mielczarek

                                      I can see 3 or 4 ways....

                                       

                                      1. Load rows as many times as count value (with while clause). In fact, taking into account QLikView compression it should be good solution if you dont have billion rows.

                                      2. use own formulas for regression coefficients.

                                       

                                       

                                       

                                      3. Use standard QlikView functions (linest_m,b...) - but first

                                       

                                      "multiply number of lines on the fly" using data island and aggr it should be possible, but rather not fast with big data volumes.


                                      4. aggregate data setting a new value of y for each row (or maybe even for each x)



                                      regards

                                      Darek

                                        • Re: plotting results of linest_m and linest_b - newbie

                                          Thanks very much for your suggestions.  i think my main conclusion is that there is no "canned" or standard way to do this.  "This" being weighted regression, or more specifically, for a subset of records subtracting an aggregate function of those records from each element of the rows (on the fly) - that is subtracting the mean of the subset (or a function of the mean) from each record.  If not possible, I suspect it is because it involves two passes through the data (one to calculate the mean, another to subtract that calculated value from each record).

                                           

                                          But your suggestions may be promising (given my limited knowledge of QlikView.  Specifically for each of the 4 suggestions:

                                           

                                          1. I don't have billions of rows, but I will have over 100,000.  And they may average 50 or so "counts" per record, resulting in a whole lot of records if expanded out.  It does sound a bit cumbersome...

                                           

                                          2. Rolling my own regression coefficients is exactly what I am trying to do...but note in the equations you present, the mean of x must be subtracted from each record in the chosen subset (and those squared terms then summed).  That is the crux of my question; can I subtract a mean of a subset from each record in the subset on the fly (with a user specifying the subset). 

                                           

                                          3. I'm not sure I follow how this differs from solution #1 above (unless #1 was all about pre-loaded tables in a script - in which case the possible combinations or HUGE - and this is about expansions in an expression).  But it sounds promising - would you be willing to spell it out a bit more for a novice?

                                           

                                          4. I don't think I understand this option.  My goal is to have the user select what values of x he wants to have a line calculated.  Again, aggregating all possible combinations ahead of time seems impossible; I think the aggregation needs to be done on the fly, based on the selection of x values made by the user.

                                           

                                          Again, thank you so much for continuing to entertain my newbie questions!

                                            • Re: plotting results of linest_m and linest_b - newbie
                                              Dariusz Mielczarek

                                              Hello

                                               

                                              all 4 are "on the fly" calculation ideas.

                                              From the end:

                                              4.

                                              I can imagne, that for points:

                                              x,y,n (for example 100 points together, but each with some n, so after rollout for example 500 pints x,y, because mean n=5) i can find 100 pints x,y1, where

                                              linest_m(x,y) = linest_m(x,y1).

                                               

                                              For example:

                                              x,y,n

                                              1,1,2

                                              1,2,2

                                              2,3,1

                                               

                                              i in fact have 5 points:

                                              x,y

                                              1,1

                                              1,1

                                              1,2

                                              1,2

                                              2,3

                                               

                                              and i can imagine that i can calculate 2 points:

                                              for example:

                                              1,

                                              x,y1

                                              1,y11

                                              2,y12

                                               

                                              where linest_m(x,y) = linest_m(x,y1).

                                              But looking for regresiion equations - it may be difficult  So, lets leave this idea ....

                                               

                                              3.

                                               

                                              Having my example points:

                                              x,y,n

                                              1,1,2

                                              1,2,2

                                              2,3,1

                                               

                                              i can on the fly transform it into 5 points (using aggr function)

                                               

                                              and then calculate linest_m with standard qlikview function from those 5 points.

                                               

                                              But this idea is stupid, because we need more resources and time for each calculation than in ideas 1 and 2.So, lets leave this idea ....

                                               

                                               

                                              2.

                                              I think it is of course possible to write own expressions.

                                               

                                              Check this one:

                                               

                                              sum(n*(x-$(=sum(x*n)/sum(n)))*(y-$(=sum(y*n)/sum(n))))

                                              /

                                              sum( n*pow(x-$(=sum(x*n)/sum(n)),2))

                                               

                                               

                                              maybe i'am wrong, but it should calculate

                                               

                                               

                                              Equation

                                               

                                              I'am not sure, but it gives results another than linest_m(x,y1) ....

                                               

                                              This all formula you still can execute inside $(=), so expression:

                                               

                                              $(=sum(n*(x-$(=sum(x*n)/sum(n)))*(y-$(=sum(y*n)/sum(n))))

                                              /

                                              sum( n*pow(x-$(=sum(x*n)/sum(n)),2)))

                                               

                                              should work and should be calculated once for entire dataset (regarding selections).

                                               

                                               

                                               

                                               

                                              1.

                                              This idea is easiest to implement.

                                              In load script we have to do only something like this:

                                               

                                              tab:

                                              load * inline

                                              [x,y,n

                                              1,0,2

                                              2,3,5

                                              2,3,4

                                              2,100,5];

                                               

                                              tab1:

                                              LOAD x,

                                              y

                                              Resident tab

                                              while IterNo()<=n;

                                               

                                               

                                              drop Table tab;

                                               

                                               

                                              Assuming that in fact QlikView stores distinct values and binary index for each field it is worth to compare it with idea 2.

                                               

                                               

                                              regards

                                              Darek

                                                • Re: plotting results of linest_m and linest_b - newbie

                                                  Thanks so much!!  Option # 2 (rolling my own equations) in fact worked; I implemented it on a trial set of data and confirmed the results with my statistical software.  The trick is all in knowing how to use the $ and = appropriately in expressions; something I don't really fully understand yet, but managed to bumble my way through based on your prototype suggestions.  I'll need to do a little background reading on those operators to better understand how to correctly utilize them.

                                                   

                                                  Thanks again - with your extensive help I've managed to implement weighted regression in Qlikview.

                                • Re: plotting results of linest_m and linest_b - newbie

                                  i would like to ask izzit possible for us to put our own formula into Qlikview and then to get 1 curve on our scatter chart??