32 Replies Latest reply: Nov 16, 2011 6:02 PM by John Witherspoon RSS

    Inside field calculation?

      Hi all,

       

      i have a table where one of the field contain claculation formula based on fields of the table

       

      id, field1, field2, calc
      1 , 1      , 10    , field1/field2
      2 , 2      , 20    , field1*field2
      3 , 3      , 30    , field1+field2
      

       

      Is it possible to resident load this table in script and add new field which is based on calculation of "calc" field?

      I've played with evaluate() function but the result is always null.

       

      Thanks!

      Stefan

        • Re: Inside field calculation?
          Sunil Chauhan

          see the attached file

           

          where i have took the expression in resident table

           

          tab1:
          load * inline [
          id, field1, field2
          1 , 1      , 10   
          2 , 2      , 20   
          3 , 3      , 30   
          4 , 4      , 30   
          ];


          Load *,
          if(rowno()=1,field1/field2,
          if(rowno()=2,field1*field2,
          if(rowno()=3,field1+field2,if(rowno()=4,field1-field2)))) as calc
          resident tab1;

          drop table tab1;

            • Inside field calculation?

              Hi Sunil,

               

              thanks for your answer but i'm looking for more general solution (if is possible) because the table in the first post is a sample table and the real data may contain many and different calculations and many fields included.

               

              Thanks!

              Stefan

                • Inside field calculation?
                  John Duffy

                  Hi Stefan.

                   

                  How do you determine which fields are to be used in the calculation and what the calculations is?

                   

                  In your sample in the first post, what determines that field 1 is divided by field 2 as opposed to multiplied by field 2 or added to field 2 on the first record?

                   

                  John.

                    • Re: Inside field calculation?

                      Hi John,

                       

                      i don't determine nothinng. The "calc" field is a database field and the formula is stored in it.

                       

                      Regards!

                      Stefan

                        • Re: Inside field calculation?
                          Stefan Wühl

                          Stefan,

                           

                          that's an interesting question and I was attempted to say, yes you can do this using evaluate() function.

                          I remembered I played around with something similar few weeks ago (http://community.qlik.com/message/135222) and I just wanted to adapt that to your requirements. I couldn't make it to work again. Then I noticed I was now using QV10SR3 and I remembered using QV9 from time to time - and surprisingly, my linked demo script just works fine in QV9SR6.

                           

                          Then I continued working on QV9, trying to integrate your requirements. To my surprise, I couldn't get some assumingly easy expressions like 'field1' to work either, it always return NULL, as you mentioned.

                           

                          I could access the data for already read in rows by using the peek() method, but this is not what you required. Hm, why this? Is the expression evaluated in a different context where the actual current input row fields are not accessible, like in a global context?

                           

                          I tried to investigate on this, but couldn't find any information in the help or the forum. Almost all examples are using quite easy expressions where they don't use current input row field values, most of the time examples like '6*8' or something like this.

                           

                          I then noticed that the problem might be similar to http://community.qlik.com/thread/15247 and the statement Hector made. Not sure about this, I believe it has to do with QV's internal structure (latter issue probably with how and when expressions are (pre-)parsed and evaluated, which I believe will probably not be done on record base).

                           

                          Then I played around with other functions, trying to evaluate() them on script load, something like:

                           

                          Table2:

                          LOAD id, field1, field2, calc,

                          if(IsNull(evaluate(calc))=-1,'no',evaluate(calc)) as Result;

                          LOAD * INLINE [

                            id, field1, field2, calc

                          1 , 1      , 10    , field1/field2

                          2 , 2      , 20    , field1*field2

                          3 , 3      , 30    , field1+field2

                          4 , 4      , 40    , date(today())

                          5 , 5    , 50   , "peek('field1',-1)*peek('field2',-1)"

                          7 , 7   ,60       , rowno()

                          8 , 8    , 70   , rowno()*field1

                          ];

                           

                          and I again noticed differences between QV 9 and 10, not only with the peek() function evaluation, but also with the rowno().

                          Finally, I was just about giving up, I remembered the "easy ones" using something like

                          evaluate('8*6')

                          so what about replacing the field values in the string to evaluate before evaluating?

                          And in fact, I got this to work for both QV9 and QV10:

                           

                          Table2:

                          LOAD id, field1, field2, calc,calc2,

                          if(IsNull(evaluate(calc2))=-1,'no',evaluate(calc2)) as Result;

                          LOAD id, field1, field2, calc,

                          Replace(Replace(calc,'field1',text(field1)),'field2',text(field2)) as calc2;

                            LOAD * INLINE [

                            id, field1, field2, calc

                          1 , 1      , 10    , field1/field2

                          2 , 2      , 20    , field1*field2

                          3 , 3      , 30    , field1+field2

                          ];

                           

                          Not sure if this meets your requirement, but I think I am at my wits' end.

                           

                          Best regards,

                          another Stefan

                  • Re: Inside field calculation?
                    Miguel Angel Baeyens de Arce

                    Hi Stefan,

                     

                    What about using a variable with fields as parameters? Is that acceptable given your data model and possible values?

                     

                    SET fxEval = Evaluate($1 & KeepChar($2, '+-/*') & $3);
                    
                    LOAD *, $(fxEval(field1, calc, field2)) AS calc2 INLINE [
                    id, field1, field2, calc
                    1 , 1      , 10    , field1/field2
                    2 , 2      , 20    , field1*field2
                    3 , 3      , 30    , field1+field2
                    ];
                    
                    

                     

                    Hope that helps.

                     

                    Miguel Angel Baeyens

                    BI Consultant

                    Comex Grupo Ibérica

                      • Re: Inside field calculation?

                        Thanks Stefan i've ednded with the same situation

                         

                        Thanks Miguel using your solution actually show some results! But is there any way to adapt the variable to use randon number of fileds?

                         

                        Thanks!

                        Stefan

                          • Inside field calculation?
                            Stefan Wühl

                            Stefan,

                             

                            the replace method will also work with more than two fields and if not every field is used in the calc string (the replace just won't have any effect for that field). So just extend the expression with more replace() functions up to maximum number of fields you need (well, I hope you are not using too many, and honestly I don't know if restrictions apply for too many functions embedded into other functions, just give it a try.

                             

                            I think replacing the field values is probably more flexible with regard to calc function structure, for example you will have not problems with such 'sqr(field1)' or 'field2 / field1' or 'field1 / (field2 + field1)'.

                             

                            Hope his helps,

                            Stefan

                        • Inside field calculation?
                          Ralf Becher

                          Hi all,

                           

                          why not store this in an Excel file and load the calc field?

                           

                          ;-)

                           

                          - Ralf

                            • Re: Inside field calculation?

                              I've started with replacing method Stefan. Just to make it work but as far as i know my bosses they will offer the same qvw file to other of our clients and there the filed names are not totally the same. Thats why i'm looking for some solution that will save me some time in future

                               

                              Ralf, is there any diff with loading from excel?!

                               

                              Thanks!

                              Stefan

                              • Inside field calculation?
                                Stefan Wühl

                                Ralf,

                                 

                                I haven't known that something plain like 'field1+field2' will be a correct excel expression indicating the sum of columns titled field1 and field2 row based. But I surely am no Excel expert.

                                Could you elaborate on that?

                                 

                                Stefan, where does your data come from? Could your data source somehow handle the calculation itself? I believe some probably could do.

                                 

                                Regards,

                                Stefan

                                  • Re: Inside field calculation?
                                    Ralf Becher

                                    At this stage I clearly have to ask what is the use case? As I've understand so far it is the abstraction from a client´s datamodel. The calculated fields is only one approach as a workaround, right?

                                      • Inside field calculation?
                                        Sunil Chauhan

                                        tab1:

                                        Load * inline [

                                        id, field1, field2, calc
                                        1 , 1      , 10    , field1/field2
                                        2 , 2      , 20    , field1*field2
                                        3 , 3      , 30    , field1+field2

                                        ];

                                         

                                        tab2:

                                        Load * ,

                                        if(Rowno()=1,field1/field2,if(Rowno()=2,field1*field2,if(Rowno()=1,field1+field2))) as Calc

                                        residdent Tab1;

                                         

                                        drop table Tab1;

                                         

                                         

                                        or

                                         

                                         

                                        tab1:

                                        Load * inline [

                                        id, field1, field2, calc
                                        1 , 1      , 10    , field1/field2
                                        2 , 2      , 20    , field1*field2
                                        3 , 3      , 30    , field1+field2

                                        ];

                                         

                                        and in expression

                                        if(Rowno()=1,field1/field2,if(Rowno()=2,field1*field2,if(Rowno()=1,field1+field2)))

                                        • Re: Inside field calculation?

                                          Ralf,

                                           

                                          the idea is that all the fields are loaded from the db. including the calculation field. All folrmulas are stored in this field. And for particular rows this formula must override the regular calculation. And the formula and fields inside it may vary and they are user defined and may be changed by the end user of the application any time.

                                          Some developer gave the user option to enter this formulas and now the customer want to have this in qlikview and some charts to be made.

                                           

                                          Stefan

                                    • Inside field calculation?
                                      Michael Solomovich

                                      Stefan,

                                       

                                      This looks like working.  Functionality is correct, but it could be cleaner, I think.  Anyway, you can start from here and improve:

                                      data:
                                      LOAD *, rowno() as Row INLINE [
                                      f1, f2, calc
                                      1, 5, "f1 + f2"
                                      4, 10, "f1 * f2"
                                      15, 5, "f1 / f2"];

                                      Let R=peek('Row');

                                      for r=1 to $(R)

                                      LET E=peek('calc', $(r)-1,'data');

                                      tab:
                                      LOAD
                                        f1,
                                        f2,
                                        calc,
                                        evaluate($(E)) as Result
                                      RESIDENT data
                                      WHERE Row=$(r);

                                      Next r

                                      DROP TABLE data;

                                        • Inside field calculation?
                                          Stefan Wühl

                                          Very well done Michael!

                                           

                                          Could you explain why passing the string to evaluate via a variable makes a difference here? Or is this something we have to accept as invented by QlikTech engineers?

                                           

                                          Regards

                                          the other Stefan

                                            • Re: Inside field calculation?
                                              John Witherspoon

                                              Here's another working solution for arbitrary calculations.  I suspect that the performance would be higher than the looping solution, though I've done no testing on large data sets to prove it.  Also, it's dangerous to allow something like this with no validation of the calculations, as any data entry error by the user can crash out the script.  Validating calculations is a huge and I'd say separate topic.  Hopefully that's being done on the data entry side, which is where I think it belongs.  Calculations must of course follow QlikView's syntax and be possible to evaluate at script execution time, so in that sense, they aren't entirely arbitrary.

                                               

                                              Data:
                                              LOAD *
                                              ,recno() as ID // you can just use the key if your record has a key
                                              INLINE [
                                              F1, F2, Calc
                                              1,  5, F1 + F2
                                              4, 10, F1 * F2
                                              15, 5, F1 / F2
                                              ];

                                              Calcs:
                                              LOAD
                                              concat(chr(39) & Calc & chr(39),',',CalcSequence) as CalcNames
                                              ,concat(         Calc          ,',',CalcSequence) as Calcs
                                              ;
                                              LOAD 
                                              recno() as CalcSequence
                                              ,text(fieldvalue('Calc',recno())) as Calc
                                              AUTOGENERATE fieldvaluecount('Calc')
                                              ;
                                              LET vCalcs = 'pick(match(Calc,' & peek('CalcNames') & '),' & peek('Calcs') & ')';

                                              DROP TABLE Calcs;

                                              LEFT JOIN (Data)
                                              LOAD
                                              ID
                                              ,$(vCalcs) as Result
                                              RESIDENT Data
                                              ;

                                               

                                              EDIT: Swuehl, you could probably do something similar to pull off your solution with an arbitrary list of fields.  I suspect you could generate your list of replaces by detecting the field names using $Table and $Field.  I don't know if that would be simpler or more complex than the generated pick(match()).  One advantage is that an arbitrary list of fields may well have fewer members than an arbitrary list of calculations, and thus give you a shorter expression.  That might improve performance, and help avoid any expression length bugs if any were to surface.

                                                • Inside field calculation?
                                                  Stefan Wühl

                                                  Hi John,

                                                   

                                                  that's another nice solution without any evaluate() call, I need to start remembering the pick/match method using variables.

                                                  Since you are building a variable based on calc field values from all records, do you know of any limitiations regarding max variable length (or expression length of the expanded pick function)?

                                                   

                                                  Stefan hasn't mentioned the common calc field expression length or expected number of records, so maybe that's not a problem anyway.

                                                   

                                                  Regards,

                                                  swuehl

                                                    • Inside field calculation?
                                                      John Witherspoon

                                                      I'm not aware of any intended or published maximum variable or expression length.  That said, I've crashed out earlier versions of QlikView with huge expressions while specifically looking for bugs related to expression size.  I haven't tried to test for variable size bugs, but they could exist as well.

                                                       

                                                      So it's one of those things that SHOULD work, and probably won't be a problem in practice.  BUT it might reasonably be expected to fail in some versions due to QlikView bugs if the variable/expression gets too big, whatever too big might be.

                                                    • Re: Inside field calculation?
                                                      John Witherspoon

                                                      John Witherspoon wrote:

                                                       

                                                      Swuehl, you could probably do something similar to pull off your solution with an arbitrary list of fields.  I suspect you could generate your list of replaces by detecting the field names using $Table and $Field.  I don't know if that would be simpler or more complex than the generated pick(match()).  One advantage is that an arbitrary list of fields may well have fewer members than an arbitrary list of calculations, and thus give you a shorter expression.  That might improve performance, and help avoid any expression length bugs if any were to surface.


                                                      I decided to follow up on this.  With a little modification, it's simpler than my pick(match()) solution and works just fine.  I again expect performance to be high.  You could skip ID and Calc in the replace for slightly better performance at the cost of complexity.  And as mentioned, I think this variable expression is less likely to hit any bugs related to size.  See attached.  Script below.

                                                       

                                                      Data:
                                                      LOAD *
                                                      ,recno() as ID // you can just use the key if your record has a key
                                                      INLINE [
                                                      F1, F2, Calc
                                                      1,  5, F1 + F2
                                                      4, 10, F1 * F2
                                                      15, 5, F1 / F2
                                                      ];

                                                      LET vCalcs = repeat('replace(',nooffields('Data')) & 'Calc';

                                                      FOR I = 1 TO nooffields('Data')
                                                          LET vCalcs = vCalcs & ',' & chr(39) & fieldname(I,'Data') & chr(39) & ',' & fieldname(I,'Data') & ')';
                                                      NEXT

                                                      LEFT JOIN (Data)
                                                      LOAD
                                                      ID
                                                      ,evaluate($(vCalcs)) as Result
                                                      RESIDENT Data
                                                      ;

                                                       

                                                      EDIT:  If you know your field list ahead of time, you can of course just put the replace in the initial load to save additional time.  That should be the normal case, I'd think, as I try to never load fields I don't know the name of.  Generating the expression like the above may still make sense from the standpoint of simplifying maintenance, though, as you then only have to list the fields in one place instead of two.  Of course, if people maintaining it later can't even understand what the loop and second load are doing, that might be an argument for just listing all the fields explicitly, but I don't personally tend to cater to people who can't read code, since the people maintaining this should be professional programmers.  I might not expect anyone to come up with this from scratch, but I'd expect them to be able to understand it, even if it took referencing QlikView documentation.

                                                    • Inside field calculation?
                                                      Michael Solomovich

                                                      Stefan,

                                                      Apparently this is how it works.

                                                      BTW, in my version evaluate() is not needed.  Simply $(E) as Result is enough.

                                                      Same true about evaluate in the later versions, I assume.

                                                        • Inside field calculation?
                                                          John Witherspoon

                                                          Alas, evaluate() is still necessary in the replace() example I posted above (testTableDrivenCalculations8).

                                                          • Inside field calculation?
                                                            Stefan Wühl

                                                            @Michael:

                                                             

                                                            Ah, even more simple, I like your solution!

                                                             

                                                            @Ralf:

                                                            No, I don't think it's a bug with FieldValue/FieldIndex, but not sure and most people will probably tend to tell it a bug (after spending hours debugging their apps like me ).

                                                            The manual uses some subtle differences in wording between FieldValue ("Returns the field value found in position n of the field fieldname") and e.g. Peek ("

                                                            Returns the contents of the fieldname in the record specified by row in the input table tablename

                                                            "), but I think noone can understand that.

                                                            A clear word could really help sometimes (and an example).

                                                              • Re: Inside field calculation?
                                                                John Witherspoon

                                                                How about some performance analysis?

                                                                 

                                                                Michael's solution does a resident load for every single row of the table.  A million rows means a million loads X a million rows.  That would be impossibly slow, and it would create a HUGE log file.  It took 48 seconds to load just 5000 rows on my machine:

                                                                 

                                                                Data:
                                                                LOAD
                                                                recno() as ID
                                                                ,recno() as Row // separate because an ID wouldn't normally match the row number
                                                                ,ceil(rand()*10) as F1
                                                                ,ceil(rand()*10) as F2
                                                                ,'F1 ' & pick(ceil(rand()*4),'+','-','*','/') & ' F2' as Calc
                                                                AUTOGENERATE 5000
                                                                ;

                                                                LET R=peek('Row');

                                                                FOR r=1 TO $(R)
                                                                    LET E=peek('Calc',$(r)-1,'Data');
                                                                    Tab:
                                                                    LOAD ID, F1, F2, Calc, $(E) as Result
                                                                    RESIDENT Data
                                                                    WHERE Row=$(r);
                                                                NEXT r

                                                                DROP TABLE Data;

                                                                 

                                                                I can't test Ralf's solution because I'm not going to build a database.  Again, for a million rows, there are a million loads X a million rows.  HOWEVER, those loads are just a quick primary key read on a database.  It SHOULD perform reasonably, but in my opinion puts an unnecessary load on the database management system, plus again would create a huge log file.

                                                                 

                                                                I'm betting I can crash my pick(match()) solution by adding a + recno() to the load and going to enough rows.  If it doesn't crash, it'll at least get slow based on the gigantic pick(match()) I'll be generating.  OK, it was still working at 100,000 rows, each with a unique calculation, but it took 142 seconds to load.  There MAY be a point where it crashes, but it becomes impractically slow before it gets to that point.  It might be a workable solution for some data sets (data sets with a fairly limited number of calculations), but it's not what I'd choose as a generic solution.

                                                                 

                                                                Data:
                                                                LOAD
                                                                recno() as ID
                                                                ,ceil(rand()*10) as F1
                                                                ,ceil(rand()*10) as F2
                                                                ,'F1 ' & pick(ceil(rand()*4),'+','-','*','/') & ' F2 + ' & recno() as Calc
                                                                AUTOGENERATE 100000
                                                                ;

                                                                Calcs:
                                                                LOAD
                                                                concat(chr(39) & Calc & chr(39),',',CalcSequence) as CalcNames
                                                                ,concat(          Calc          ,',',CalcSequence) as Calcs
                                                                ;
                                                                LOAD 
                                                                recno() as CalcSequence
                                                                ,text(fieldvalue('Calc',recno())) as Calc
                                                                AUTOGENERATE fieldvaluecount('Calc')
                                                                ;
                                                                LET vCalcs = 'pick(match(Calc,' & peek('CalcNames') & '),' & peek('Calcs') & ')';

                                                                DROP TABLE Calcs;

                                                                LEFT JOIN (Data)
                                                                LOAD
                                                                ID
                                                                ,$(vCalcs) as Result
                                                                RESIDENT Data
                                                                ;

                                                                 

                                                                And finally we have Stefan's solution, modified by me to generate the required replace() statement automatically.  Even with each calculation being unique, it loads a million rows in 13 seconds, two million rows in 25 seconds, and three million rows in 37 seconds.  It's fast and looks to be o(n), as I'd expect.  Of what we've seen so far, I think it's the most practical solution for real world data sets.

                                                                 

                                                                Data:
                                                                LOAD
                                                                recno() as ID
                                                                ,ceil(rand()*10) as F1
                                                                ,ceil(rand()*10) as F2
                                                                ,'F1 ' & pick(ceil(rand()*4),'+','-','*','/') & ' F2 + ' & recno() as Calc
                                                                AUTOGENERATE 3000000
                                                                ;

                                                                LET vCalcs = repeat('replace(',nooffields('Data')) & 'Calc';

                                                                FOR I = 1 TO nooffields('Data')
                                                                    LET vCalcs = vCalcs & ',' & chr(39) & fieldname(I,'Data') & chr(39) & ',' & fieldname(I,'Data') & ')';
                                                                NEXT

                                                                LEFT JOIN (Data)
                                                                LOAD
                                                                ID
                                                                ,evaluate($(vCalcs)) as Result
                                                                RESIDENT Data
                                                                ;

                                                        • Re: Inside field calculation?
                                                          Ralf Becher

                                                          Hi Stefan,

                                                           

                                                          maybe this is a more suitable solution:

                                                           

                                                          Calculations:

                                                          select ID, CALC from xyz_test;

                                                           

                                                          LET rowText = NoOfRows('Calculations'); // get the total number of rows in Calculations table

                                                           

                                                          for i=1 to $(rowText) // loop through every Calculations row

                                                               let vCalc = FieldValue('CALC',$(i));

                                                                 let vId = FieldValue('ID',$(i));

                                                           

                                                               if len(trim('$(vCalc)')) > 0 then

                                                                   Results:

                                                                   select ID, FIELD1, FIELD2, $(vCalc) as RESULT from xyz_test where id = $(vId);

                                                              end if

                                                          next

                                                           

                                                          - Ralf

                                                           

                                                          Some more explanation: This pushes the calculation into the database where it was edited. User could have use database operators or functions which could maybe unknown in QlikView. This also gives the possibility to use sub-selects or UDFs at the database level..

                                                            • Inside field calculation?
                                                              Stefan Wühl

                                                              This is a really nice evolving thread!

                                                               

                                                              @Ralf: Some time ago I spent some hard hours debugging some strange behaviour. I traced it down to my usage of FieldValue, which - in a case like above - will not work as expected if your field values are non unique (i.e. multiple rows with same CALC expression). FieldValue is indexing the compressed column with unique values. You will notice the difference if you add one more row with repeated CALC expression.

                                                              So it might be better to use peek('CALC',$(i),'Calculations') with i running from 0 to NoOfRows-1.

                                                               

                                                              A lot of solutions to one small problem!

                                                               

                                                              Regards,

                                                              Stefan

                                                                • Re: Inside field calculation?
                                                                  Ralf Becher

                                                                  Good point Stefan (@swuhl), I didn't knew this! Is this a bug with FieldValue?

                                                                   

                                                                  I made some adjustments to have a workaround:

                                                                   

                                                                  Calculations:

                                                                  select ID, CALC from xyz_test;

                                                                   

                                                                  LET rowText = NoOfRows('Calculations'); // get the total number of rows in Calculations table

                                                                   

                                                                  for i=1 to $(rowText) // loop through every Calculations row

                                                                       let vId = FieldValue('ID',$(i));

                                                                       let vCalc = peek('CALC',$(i) -1,'Calculations');

                                                                   

                                                                       if len(trim('$(vCalc)')) > 0 then

                                                                           Results:

                                                                           select ID, FIELD1, FIELD2, $(vCalc) as RESULT from xyz_test where id = $(vId);

                                                                      end if

                                                                  next

                                                                   

                                                                  - Ralf

                                                                   

                                                                  Message was edited by: Ralf Becher 

                                                                   

                                                                  ..better to use peek here