5 Replies Latest reply: Apr 22, 2013 7:00 PM by Angus Monro RSS

    Apply script logic based on metadata

      Hi,

       

      I have a transaction table with two fields for each 'field', original value (O_xxx) and new value (N_xxxx)

       

      O_Field1N_Field1O_Field2N_Field2

       

      Each field can be of type, economic or non-economic

       

      I have another table which tell me which it is:

      Field pairs always have the same Value

       

      Field NameValue
      O_Field1Economic
      N_Field1Economic
      O_Field2Non-Economic

       

      I need to apply some logic in the script which only occurs IF the field is non-economic.

       

      If the field 'pair' are non-economic AND the original value is diffrent to new value

      Then flag the transaction.

       

      You can assign a tag to a field, but it cannot be used in any further logic.

       

      Any suggestions on how to tackle this would be appreciated.

       

      Thanks,

       

      Tom.

        • Re: Apply script logic based on metadata
          sujeet singh

          One thing making me confused that what is this original field Please explain names with clearity.

          • Re: Apply script logic based on metadata
            Angus Monro

            Hi Tom,

             

            I can think of a couple of solutions to this: a CROSSTABLE-based approach and a dynamic coding approach.  Both are a bit messy.  In this post, I'll describe the CROSSTABLE-based approach. 

             

            Let me assume that

            • your transactions table is called Transactions
            • it has a unique key field called Id
            • your second table is called FieldTypes
            • your objective is to append another column to each transaction record, using the desired IF within the context of a LOAD statement to produce its content.

             

            The idea, then, is as follows:

            1. Do a CROSSTABLE LOAD on Transactions, resulting in table DepivotedTransactions containing fields Id, Field Name and Field Value.  Awkwardly, this will now have separate records for, e.g., O_Field1 and N_Field1, so that they can't be directly compared. 
            2. We need to aggregate each N_ and O_ pairs back together somehow so that they can be compared.  this will need to be grouped by the Id and the base field type.  So, now we add a column that contains that base field type. i.e.
              JOIN LOAD
                  Id,
                  "Field Name",
                  subfield("Field Name",'_',2) AS BaseFieldName
              Resident DepivotedTransactions;
            3. Now we can do the aggregation:
              SemipivotedTransactions:
              LOAD
                  Id,
                  BaseFieldName,
                  MinString("Field Value"&' ') AS Value1, // I'm assuming the fields contain both numeric & string data, so am forcing all of them to be string.
                  MaxString("Field Value"&' ') AS Value2
              Resident DepivotedTransactions
              Group By Id,BaseFieldName;
            4. Identify the field types
              JOIN LOAD
                  subfield("Field Name",'_',2) AS BaseFieldName,
                  1 AS IsNonEconomic
              Resident FieldTypes
              WHERE Value='Non-Economic';
            5. Now we can do your IF:
              JOIN
                  Id,
                  BaseFieldName,
                  IF(not isnull(IsNonEconomic) AND Value1<>Value2,1,0) As FieldHasChanged // N.B. Assumes Value1 & Value2 are never NULL - expand the <> condition is need be.
              Resident SemipivotedTransactions;
            6. At this point, we now have recorded in SemipivotedTransactions a 1 for each (Id,BaseFieldName) if that field has changed, and 0 otherwise.  But you want an overall summary for each Id.  So, now we do a final aggregation & join back to the original table:
              JOIN (Transactions)
              LOAD
                  Id,
                  ( sum(FieldHasChanged)>0 ) as TransactionHasChanged  // This will be -1 if true and 0 if false
              Resident SemipivotedTranscations
              Group by Id;
            7. Drop the other tables.
            • Re: Apply script logic based on metadata
              Angus Monro

              Now for the dynamic coding approach.  It works out a lot simpler, but is a much more sophisticated approach and can be harder to read if you don't be careful to comment liberally.

               

              The idea is that it'd be really nice if you could just do a statement like this:

                  JOIN (Transactions)

                  LOAD

                      Id,

                      ( O_Field1<>N_Field1 or O_Field2<>N_Field2 or ...)  AS TransactionHasChanged // assuming values are never null, for simplicity

                  Resident Transactions;

               

              but with only the Non-economic fields listed.  Let me ignore the Economic/Non-economic thing for the moment.

               

              The approach is to build a variable vHasTransactionChanged with has as its content the following string: 'O_Field1<>N_Field1 or O_Field2<>N_Field2 or ...'.  (we'll do this with help from LOAD, CONCATENATE and peek()).  Then, our statement becomes:

               

                  JOIN (Transactions)

                  LOAD

                      Id,

                      ( $(vHasTransactionChanged) )  AS TransactionHasChanged // Note: 'bare' variable references like 'v' aren't allowed in LOAD statements, but expanded references like '$(v)' are.

                  Resident Transactions;

               

              How to build that variable?  Pretty straighforward, really:

              1. Append a field to the FieldTypes tables, to identify the BaseFieldType:

                  JOIN (FieldTypes)
                  LOAD

                      "Field Name",

                      subfield("Field Name",'_',2) AS BaseFieldType

                  Resident FieldTypes;

              2. Now build the '<>' subconditions:

                  NEqConditions:

                  LOAD
                      BaseFieldType,

                      concatenate("Field Name",'<>') AS NEqCondition

                  Resident FieldTypes 

                  Group By BaseFieldType;

              3. NEqConditions now has records like (BaseFieldType,NeqCondition)=('Field1','O_Field1<>N_Field1') in it.  We therefore now concatenate these individual NeqCondition values into the final desired condition:

                  FinalCondition:

                  LOAD

                      concatenate(NEqCondition,'  OR  ') AS FinalCondition

                  Resident NEqConditions;

              4. Finally, load into the variable:

                  LET vHasTransactionChanged=peek('FinalCondition');

               

              To deal with your Economic/Non-economic thing, you just want to discard the Economic records before step 2 or 3. above using a KEEP statement of some sort.

              • Re: Apply script logic based on metadata
                Angus Monro

                In my previous responses to this question, I outlined two very different approaches to this problem.  But which solution would I choose?  I would hands-down, without a doubt, choose the dynamic-coding approach.  This is because it will have massively better performance than the CROSSTABLE-based approach, in terms of both memory and speed.  This difference is so large that I would strongly recommend against the CROSSTABLE-based approach, unless the Transaction table is actually quite small so that performance isn't any problem.

                 

                To see the difference, consider the amount of data being manipulated in each step of each algorithm.  Let's say we have 1,000,000 transactions and 10 old/new field pairs (and hence 20 of these descriptive fields).

                 

                 

                In the CROSSTABLE-based solution:

                Step 1. CROSSTABLES the 1,000,000 transactions to produce a table with 20,000,000 records.

                Step 2. adds another column to these 20,000,000 records, resulting in a 33% increase in table size (from 3 fields to 4 fields)

                Step 3. aggregates these into half the number of records (10,000,000) and 4 fields

                Step 4. adds another column to these 10,000,000 records, but the value in it is null half the time, resulting in (say) a 12% increase)

                Step 5. adds another column to these 10,000,000 records

                Step 6. aggregates these back to the transaction level (i.e. 1,000,000 records) and rejoins.

                 

                In the dynamic script solution:

                Step 1. adds a column to the 20-record FieldTypes table.

                Step 2. aggregates it into a 10-record table

                Step 3. aggregates this into a 1-record table

                Step 4. operates on a single cell of the table

                The resulting value is then applied to the Transactions table in a single, final, 1,000,000-record operation.

                 

                The dynamic solution is clearly radically easier on CPU & memory.