Skip to main content
Announcements
Global Transformation Awards! Applications are now open. Submit Entry
cancel
Showing results for 
Search instead for 
Did you mean: 
Not applicable

Apply script logic based on metadata

Hi,

I have a transaction table with two fields for each 'field', original value (O_xxx) and new value (N_xxxx)

O_Field1N_Field1O_Field2N_Field2

Each field can be of type, economic or non-economic

I have another table which tell me which it is:

Field pairs always have the same Value

Field NameValue
O_Field1Economic
N_Field1Economic
O_Field2Non-Economic

I need to apply some logic in the script which only occurs IF the field is non-economic.

If the field 'pair' are non-economic AND the original value is diffrent to new value

Then flag the transaction.

You can assign a tag to a field, but it cannot be used in any further logic.

Any suggestions on how to tackle this would be appreciated.

Thanks,

Tom.

1 Solution

Accepted Solutions
gussfish
Creator II
Creator II

Now for the dynamic coding approach.  It works out a lot simpler, but is a much more sophisticated approach and can be harder to read if you don't be careful to comment liberally.

The idea is that it'd be really nice if you could just do a statement like this:

    JOIN (Transactions)

    LOAD

        Id,

        ( O_Field1<>N_Field1 or O_Field2<>N_Field2 or ...)  AS TransactionHasChanged // assuming values are never null, for simplicity

    Resident Transactions;

but with only the Non-economic fields listed.  Let me ignore the Economic/Non-economic thing for the moment.

The approach is to build a variable vHasTransactionChanged with has as its content the following string: 'O_Field1<>N_Field1 or O_Field2<>N_Field2 or ...'.  (we'll do this with help from LOAD, CONCATENATE and peek()).  Then, our statement becomes:

    JOIN (Transactions)

    LOAD

        Id,

        ( $(vHasTransactionChanged) )  AS TransactionHasChanged // Note: 'bare' variable references like 'v' aren't allowed in LOAD statements, but expanded references like '$(v)' are.

    Resident Transactions;

How to build that variable?  Pretty straighforward, really:

1. Append a field to the FieldTypes tables, to identify the BaseFieldType:

    JOIN (FieldTypes)
    LOAD

        "Field Name",

        subfield("Field Name",'_',2) AS BaseFieldType

    Resident FieldTypes;

2. Now build the '<>' subconditions:

    NEqConditions:

    LOAD
        BaseFieldType,

        concatenate("Field Name",'<>') AS NEqCondition

    Resident FieldTypes 

    Group By BaseFieldType;

3. NEqConditions now has records like (BaseFieldType,NeqCondition)=('Field1','O_Field1<>N_Field1') in it.  We therefore now concatenate these individual NeqCondition values into the final desired condition:

    FinalCondition:

    LOAD

        concatenate(NEqCondition,'  OR  ') AS FinalCondition

    Resident NEqConditions;

4. Finally, load into the variable:

    LET vHasTransactionChanged=peek('FinalCondition');

To deal with your Economic/Non-economic thing, you just want to discard the Economic records before step 2 or 3. above using a KEEP statement of some sort.

View solution in original post

5 Replies
sujeetsingh
Master III
Master III

One thing making me confused that what is this original field Please explain names with clearity.

Not applicable
Author

The data comes from a trading platform. The feed gives me the original value and then the new value of the field

Some real field names:

O_PFOLIO

N_PFOLIO

They are portfolios, if the portfolio was amended in the trade, the values in the two columns will be different.

gussfish
Creator II
Creator II

Hi Tom,

I can think of a couple of solutions to this: a CROSSTABLE-based approach and a dynamic coding approach.  Both are a bit messy.  In this post, I'll describe the CROSSTABLE-based approach. 

Let me assume that

  • your transactions table is called Transactions
  • it has a unique key field called Id
  • your second table is called FieldTypes
  • your objective is to append another column to each transaction record, using the desired IF within the context of a LOAD statement to produce its content.

The idea, then, is as follows:

  1. Do a CROSSTABLE LOAD on Transactions, resulting in table DepivotedTransactions containing fields Id, Field Name and Field Value.  Awkwardly, this will now have separate records for, e.g., O_Field1 and N_Field1, so that they can't be directly compared. 
  2. We need to aggregate each N_ and O_ pairs back together somehow so that they can be compared.  this will need to be grouped by the Id and the base field type.  So, now we add a column that contains that base field type. i.e.
    JOIN LOAD
        Id,
        "Field Name",
        subfield("Field Name",'_',2) AS BaseFieldName
    Resident DepivotedTransactions;
  3. Now we can do the aggregation:
    SemipivotedTransactions:
    LOAD
        Id,
        BaseFieldName,
        MinString("Field Value"&' ') AS Value1, // I'm assuming the fields contain both numeric & string data, so am forcing all of them to be string.
        MaxString("Field Value"&' ') AS Value2
    Resident DepivotedTransactions
    Group By Id,BaseFieldName;
  4. Identify the field types
    JOIN LOAD
        subfield("Field Name",'_',2) AS BaseFieldName,
        1 AS IsNonEconomic
    Resident FieldTypes
    WHERE Value='Non-Economic';
  5. Now we can do your IF:
    JOIN
        Id,
        BaseFieldName,
        IF(not isnull(IsNonEconomic) AND Value1<>Value2,1,0) As FieldHasChanged // N.B. Assumes Value1 & Value2 are never NULL - expand the <> condition is need be.
    Resident SemipivotedTransactions;
  6. At this point, we now have recorded in SemipivotedTransactions a 1 for each (Id,BaseFieldName) if that field has changed, and 0 otherwise.  But you want an overall summary for each Id.  So, now we do a final aggregation & join back to the original table:
    JOIN (Transactions)
    LOAD
        Id,
        ( sum(FieldHasChanged)>0 ) as TransactionHasChanged  // This will be -1 if true and 0 if false
    Resident SemipivotedTranscations
    Group by Id;
  7. Drop the other tables.
gussfish
Creator II
Creator II

Now for the dynamic coding approach.  It works out a lot simpler, but is a much more sophisticated approach and can be harder to read if you don't be careful to comment liberally.

The idea is that it'd be really nice if you could just do a statement like this:

    JOIN (Transactions)

    LOAD

        Id,

        ( O_Field1<>N_Field1 or O_Field2<>N_Field2 or ...)  AS TransactionHasChanged // assuming values are never null, for simplicity

    Resident Transactions;

but with only the Non-economic fields listed.  Let me ignore the Economic/Non-economic thing for the moment.

The approach is to build a variable vHasTransactionChanged with has as its content the following string: 'O_Field1<>N_Field1 or O_Field2<>N_Field2 or ...'.  (we'll do this with help from LOAD, CONCATENATE and peek()).  Then, our statement becomes:

    JOIN (Transactions)

    LOAD

        Id,

        ( $(vHasTransactionChanged) )  AS TransactionHasChanged // Note: 'bare' variable references like 'v' aren't allowed in LOAD statements, but expanded references like '$(v)' are.

    Resident Transactions;

How to build that variable?  Pretty straighforward, really:

1. Append a field to the FieldTypes tables, to identify the BaseFieldType:

    JOIN (FieldTypes)
    LOAD

        "Field Name",

        subfield("Field Name",'_',2) AS BaseFieldType

    Resident FieldTypes;

2. Now build the '<>' subconditions:

    NEqConditions:

    LOAD
        BaseFieldType,

        concatenate("Field Name",'<>') AS NEqCondition

    Resident FieldTypes 

    Group By BaseFieldType;

3. NEqConditions now has records like (BaseFieldType,NeqCondition)=('Field1','O_Field1<>N_Field1') in it.  We therefore now concatenate these individual NeqCondition values into the final desired condition:

    FinalCondition:

    LOAD

        concatenate(NEqCondition,'  OR  ') AS FinalCondition

    Resident NEqConditions;

4. Finally, load into the variable:

    LET vHasTransactionChanged=peek('FinalCondition');

To deal with your Economic/Non-economic thing, you just want to discard the Economic records before step 2 or 3. above using a KEEP statement of some sort.

gussfish
Creator II
Creator II

In my previous responses to this question, I outlined two very different approaches to this problem.  But which solution would I choose?  I would hands-down, without a doubt, choose the dynamic-coding approach.  This is because it will have massively better performance than the CROSSTABLE-based approach, in terms of both memory and speed.  This difference is so large that I would strongly recommend against the CROSSTABLE-based approach, unless the Transaction table is actually quite small so that performance isn't any problem.

To see the difference, consider the amount of data being manipulated in each step of each algorithm.  Let's say we have 1,000,000 transactions and 10 old/new field pairs (and hence 20 of these descriptive fields).

In the CROSSTABLE-based solution:

Step 1. CROSSTABLES the 1,000,000 transactions to produce a table with 20,000,000 records.

Step 2. adds another column to these 20,000,000 records, resulting in a 33% increase in table size (from 3 fields to 4 fields)

Step 3. aggregates these into half the number of records (10,000,000) and 4 fields

Step 4. adds another column to these 10,000,000 records, but the value in it is null half the time, resulting in (say) a 12% increase)

Step 5. adds another column to these 10,000,000 records

Step 6. aggregates these back to the transaction level (i.e. 1,000,000 records) and rejoins.

In the dynamic script solution:

Step 1. adds a column to the 20-record FieldTypes table.

Step 2. aggregates it into a 10-record table

Step 3. aggregates this into a 1-record table

Step 4. operates on a single cell of the table

The resulting value is then applied to the Transactions table in a single, final, 1,000,000-record operation.

The dynamic solution is clearly radically easier on CPU & memory.