Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
Hi,
I have a transaction table with two fields for each 'field', original value (O_xxx) and new value (N_xxxx)
O_Field1 | N_Field1 | O_Field2 | N_Field2 |
---|---|---|---|
Each field can be of type, economic or non-economic
I have another table which tell me which it is:
Field pairs always have the same Value
Field Name | Value |
---|---|
O_Field1 | Economic |
N_Field1 | Economic |
O_Field2 | Non-Economic |
I need to apply some logic in the script which only occurs IF the field is non-economic.
If the field 'pair' are non-economic AND the original value is diffrent to new value
Then flag the transaction.
You can assign a tag to a field, but it cannot be used in any further logic.
Any suggestions on how to tackle this would be appreciated.
Thanks,
Tom.
Now for the dynamic coding approach. It works out a lot simpler, but is a much more sophisticated approach and can be harder to read if you don't be careful to comment liberally.
The idea is that it'd be really nice if you could just do a statement like this:
JOIN (Transactions)
LOAD
Id,
( O_Field1<>N_Field1 or O_Field2<>N_Field2 or ...) AS TransactionHasChanged // assuming values are never null, for simplicity
Resident Transactions;
but with only the Non-economic fields listed. Let me ignore the Economic/Non-economic thing for the moment.
The approach is to build a variable vHasTransactionChanged with has as its content the following string: 'O_Field1<>N_Field1 or O_Field2<>N_Field2 or ...'. (we'll do this with help from LOAD, CONCATENATE and peek()). Then, our statement becomes:
JOIN (Transactions)
LOAD
Id,
( $(vHasTransactionChanged) ) AS TransactionHasChanged // Note: 'bare' variable references like 'v' aren't allowed in LOAD statements, but expanded references like '$(v)' are.
Resident Transactions;
How to build that variable? Pretty straighforward, really:
1. Append a field to the FieldTypes tables, to identify the BaseFieldType:
JOIN (FieldTypes)
LOAD
"Field Name",
subfield("Field Name",'_',2) AS BaseFieldType
Resident FieldTypes;
2. Now build the '<>' subconditions:
NEqConditions:
LOAD
BaseFieldType,
concatenate("Field Name",'<>') AS NEqCondition
Resident FieldTypes
Group By BaseFieldType;
3. NEqConditions now has records like (BaseFieldType,NeqCondition)=('Field1','O_Field1<>N_Field1') in it. We therefore now concatenate these individual NeqCondition values into the final desired condition:
FinalCondition:
LOAD
concatenate(NEqCondition,' OR ') AS FinalCondition
Resident NEqConditions;
4. Finally, load into the variable:
LET vHasTransactionChanged=peek('FinalCondition');
To deal with your Economic/Non-economic thing, you just want to discard the Economic records before step 2 or 3. above using a KEEP statement of some sort.
One thing making me confused that what is this original field Please explain names with clearity.
The data comes from a trading platform. The feed gives me the original value and then the new value of the field
Some real field names:
O_PFOLIO
N_PFOLIO
They are portfolios, if the portfolio was amended in the trade, the values in the two columns will be different.
Hi Tom,
I can think of a couple of solutions to this: a CROSSTABLE-based approach and a dynamic coding approach. Both are a bit messy. In this post, I'll describe the CROSSTABLE-based approach.
Let me assume that
The idea, then, is as follows:
Now for the dynamic coding approach. It works out a lot simpler, but is a much more sophisticated approach and can be harder to read if you don't be careful to comment liberally.
The idea is that it'd be really nice if you could just do a statement like this:
JOIN (Transactions)
LOAD
Id,
( O_Field1<>N_Field1 or O_Field2<>N_Field2 or ...) AS TransactionHasChanged // assuming values are never null, for simplicity
Resident Transactions;
but with only the Non-economic fields listed. Let me ignore the Economic/Non-economic thing for the moment.
The approach is to build a variable vHasTransactionChanged with has as its content the following string: 'O_Field1<>N_Field1 or O_Field2<>N_Field2 or ...'. (we'll do this with help from LOAD, CONCATENATE and peek()). Then, our statement becomes:
JOIN (Transactions)
LOAD
Id,
( $(vHasTransactionChanged) ) AS TransactionHasChanged // Note: 'bare' variable references like 'v' aren't allowed in LOAD statements, but expanded references like '$(v)' are.
Resident Transactions;
How to build that variable? Pretty straighforward, really:
1. Append a field to the FieldTypes tables, to identify the BaseFieldType:
JOIN (FieldTypes)
LOAD
"Field Name",
subfield("Field Name",'_',2) AS BaseFieldType
Resident FieldTypes;
2. Now build the '<>' subconditions:
NEqConditions:
LOAD
BaseFieldType,
concatenate("Field Name",'<>') AS NEqCondition
Resident FieldTypes
Group By BaseFieldType;
3. NEqConditions now has records like (BaseFieldType,NeqCondition)=('Field1','O_Field1<>N_Field1') in it. We therefore now concatenate these individual NeqCondition values into the final desired condition:
FinalCondition:
LOAD
concatenate(NEqCondition,' OR ') AS FinalCondition
Resident NEqConditions;
4. Finally, load into the variable:
LET vHasTransactionChanged=peek('FinalCondition');
To deal with your Economic/Non-economic thing, you just want to discard the Economic records before step 2 or 3. above using a KEEP statement of some sort.
In my previous responses to this question, I outlined two very different approaches to this problem. But which solution would I choose? I would hands-down, without a doubt, choose the dynamic-coding approach. This is because it will have massively better performance than the CROSSTABLE-based approach, in terms of both memory and speed. This difference is so large that I would strongly recommend against the CROSSTABLE-based approach, unless the Transaction table is actually quite small so that performance isn't any problem.
To see the difference, consider the amount of data being manipulated in each step of each algorithm. Let's say we have 1,000,000 transactions and 10 old/new field pairs (and hence 20 of these descriptive fields).
In the CROSSTABLE-based solution:
Step 1. CROSSTABLES the 1,000,000 transactions to produce a table with 20,000,000 records.
Step 2. adds another column to these 20,000,000 records, resulting in a 33% increase in table size (from 3 fields to 4 fields)
Step 3. aggregates these into half the number of records (10,000,000) and 4 fields
Step 4. adds another column to these 10,000,000 records, but the value in it is null half the time, resulting in (say) a 12% increase)
Step 5. adds another column to these 10,000,000 records
Step 6. aggregates these back to the transaction level (i.e. 1,000,000 records) and rejoins.
In the dynamic script solution:
Step 1. adds a column to the 20-record FieldTypes table.
Step 2. aggregates it into a 10-record table
Step 3. aggregates this into a 1-record table
Step 4. operates on a single cell of the table
The resulting value is then applied to the Transactions table in a single, final, 1,000,000-record operation.
The dynamic solution is clearly radically easier on CPU & memory.