I can think of a couple of solutions to this: a CROSSTABLE-based approach and a dynamic coding approach. Both are a bit messy. In this post, I'll describe the CROSSTABLE-based approach.
Let me assume that
- your transactions table is called Transactions
- it has a unique key field called Id
- your second table is called FieldTypes
- your objective is to append another column to each transaction record, using the desired IF within the context of a LOAD statement to produce its content.
The idea, then, is as follows:
- Do a CROSSTABLE LOAD on Transactions, resulting in table DepivotedTransactions containing fields Id, Field Name and Field Value. Awkwardly, this will now have separate records for, e.g., O_Field1 and N_Field1, so that they can't be directly compared.
- We need to aggregate each N_ and O_ pairs back together somehow so that they can be compared. this will need to be grouped by the Id and the base field type. So, now we add a column that contains that base field type. i.e.
subfield("Field Name",'_',2) AS BaseFieldName
- Now we can do the aggregation:
MinString("Field Value"&' ') AS Value1, // I'm assuming the fields contain both numeric & string data, so am forcing all of them to be string.
MaxString("Field Value"&' ') AS Value2
Group By Id,BaseFieldName;
- Identify the field types
subfield("Field Name",'_',2) AS BaseFieldName,
1 AS IsNonEconomic
- Now we can do your IF:
IF(not isnull(IsNonEconomic) AND Value1<>Value2,1,0) As FieldHasChanged // N.B. Assumes Value1 & Value2 are never NULL - expand the <> condition is need be.
- At this point, we now have recorded in SemipivotedTransactions a 1 for each (Id,BaseFieldName) if that field has changed, and 0 otherwise. But you want an overall summary for each Id. So, now we do a final aggregation & join back to the original table:
( sum(FieldHasChanged)>0 ) as TransactionHasChanged // This will be -1 if true and 0 if false
Group by Id;
- Drop the other tables.
Now for the dynamic coding approach. It works out a lot simpler, but is a much more sophisticated approach and can be harder to read if you don't be careful to comment liberally.
The idea is that it'd be really nice if you could just do a statement like this:
( O_Field1<>N_Field1 or O_Field2<>N_Field2 or ...) AS TransactionHasChanged // assuming values are never null, for simplicity
but with only the Non-economic fields listed. Let me ignore the Economic/Non-economic thing for the moment.
The approach is to build a variable vHasTransactionChanged with has as its content the following string: 'O_Field1<>N_Field1 or O_Field2<>N_Field2 or ...'. (we'll do this with help from LOAD, CONCATENATE and peek()). Then, our statement becomes:
( $(vHasTransactionChanged) ) AS TransactionHasChanged // Note: 'bare' variable references like 'v' aren't allowed in LOAD statements, but expanded references like '$(v)' are.
How to build that variable? Pretty straighforward, really:
1. Append a field to the FieldTypes tables, to identify the BaseFieldType:
subfield("Field Name",'_',2) AS BaseFieldType
2. Now build the '<>' subconditions:
concatenate("Field Name",'<>') AS NEqCondition
Group By BaseFieldType;
3. NEqConditions now has records like (BaseFieldType,NeqCondition)=('Field1','O_Field1<>N_Field1') in it. We therefore now concatenate these individual NeqCondition values into the final desired condition:
concatenate(NEqCondition,' OR ') AS FinalCondition
4. Finally, load into the variable:
To deal with your Economic/Non-economic thing, you just want to discard the Economic records before step 2 or 3. above using a KEEP statement of some sort.
In my previous responses to this question, I outlined two very different approaches to this problem. But which solution would I choose? I would hands-down, without a doubt, choose the dynamic-coding approach. This is because it will have massively better performance than the CROSSTABLE-based approach, in terms of both memory and speed. This difference is so large that I would strongly recommend against the CROSSTABLE-based approach, unless the Transaction table is actually quite small so that performance isn't any problem.
To see the difference, consider the amount of data being manipulated in each step of each algorithm. Let's say we have 1,000,000 transactions and 10 old/new field pairs (and hence 20 of these descriptive fields).
In the CROSSTABLE-based solution:
Step 1. CROSSTABLES the 1,000,000 transactions to produce a table with 20,000,000 records.
Step 2. adds another column to these 20,000,000 records, resulting in a 33% increase in table size (from 3 fields to 4 fields)
Step 3. aggregates these into half the number of records (10,000,000) and 4 fields
Step 4. adds another column to these 10,000,000 records, but the value in it is null half the time, resulting in (say) a 12% increase)
Step 5. adds another column to these 10,000,000 records
Step 6. aggregates these back to the transaction level (i.e. 1,000,000 records) and rejoins.
In the dynamic script solution:
Step 1. adds a column to the 20-record FieldTypes table.
Step 2. aggregates it into a 10-record table
Step 3. aggregates this into a 1-record table
Step 4. operates on a single cell of the table
The resulting value is then applied to the Transactions table in a single, final, 1,000,000-record operation.
The dynamic solution is clearly radically easier on CPU & memory.