Not applicable

Apply script logic based on metadata

Hi,

I have a transaction table with two fields for each logical field: the original value (O_xxx) and the new value (N_xxx):

O_Field1   N_Field1   O_Field2   N_Field2

Each field can be one of two types: economic or non-economic.

I have another table which tells me which type each field is:

Field pairs always have the same Value

Field Name    Value
O_Field1      Economic
N_Field1      Economic
O_Field2      Non-Economic

I need to apply some logic in the script which only occurs IF the field is non-economic.

If the field pair is non-economic AND the original value is different from the new value, then flag the transaction.
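
In script terms, here's a simplified sketch of the data (made-up values, with an Id field which I've assumed as the transaction key):

    // Illustrative sample only; the real data comes from the trading feed.
    Transactions:
    LOAD * INLINE [
        Id, O_Field1, N_Field1, O_Field2, N_Field2
        1, ABC, ABC, 100, 100
        2, ABC, XYZ, 100, 250
    ];

    FieldTypes:
    LOAD * INLINE [
        Field Name, Value
        O_Field1, Economic
        N_Field1, Economic
        O_Field2, Non-Economic
        N_Field2, Non-Economic
    ];

In this example I'd want transaction 2 flagged (Field2 is non-economic and its value changed from 100 to 250) but not transaction 1.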

You can assign a tag to a field, but a tag cannot be used in any further script logic.

Any suggestions on how to tackle this would be appreciated.

Thanks,

Tom.


5 Replies
sujeetsingh
Master III

One thing that's confusing me: what is this 'original' field? Please explain the names clearly.

Not applicable
Author

The data comes from a trading platform. The feed gives me the original value and then the new value of each field.

Some real field names:

O_PFOLIO

N_PFOLIO

They are portfolios; if the portfolio was amended in the trade, the values in the two columns will be different.

gussfish
Creator II

Hi Tom,

I can think of a couple of solutions to this: a CROSSTABLE-based approach and a dynamic coding approach.  Both are a bit messy.  In this post, I'll describe the CROSSTABLE-based approach. 

Let me assume that

  • your transactions table is called Transactions
  • it has a unique key field called Id
  • your second table is called FieldTypes
  • your objective is to append another column to each transaction record, using the desired IF within the context of a LOAD statement to produce its content.

The idea, then, is as follows:

  1. Do a CROSSTABLE LOAD on Transactions, resulting in table DepivotedTransactions containing fields Id, Field Name and Field Value.  Awkwardly, this will now have separate records for, e.g., O_Field1 and N_Field1, so that they can't be directly compared.  (A script sketch of this CROSSTABLE LOAD appears after this list.)
  2. We need to aggregate each N_/O_ pair back together somehow so that they can be compared.  This will need to be grouped by the Id and the base field name.  So, now we add a column that contains that base field name, i.e.
    JOIN LOAD
        Id,
        "Field Name",
        subfield("Field Name",'_',2) AS BaseFieldName
    Resident DepivotedTransactions;
  3. Now we can do the aggregation:
    SemipivotedTransactions:
    LOAD
        Id,
        BaseFieldName,
        MinString("Field Value"&' ') AS Value1, // I'm assuming the fields contain both numeric & string data, so am forcing all of them to be string.
        MaxString("Field Value"&' ') AS Value2
    Resident DepivotedTransactions
    Group By Id,BaseFieldName;
  4. Identify the field types
    JOIN LOAD
        subfield("Field Name",'_',2) AS BaseFieldName,
        1 AS IsNonEconomic
    Resident FieldTypes
    WHERE Value='Non-Economic';
  5. Now we can do your IF:
    JOIN LOAD
        Id,
        BaseFieldName,
        IF(not isnull(IsNonEconomic) AND Value1<>Value2,1,0) AS FieldHasChanged // N.B. Assumes Value1 & Value2 are never NULL - expand the <> condition if need be.
    Resident SemipivotedTransactions;
  6. At this point, we now have recorded in SemipivotedTransactions a 1 for each (Id,BaseFieldName) if that field has changed, and 0 otherwise.  But you want an overall summary for each Id.  So, now we do a final aggregation & join back to the original table:
    JOIN (Transactions)
    LOAD
        Id,
        ( sum(FieldHasChanged)>0 ) as TransactionHasChanged  // This will be -1 if true and 0 if false
    Resident SemipivotedTransactions
    Group by Id;
  7. Drop the other tables.
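
As promised in step 1, here is a sketch of the two pieces not spelled out above: the step-1 CROSSTABLE LOAD and the step-7 clean-up.  It assumes the table and field names used above, and that Transactions contains only Id plus the O_/N_ columns (otherwise list the wanted fields explicitly instead of LOAD *):

    // Step 1: depivot Transactions into one record per (Id, Field Name, Field Value).
    DepivotedTransactions:
    CROSSTABLE("Field Name","Field Value",1)
    LOAD *
    Resident Transactions;

    // ... steps 2-6 as above ...

    // Step 7: drop the working tables; Transactions keeps the new TransactionHasChanged flag.
    DROP TABLES DepivotedTransactions, SemipivotedTransactions;
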
gussfish
Creator II

Now for the dynamic coding approach.  It works out a lot simpler, but it is a much more sophisticated approach and can be harder to read if you aren't careful to comment liberally.

The idea is that it'd be really nice if you could just do a statement like this:

    JOIN (Transactions)

    LOAD

        Id,

        ( O_Field1<>N_Field1 or O_Field2<>N_Field2 or ...)  AS TransactionHasChanged // assuming values are never null, for simplicity

    Resident Transactions;

but with only the Non-economic fields listed.  Let me ignore the Economic/Non-economic thing for the moment.

The approach is to build a variable vHasTransactionChanged which has as its content the following string: 'O_Field1<>N_Field1 or O_Field2<>N_Field2 or ...'.  (We'll do this with help from LOAD, the Concat() aggregation and Peek().)  Then, our statement becomes:

    JOIN (Transactions)

    LOAD

        Id,

        ( $(vHasTransactionChanged) )  AS TransactionHasChanged // Note: 'bare' variable references like 'v' aren't allowed in LOAD statements, but expanded references like '$(v)' are.

    Resident Transactions;

How to build that variable?  Pretty straightforward, really:

1. Append a field to the FieldTypes table, to identify the BaseFieldType:

    JOIN (FieldTypes)
    LOAD

        "Field Name",

        subfield("Field Name",'_',2) AS BaseFieldType

    Resident FieldTypes;

2. Now build the '<>' subconditions:

    NEqConditions:

    LOAD
        BaseFieldType,

        Concat("Field Name",'<>') AS NEqCondition

    Resident FieldTypes 

    Group By BaseFieldType;

3. NEqConditions now has records like (BaseFieldType,NEqCondition)=('Field1','O_Field1<>N_Field1') in it.  We therefore now concatenate these individual NEqCondition values into the final desired condition:

    FinalCondition:

    LOAD

        Concat(NEqCondition,'  OR  ') AS FinalCondition

    Resident NEqConditions;

4. Finally, load into the variable:

    LET vHasTransactionChanged=peek('FinalCondition');
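
With the Field1 and Field2 pairs from the original post (and still ignoring the Economic/Non-economic split), vHasTransactionChanged ends up holding a string along the lines of 'N_Field1<>O_Field1  OR  N_Field2<>O_Field2' (the order within each pair depends on how Concat() sorts the values, which doesn't matter for a <> test), so the final JOIN expands to something like:

    JOIN (Transactions)
    LOAD
        Id,
        ( N_Field1<>O_Field1  OR  N_Field2<>O_Field2 )  AS TransactionHasChanged
    Resident Transactions;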

To deal with your Economic/Non-economic thing, you just want to discard the Economic records before step 2 or 3 above, using a KEEP statement of some sort (see the sketch below).
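
For example, since the field pairs always share the same Value, one simple way (a plain WHERE restriction rather than a KEEP, but to the same effect) is to filter the rows going into step 2.  A sketch, assuming the table and field names above:

    NEqConditions:
    LOAD
        BaseFieldType,
        Concat("Field Name",'<>') AS NEqCondition
    Resident FieldTypes
    WHERE Value = 'Non-Economic'  // keep only the non-economic field pairs
    Group By BaseFieldType;

Steps 3 and 4 then run unchanged, and the final condition only tests the non-economic pairs.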

gussfish
Creator II

In my previous responses to this question, I outlined two very different approaches to this problem.  But which solution would I choose?  I would hands-down, without a doubt, choose the dynamic-coding approach.  This is because it will have massively better performance than the CROSSTABLE-based approach, in terms of both memory and speed.  This difference is so large that I would strongly recommend against the CROSSTABLE-based approach, unless the Transactions table is actually quite small, so that performance isn't a problem.

To see the difference, consider the amount of data being manipulated in each step of each algorithm.  Let's say we have 1,000,000 transactions and 10 old/new field pairs (and hence 20 of these descriptive fields).

In the CROSSTABLE-based solution:

Step 1. CROSSTABLES the 1,000,000 transactions to produce a table with 20,000,000 records.

Step 2. adds another column to these 20,000,000 records, resulting in a 33% increase in table size (from 3 fields to 4 fields)

Step 3. aggregates these into half the number of records (10,000,000) and 4 fields

Step 4. adds another column to these 10,000,000 records, but the value in it is null half the time, resulting in (say) a 12% increase.

Step 5. adds another column to these 10,000,000 records

Step 6. aggregates these back to the transaction level (i.e. 1,000,000 records) and rejoins.

In the dynamic script solution:

Step 1. adds a column to the 20-record FieldTypes table.

Step 2. aggregates it into a 10-record table

Step 3. aggregates this into a 1-record table

Step 4. operates on a single cell of the table

The resulting value is then applied to the Transactions table in a single, final, 1,000,000-record operation.

The dynamic solution is clearly radically easier on CPU & memory.