Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 
Amphan
Contributor III
Contributor III

Intervalmatch on several fields with AND operation?

I am trying to figure out how to categorize a transaction depending on whether several different fields in the transaction are within a certain range. In my current task I have seven fields, let's call them ACCODE01 - 07 and seven corresponding ranges, ACCODE01_START, ACCODE01_END, ACCODE02_START, ACCODE02_END etc. If each ACCODEXX field is within its corresponding range, ie. ACCODE01 is in ACCODE01_START to ACCODE01_END AND ACCODE02 is in ACCODE02_START to ACCODE02_END AND etc. then I want to mark the transaction with a Category.

Let's say I have these tables (random data, might not add up), what I want is to add CATEGORY from INTERVAL: to FACT: where ACCODEXX is in THE ACCODEXX_START to ACCODEXX_END range. If there is no match, no category should be assigned.

FACT:

LOAD * INLINE [

    ACCODE01, ACCODE02, ACCODE03, ACCODE04, ACCODE05, ACCODE06, ACCODE07, VERID, AMOUNT

    10000, 40261, 12340, 12300, 5478, 2541, 2587, 123456789, $100

    20000, 86210, 43210, 45600, 9866, 2365, 1236, 987546, $1000

    30000, 86400, 58740, 78900, 2222, 5895, 9854, 124578, $200

    40000, 86111, 98560, 98700, 9874, 4578, 7854, 326598, $1100

    50000, 46435, 12450, 65400, 2587, 1245, 2569, 785612, $800

    60000, 45698, 45780, 32100, 3698, 6542, 3254, 324578, $500

    70000, 32145, 65540, 74100, 1478, 1111, 1256, 1245782, $600

    80000, 78954, 32120, 36900, 2589, 2224, 9854, 69854225, $1900

];

INTERVAL:

LOAD * INLINE [

    CATEGORY, ACCODE01_START, ACCODE01_END, ACCODE02_START, ACCODE02_END, ACCODE03_START, ACCODE03_END, ACCODE04_START, ACCODE04_END, ACCODE05_START, ACCODE05_END, ACCODE06_START, ACCODE06_END, ACCODE07_START, ACCODE07_END

    APPLES, 0000, 8999, 40260, 40270, 00000, 96999, 0000, 9999, 0000, 9999, 0000, 9999, 0000, 0000

    ORANGES, 0000, 8999, 65410, 65510, 00000, 96999, 0000, 9999, 0000, 9999, 0000, 9999, 0000, 0000

    BANANAS, 0000, 8999, 86110, 86710, 03560, 03560, 0000, 9999, 0000, 9999, 0000, 9999, 0000, 9999

    CHERRYS, 0000, 8999, 86310, 86330, 03560, 03560, 0000, 9999, 0000, 9999, 0000, 9999, 0000, 9999

    LIMES, 0000, 8999, 86400, 86400, 03560, 03560, 0000, 9999, 0000, 9999, 0000, 9999, 0000, 9999

    LEMONS, 0000, 8999, 46430, 46440, 00000, 96999, 1000, 1099, 0000, 9999, 0000, 9999, 0000, 0000

];

I'd really appreciate any help to move forward on this task.

Thanks in advance!

Ola

21 Replies
Not applicable

Yeah, cartesian will make very big table. How many rows do you have on the

beginning im fact and interval tables?

15-05-2014 00:28 użytkownik "Ola Paulsson" <qcwebmaster@qlik.com>

napisał:

Amphan
Contributor III
Contributor III
Author

The fact table will have around 8-10 million rows but the interval table will only have around 100 rows (100 categories) or less. Each accode field holds an integer with 5 positions so with full interval ranges the cartesian product from that table would have a max of 100000 raised to 7 rows, am I right? In reality it will be lower as the interval ranges will be more narrow but still, it could be a huge table.

//Ola

Not applicable

How many categories for one fact you expect separately by each key?

15-05-2014 08:42 użytkownik "Ola Paulsson" <qcwebmaster@qlik.com>

napisał:

Amphan
Contributor III
Contributor III
Author

One, if I understand you correctly. Every transaction (row) in the fact table can only have one category. The category is determined by whether each of the accode fields are within range in its corresponding interval. These combinations of intervals in the intervals table are disjoint and not overlapping (unique) as a composite key.

//Ola

Not applicable

If so, my example should work as you expect. It founds one category by each

key and on the end checks If all 7 categories are the same. If YES, fact

has this category. If no, fact has no category. Is it the logice you need?

15-05-2014 09:22 użytkownik "Ola Paulsson" <qcwebmaster@qlik.com>

napisał:

Qlik Community <http://community.qlik.com/> Intervalmatch on

several fields with AND operation?

reply from Ola Paulsson<http://community.qlik.com/people/OlaPaulsson?et=watches.email.thread>in

Scripting - View the full discussion<http://community.qlik.com/message/528866?et=watches.email.thread#528866>

Amphan
Contributor III
Contributor III
Author

I will need to take a closer look at the example code you provided, it is a bit above my skill level so I need some time to understand it . In the meantime I think I found a solution in the document you linked to but I choose method 3 on page 6 that says "Don’t use this method unless you have very small tables". I will see if I can learn more during the weekend.

//Ola

Not applicable

Ola,

script works and:

tmp_f: load 0 as VERID AutoGenerate(0);

//prepares empty table (to have it for 1'st join)

for i=1 to 7

//we have 7 keys, so we need to deal with each of them

//we join into existing tmp_f table (this is why i created it over loop)

join(tmp_f)

load ACCODE0$ as ACCODE, VERID  Resident FACT;

//ACCODE0$(i) gives column ACCODE01,2,3... depending on iter number

tmp_i:

load ACCODE0$(i)_START as start, ACCODE0$(i)_END as end, CATEGORY as cat$(i), Autonumber(Num(ACCODE0$(i)_START) & '|' & Num(ACCODE0$(i)_END)) as IntervalID  Resident inter;

//into tmp_i table i load i'th range boudary (columns ACCODE0$(i)_START and ACCODE0$(i)_END), but i create //from them key (interval_id) for each row

bridge:

IntervalMatch ( ACCODE ) load distinct start,end Resident tmp_i;

//create table by intervalmatch (ACCODE is key from tmp_f table used for this iteration , start and end are range //from tmp_i (temporary interval) for this iteration

//and we have to clear what our interval produces, because we would like to have only tmp_f table when we start //next iteration in our loop.

Join (tmp_f)

Load ACCODE, Autonumber(Num(start) & '|' & Num(end)) as IntervalID

Resident bridge;

Drop Table bridge;

Join (tmp_f) LOAD cat$(i),  IntervalID resident tmp_i;

drop table tmp_i;

drop fields ACCODE,IntervalID;

NEXT;

//after last one iteration we should have tmp_f table where we have 7 category columns (cat1,cat2,....cat7)

//you may comment instructions under and look on it in your application.

//from this table we will check if all 7 cat have the same value.

//if max=min then all existing values must be the same.

//so, we need only check if the 7 cat have values (no nulls) and we may decide to give category for fact. Knowing //that all cat fieldsghave the same value i take first, so, my result is cat1

load VERID,  if(RangeMaxString(cat1,cat2,cat3,cat4,cat5,cat6,cat7)=RangeMinString(cat1,cat2,cat3,cat4,cat5,cat6,cat7) and RangetextCount(cat1,cat2,cat3,cat4,cat5,cat6,cat7)=7, cat1) as category Resident tmp_f Where Exists(VERID);

drop table tmp_f;

Amphan
Contributor III
Contributor III
Author

You are too kind! I will look at this tonight. In the meantime, I got this to work from page 6, method 3 in the document you linked to. I will see if I can use another method which doesn't create a cartesian product but this works on my testcase. But i would like to create a general method to categorize a transaction which isn't so memory intensive.

tmp_BridgeTable:

load distinct

    _CompositeKey,

    ACCODE01, ACCODE02, ACCODE03, ACCODE04, ACCODE05, ACCODE06, ACCODE07

resident FACT;

Join (tmp_BridgeTable)

load

     _IntervalID,

     CATEGORY,

     ACCODE01_START, ACCODE01_END,

     ACCODE02_START, ACCODE02_END,

     ACCODE03_START, ACCODE03_END,

     ACCODE04_START, ACCODE04_END,

     ACCODE05_START, ACCODE05_END,

     ACCODE06_START, ACCODE06_END,

     ACCODE07_START, ACCODE07_END

resident INTERVAL;

BridgeTable:

load distinct _CompositeKey, _IntervalID

    resident tmp_BridgeTable

    where (ACCODE01_START <= ACCODE01 and ACCODE01 <= ACCODE01_END)

    and (ACCODE02_START <= ACCODE02 and ACCODE02 <= ACCODE02_END)

    and (ACCODE03_START <= ACCODE03 and ACCODE03 <= ACCODE03_END)

    and (ACCODE04_START <= ACCODE04 and ACCODE04 <= ACCODE04_END)

    and (ACCODE05_START <= ACCODE05 and ACCODE05 <= ACCODE05_END)

    and (ACCODE06_START <= ACCODE06 and ACCODE06 <= ACCODE06_END)

    and (ACCODE07_START <= ACCODE07 and ACCODE07 <= ACCODE07_END)

;

drop table tmp_BridgeTable;

//Ola

Not applicable

Hi Ola,

i looked for join from your script:

Join (tmp_BridgeTable)

What are commone keys in joindet tables?

regards

Darek

Amphan
Contributor III
Contributor III
Author

When loading FACT: I add

ACCODE01&'|'&ACCODE02&'|'&ACCODE03&'|'&ACCODE04&'|'&ACCODE05&'|'&ACCODE06&'|'&ACCODE07 as _CompositeKey,

and when I load Interval I add

RecNo() as _IntervalID,

which I use in the BridgeTable to link them. Solves the problem and doesn't seem to be too bad on performance, but then again there isn't that much data.

/Ola