I am trying to figure out how to categorize a transaction depending on whether several different fields in the transaction are within certain ranges. In my current task I have seven fields, let's call them ACCODE01-07, and seven corresponding ranges: ACCODE01_START, ACCODE01_END, ACCODE02_START, ACCODE02_END, etc. If every ACCODExx field is within its corresponding range, i.e. ACCODE01 is in ACCODE01_START to ACCODE01_END AND ACCODE02 is in ACCODE02_START to ACCODE02_END AND so on, then I want to mark the transaction with a category.
Let's say I have these tables (random data, might not add up). What I want is to add CATEGORY from INTERVAL to FACT where each ACCODExx is in the corresponding ACCODExx_START to ACCODExx_END range. If there is no match, no category should be assigned.
FACT:
LOAD * INLINE [
ACCODE01, ACCODE02, ACCODE03, ACCODE04, ACCODE05, ACCODE06, ACCODE07, VERID, AMOUNT
10000, 40261, 12340, 12300, 5478, 2541, 2587, 123456789, $100
20000, 86210, 43210, 45600, 9866, 2365, 1236, 987546, $1000
30000, 86400, 58740, 78900, 2222, 5895, 9854, 124578, $200
40000, 86111, 98560, 98700, 9874, 4578, 7854, 326598, $1100
50000, 46435, 12450, 65400, 2587, 1245, 2569, 785612, $800
60000, 45698, 45780, 32100, 3698, 6542, 3254, 324578, $500
70000, 32145, 65540, 74100, 1478, 1111, 1256, 1245782, $600
80000, 78954, 32120, 36900, 2589, 2224, 9854, 69854225, $1900
];
INTERVAL:
LOAD * INLINE [
CATEGORY, ACCODE01_START, ACCODE01_END, ACCODE02_START, ACCODE02_END, ACCODE03_START, ACCODE03_END, ACCODE04_START, ACCODE04_END, ACCODE05_START, ACCODE05_END, ACCODE06_START, ACCODE06_END, ACCODE07_START, ACCODE07_END
APPLES, 0000, 8999, 40260, 40270, 00000, 96999, 0000, 9999, 0000, 9999, 0000, 9999, 0000, 0000
ORANGES, 0000, 8999, 65410, 65510, 00000, 96999, 0000, 9999, 0000, 9999, 0000, 9999, 0000, 0000
BANANAS, 0000, 8999, 86110, 86710, 03560, 03560, 0000, 9999, 0000, 9999, 0000, 9999, 0000, 9999
CHERRYS, 0000, 8999, 86310, 86330, 03560, 03560, 0000, 9999, 0000, 9999, 0000, 9999, 0000, 9999
LIMES, 0000, 8999, 86400, 86400, 03560, 03560, 0000, 9999, 0000, 9999, 0000, 9999, 0000, 9999
LEMONS, 0000, 8999, 46430, 46440, 00000, 96999, 1000, 1099, 0000, 9999, 0000, 9999, 0000, 0000
];
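To make the intended AND-logic concrete, here is a sketch in plain Python (not Qlik script) that applies the same rule to rows like those above. The field names come from the tables; the function and variable names are only illustrative.

```python
# Hedged sketch of the rule: a fact row gets a CATEGORY only when ALL seven
# ACCODE fields fall inside the corresponding _START.._END range of one
# interval row; otherwise no category (None) is assigned.
FIELDS = [f"ACCODE{i:02d}" for i in range(1, 8)]

def categorize(fact, intervals):
    for iv in intervals:
        if all(iv[f + "_START"] <= fact[f] <= iv[f + "_END"] for f in FIELDS):
            return iv["CATEGORY"]
    return None  # no match -> no category assigned

# The LIMES interval from the INTERVAL table above, as a dict:
limes = {"CATEGORY": "LIMES",
         "ACCODE01_START": 0,     "ACCODE01_END": 8999,
         "ACCODE02_START": 86400, "ACCODE02_END": 86400,
         "ACCODE03_START": 3560,  "ACCODE03_END": 3560,
         "ACCODE04_START": 0,     "ACCODE04_END": 9999,
         "ACCODE05_START": 0,     "ACCODE05_END": 9999,
         "ACCODE06_START": 0,     "ACCODE06_END": 9999,
         "ACCODE07_START": 0,     "ACCODE07_END": 9999}
```

Note that one field out of range is enough to disqualify the row; for instance, the sample row with ACCODE01 = 30000 fails the 0..8999 test for LIMES even though its ACCODE02 = 86400 matches.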
I'd really appreciate any help to move forward on this task.
Thanks in advance!
Ola
Yes, a cartesian product will make a very big table. How many rows do you have initially in the fact and interval tables?
On 15-05-2014 00:28, "Ola Paulsson" <qcwebmaster@qlik.com> wrote:
The fact table will have around 8-10 million rows, but the interval table will only have around 100 rows (100 categories) or fewer. Each ACCODE field holds a 5-digit integer, so with full interval ranges the cartesian product from that table would have a maximum of 100000 raised to the power of 7 rows, am I right? In reality it will be lower, as the interval ranges will be narrower, but it could still be a huge table.
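The worst-case estimate above can be checked with a line of arithmetic; this is just back-of-envelope Python, not part of any load script:

```python
# Each ACCODE field is a 5-digit integer, so it can take 10**5 = 100000
# distinct values; a full cartesian expansion over 7 independent fields is
# 100000 raised to the power of 7 combinations.
values_per_field = 10 ** 5
worst_case_rows = values_per_field ** 7
```

This confirms the estimate: 100000^7 = 10^35 combinations in the (entirely theoretical) worst case, which is why a naive full expansion is a non-starter.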
//Ola
How many categories do you expect for one fact, looking at each key separately?
On 15-05-2014 08:42, "Ola Paulsson" <qcwebmaster@qlik.com> wrote:
One, if I understand you correctly. Every transaction (row) in the fact table can have only one category. The category is determined by whether each of the ACCODE fields is within range in its corresponding interval. The combinations of intervals in the interval table are disjoint and non-overlapping (unique) as a composite key.
//Ola
If so, my example should work as you expect. It finds one category for each key and at the end checks if all 7 categories are the same. If yes, the fact gets this category. If no, the fact gets no category. Is that the logic you need?
On 15-05-2014 09:22, "Ola Paulsson" <qcwebmaster@qlik.com> wrote:
Qlik Community <http://community.qlik.com/> — Intervalmatch on several fields with AND operation? (Scripting)
I will need to take a closer look at the example code you provided; it is a bit above my skill level, so I need some time to understand it. In the meantime I think I found a solution in the document you linked to, though I chose method 3 on page 6, which says "Don't use this method unless you have very small tables". I will see if I can learn more over the weekend.
//Ola
Ola,
this is how the script works:
tmp_f: load 0 as VERID AutoGenerate(0);
// prepares an empty table (so it exists for the first join)
for i=1 to 7
// we have 7 keys, so we need to deal with each of them
// we join into the existing tmp_f table (this is why I created it before the loop)
join(tmp_f)
load ACCODE0$(i) as ACCODE, VERID Resident FACT;
// ACCODE0$(i) gives column ACCODE01, 02, 03... depending on the iteration number
tmp_i:
load ACCODE0$(i)_START as start, ACCODE0$(i)_END as end, CATEGORY as cat$(i), Autonumber(Num(ACCODE0$(i)_START) & '|' & Num(ACCODE0$(i)_END)) as IntervalID Resident INTERVAL;
// into the tmp_i table I load the i'th range boundaries (columns ACCODE0$(i)_START
// and ACCODE0$(i)_END), and from them I create a key (IntervalID) for each row
bridge:
IntervalMatch ( ACCODE ) load distinct start, end Resident tmp_i;
// create the bridge table by IntervalMatch (ACCODE is the key from tmp_f used for this
// iteration; start and end are the range from tmp_i, the temporary intervals, for this iteration)
// then we have to clean up what the IntervalMatch produced, because we want only the
// tmp_f table left when we start the next iteration of the loop
Join (tmp_f)
Load ACCODE, Autonumber(Num(start) & '|' & Num(end)) as IntervalID
Resident bridge;
Drop Table bridge;
Join (tmp_f) LOAD cat$(i), IntervalID Resident tmp_i;
drop table tmp_i;
drop fields ACCODE, IntervalID;
NEXT;
// after the last iteration we should have a tmp_f table with 7 category columns (cat1, cat2, ... cat7)
// you may comment out the instructions below and look at it in your application
// from this table we check if all 7 cat columns have the same value:
// if max = min then all existing values must be the same,
// so we only need to check that all 7 cat columns have values (no nulls), and then we may
// assign the category to the fact. Knowing that all cat fields have the same value,
// I take the first one, so my result is cat1
load VERID, if(RangeMaxString(cat1,cat2,cat3,cat4,cat5,cat6,cat7)=RangeMinString(cat1,cat2,cat3,cat4,cat5,cat6,cat7) and RangeTextCount(cat1,cat2,cat3,cat4,cat5,cat6,cat7)=7, cat1) as category Resident tmp_f Where Exists(VERID);
drop table tmp_f;
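One iteration of the loop above can be sketched in plain Python (again, not Qlik script; `match_one_key` is a made-up name): for key i, map each ACCODE value to the first interval containing it, and from there to that interval's category.

```python
# Sketch of one per-key pass: given distinct ACCODE values and this key's
# (start, end, category) intervals, return a lookup value -> category.
# Note: this takes the FIRST matching interval; the real IntervalMatch would
# produce one row per matching interval if ranges for a key overlapped.
def match_one_key(values, intervals):
    return {v: next((cat for start, end, cat in intervals if start <= v <= end), None)
            for v in values}
```

Running seven such passes and then applying the "all 7 categories agree" check is the whole idea of the loop.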
You are too kind! I will look at this tonight. In the meantime, I got the following to work, from page 6, method 3 in the document you linked to. I will see if I can use another method that doesn't create a cartesian product, but this works on my test case. I would still like a general method to categorize a transaction that isn't so memory intensive.
tmp_BridgeTable:
load distinct
_CompositeKey,
ACCODE01, ACCODE02, ACCODE03, ACCODE04, ACCODE05, ACCODE06, ACCODE07
resident FACT;
Join (tmp_BridgeTable)
load
_IntervalID,
CATEGORY,
ACCODE01_START, ACCODE01_END,
ACCODE02_START, ACCODE02_END,
ACCODE03_START, ACCODE03_END,
ACCODE04_START, ACCODE04_END,
ACCODE05_START, ACCODE05_END,
ACCODE06_START, ACCODE06_END,
ACCODE07_START, ACCODE07_END
resident INTERVAL;
BridgeTable:
load distinct _CompositeKey, _IntervalID
resident tmp_BridgeTable
where (ACCODE01_START <= ACCODE01 and ACCODE01 <= ACCODE01_END)
and (ACCODE02_START <= ACCODE02 and ACCODE02 <= ACCODE02_END)
and (ACCODE03_START <= ACCODE03 and ACCODE03 <= ACCODE03_END)
and (ACCODE04_START <= ACCODE04 and ACCODE04 <= ACCODE04_END)
and (ACCODE05_START <= ACCODE05 and ACCODE05 <= ACCODE05_END)
and (ACCODE06_START <= ACCODE06 and ACCODE06 <= ACCODE06_END)
and (ACCODE07_START <= ACCODE07 and ACCODE07 <= ACCODE07_END)
;
drop table tmp_BridgeTable;
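For reference, the bridge build above boils down to a filtered cross join; here is a hedged Python sketch of the same idea (the function name is illustrative). The intermediate table holds |distinct fact keys| x |interval rows| pairs before the where clause filters them, which is exactly why the document warns against this method for large tables.

```python
from itertools import product

# Sketch of method 3's bridge: pair every distinct fact key with every
# interval row (cartesian product), keep only pairs where ALL ranges hold.
def build_bridge(fact_rows, interval_rows, fields):
    bridge = []
    for f, iv in product(fact_rows, interval_rows):
        if all(iv[k + "_START"] <= f[k] <= iv[k + "_END"] for k in fields):
            bridge.append((f["_CompositeKey"], iv["_IntervalID"]))
    return bridge
```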
//Ola
Hi Ola,
I looked at this join in your script:
Join (tmp_BridgeTable)
What are the common keys in the joined tables?
regards
Darek
When loading FACT I add
ACCODE01&'|'&ACCODE02&'|'&ACCODE03&'|'&ACCODE04&'|'&ACCODE05&'|'&ACCODE06&'|'&ACCODE07 as _CompositeKey,
and when I load INTERVAL I add
RecNo() as _IntervalID,
which I use in the BridgeTable to link them. This solves the problem and doesn't seem to be too bad on performance, but then again there isn't that much data.
/Ola
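The composite key expression mirrors a simple pipe-joined string; as a language-agnostic sketch (the helper name is made up):

```python
# Python equivalent of the Qlik expression
# ACCODE01&'|'&ACCODE02&...&ACCODE07 as _CompositeKey:
def composite_key(row, fields):
    return "|".join(str(row[f]) for f in fields)
```

The '|' separator matters: without it, values like (1, 23) and (12, 3) would collapse into the same key "123".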