Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
I wish to calculate a extra column based on the value of another columen based on ranges.
Example:
TableA
Name Price
Bob 500
Alice 234
Carl 134
Jane 40
...
...
Labels
Range Label
0 <= Price < 100 Low
100 <= Price < 300 Medium
300 <= Price < 1000 Quite High
...
77000 <= Price < 100000 The Rest
I want to ad an expression in my load script so that I based on the value of Price get a new column called label, so that the resulting table is like this:
TableA
Name Price label
Bob 500 Quite High
Alice 234 Medium
Carl 134 Medium
Jane 40 Low
...
...
So basically I want do do
LOAD
Name,
Price,
<something with price to map to a label> as label
FROM
.....
Any suggestions on how to do this in a clean and efficient way?
Thanks in advance.
Andreas
Hello Andreas,
A possible solution would be by using the IntervalMatch as below.
Hope this helps.
Ranges:
LOAD * Inline [
Label, MinLimit, MaxLimit
Low, 0, 99
Medium, 100, 299
Quite High, 300, 999
];
Data:
LOAD * Inline [
Name, Price
Bob, 500
Alice, 234
Carl, 134
Jane, 0
];
Left Join(Data) IntervalMatch (Price) LOAD MinLimit, MaxLimit Resident Ranges;
Left Join(Data)
LOAD MinLimit, MaxLimit, Label Resident Ranges;
DROP Table Ranges;
Hello Andreas,
A possible solution would be by using the IntervalMatch as below.
Hope this helps.
Ranges:
LOAD * Inline [
Label, MinLimit, MaxLimit
Low, 0, 99
Medium, 100, 299
Quite High, 300, 999
];
Data:
LOAD * Inline [
Name, Price
Bob, 500
Alice, 234
Carl, 134
Jane, 0
];
Left Join(Data) IntervalMatch (Price) LOAD MinLimit, MaxLimit Resident Ranges;
Left Join(Data)
LOAD MinLimit, MaxLimit, Label Resident Ranges;
DROP Table Ranges;
You can use IntervalMatch to get what you are looking for. See attached file. Hope it helps!
Here's another approach using MAPPING:
Ranges:
MAPPING LOAD
Start+IterNo()-1,
Label
while Start+IterNo()-1 <= Stop
;
LOAD * INLINE [
Start, Stop, Label
0, 99, Low
100, 299, Medium
300, 1000, Quite High
];
Map Label using Ranges;
Data:
LOAD *, Price as Label INLINE [
Name, Price
Bob, 500
Alice, 234
Carl, 134
Jane, 40
];
-Rob
Thank you for your reply jp_bakhache.
This approach did indeed work. But the Data table did get two extra columns: MinLimit and MaxLimit, it is possible to avoid adding them to the resulting Data table.
Also: I fail to understand how this works. Is it possible to explain what the two Left Joins in the end actually does?
Thank you for your reply rebeccad.
Unfortunately I can't open the file right now. Is it the same approach as the one from jp_bakhache?
Thak you Rob for your reply.
Unfortunately my real world scenario includes values up to a billion, which would create a Ranges table with a billion rows, which of course is unfeasable. Both from a time and memory perspective.
The IntervalMatch used without a join, will result in an additional table in the schema.
It's used to link the two tables Ranges and Data on field Price between MinLimit and MaxLimit (the usual "join" actually joins tables based on equality of values) and so by using the first join we get the related min and max limits into table Data.
The 2nd join will add the label to table Data based on the two fields MinLimit and MaxLimit.
The result of this will be one final table instead of 2.
As to removing the 2 additional fields from the final table you can add this line in the end:
DROP Fields MinLimit, MaxLimit;
Hope I was able to explain clearly this purpose .
I'd suggest you check also the help manual for IntervalMatch and its different uses. And you can also try different scenarios so you can fully experience the outcomes (using IntervalMatch with/without join, etc...).
No problem. My suggestion was similar but I dropped the mins and maxes, so you only show the labels.
Data:
LOAD * INLINE [
Name, Price
Bob, 500
Alice, 234
Carl, 134
Jane, 40
];
Ranges:
LOAD * INLINE [
Start, Stop, Label
0, 99, Low
100, 299, Medium
300, 1000, Quite High
];
IntervalMatch(Price)
LEFT JOIN (Ranges)
LOAD Start, Stop Resident Ranges;
DROP Fields Start, Stop;