Solved: Re: Best way to create a generated column in a loa... - Qlik Community

Report Inappropriate Content · ‎2012-10-09

I wish to calculate a extra column based on the value of another columen based on ranges.

Example:

TableA

Name Price

Bob 500

Alice 234

Carl 134

Jane 40

...

Labels

Range Label

0 <= Price < 100 Low

100 <= Price < 300 Medium

300 <= Price < 1000 Quite High

...

77000 <= Price < 100000 The Rest

I want to ad an expression in my load script so that I based on the value of Price get a new column called label, so that the resulting table is like this:

TableA

Name Price label

Bob 500 Quite High

Alice 234 Medium

Carl 134 Medium

Jane 40 Low

...

So basically I want do do

LOAD

Name,

Price,

<something with price to map to a label> as label

FROM

.....

Any suggestions on how to do this in a clean and efficient way?

Thanks in advance.

Andreas

Report Inappropriate Content · ‎2012-10-09

Hello Andreas,

A possible solution would be by using the IntervalMatch as below.

Hope this helps.

Ranges:

LOAD * Inline [
Label, MinLimit, MaxLimit
Low, 0, 99
Medium, 100, 299
Quite High, 300, 999
];

Data:
LOAD * Inline [
Name, Price
Bob, 500
Alice, 234
Carl, 134
Jane, 0
];

Left Join(Data) IntervalMatch (Price) LOAD MinLimit, MaxLimit Resident Ranges;

Left Join(Data)
LOAD MinLimit, MaxLimit, Label Resident Ranges;

DROP Table Ranges;

View solution in original post

Report Inappropriate Content · ‎2012-10-09

Hello Andreas,

A possible solution would be by using the IntervalMatch as below.

Hope this helps.

Ranges:

LOAD * Inline [
Label, MinLimit, MaxLimit
Low, 0, 99
Medium, 100, 299
Quite High, 300, 999
];

Data:
LOAD * Inline [
Name, Price
Bob, 500
Alice, 234
Carl, 134
Jane, 0
];

Left Join(Data) IntervalMatch (Price) LOAD MinLimit, MaxLimit Resident Ranges;

Left Join(Data)
LOAD MinLimit, MaxLimit, Label Resident Ranges;

DROP Table Ranges;

Report Inappropriate Content · ‎2012-10-09

You can use IntervalMatch to get what you are looking for. See attached file. Hope it helps!

rwunderlich · ‎2012-10-09

Here's another approach using MAPPING:

Ranges:

MAPPING LOAD

Start+IterNo()-1,

Label

while Start+IterNo()-1 <= Stop

;

LOAD * INLINE [

Start, Stop, Label

0, 99, Low

100, 299, Medium

300, 1000, Quite High

];

Map Label using Ranges;

Data:

LOAD *, Price as Label INLINE [

Name, Price

Bob, 500

Alice, 234

Carl, 134

Jane, 40

];

-Rob

http://robwunderlich.com

Report Inappropriate Content · ‎2012-10-10

Thank you for your reply jp_bakhache.

This approach did indeed work. But the Data table did get two extra columns: MinLimit and MaxLimit, it is possible to avoid adding them to the resulting Data table.

Also: I fail to understand how this works. Is it possible to explain what the two Left Joins in the end actually does?

Report Inappropriate Content · ‎2012-10-10

Thank you for your reply rebeccad.

Unfortunately I can't open the file right now. Is it the same approach as the one from jp_bakhache?

Report Inappropriate Content · ‎2012-10-10

Thak you Rob for your reply.

Unfortunately my real world scenario includes values up to a billion, which would create a Ranges table with a billion rows, which of course is unfeasable. Both from a time and memory perspective.

Report Inappropriate Content · ‎2012-10-10

The IntervalMatch used without a join, will result in an additional table in the schema.

It's used to link the two tables Ranges and Data on field Price between MinLimit and MaxLimit (the usual "join" actually joins tables based on equality of values) and so by using the first join we get the related min and max limits into table Data.

The 2nd join will add the label to table Data based on the two fields MinLimit and MaxLimit.

The result of this will be one final table instead of 2.

As to removing the 2 additional fields from the final table you can add this line in the end:

DROP Fields MinLimit, MaxLimit;

Hope I was able to explain clearly this purpose .

I'd suggest you check also the help manual for IntervalMatch and its different uses. And you can also try different scenarios so you can fully experience the outcomes (using IntervalMatch with/without join, etc...).

Report Inappropriate Content · ‎2012-10-10

No problem. My suggestion was similar but I dropped the mins and maxes, so you only show the labels.

Data:

LOAD * INLINE [

    Name, Price

    Bob, 500

    Alice, 234

    Carl, 134

    Jane, 40

];

Ranges:
LOAD * INLINE [

    Start, Stop, Label

    0, 99, Low

    100, 299, Medium

    300, 1000, Quite High

];

IntervalMatch(Price)

LEFT JOIN (Ranges)

LOAD Start, Stop Resident Ranges;

DROP Fields Start, Stop;

Best way to create a generated column in a load statement based on ranges