8 Replies Latest reply: Oct 10, 2012 9:00 AM by Rebecca Molstad

# Best way to create a generated column in a load statement based on ranges

I wish to calculate a extra column based on the value of another columen based on ranges.

Example:

TableA

Name  Price

Bob   500

Alice 234

Carl  134

Jane   40

...

...

Labels

Range                   Label

0   <= Price < 100      Low

100 <= Price < 300      Medium

300 <= Price < 1000     Quite High

...

77000 <= Price < 100000 The Rest

I want to ad an expression in my load script so that I based on the value of Price get a new column called label, so that the resulting table is like this:

TableA

Name  Price label

Bob   500  Quite High

Alice 234  Medium

Carl  134  Medium

Jane   40  Low

...

...

So basically I want do do

Name,

Price,

<something with price to map to a label> as label

FROM

.....

Any suggestions on how to do this in a clean and efficient way?

Andreas

• ###### Re: Best way to create a generated column in a load statement based on ranges

Hello Andreas,

A possible solution would be by using the IntervalMatch as below.

Hope this helps.

Ranges:

Label, MinLimit, MaxLimit
Low, 0, 99
Medium, 100, 299
Quite High, 300, 999
]
;

Data:
Name, Price
Bob, 500
Alice, 234
Carl, 134
Jane, 0
]
;

Left Join(Data) IntervalMatch (Price) LOAD MinLimit, MaxLimit Resident Ranges;

Left Join(Data)
LOAD MinLimit, MaxLimit, Label Resident Ranges;

DROP Table Ranges;

• ###### Re: Best way to create a generated column in a load statement based on ranges

This approach did indeed work. But the Data table did get two extra columns: MinLimit and MaxLimit, it is possible to avoid adding them to the resulting Data table.

Also: I fail to understand how this works. Is it possible to explain what the two Left Joins in the end actually does?

• ###### Re: Best way to create a generated column in a load statement based on ranges

The IntervalMatch used without a join, will result in an additional table in the schema.

It's used to link the two tables Ranges and Data on field Price between MinLimit and MaxLimit (the usual "join" actually joins tables based on equality of values) and so by using the first join we get the related min and max limits into table Data.

The 2nd join will add the label to table Data based on the two fields MinLimit and MaxLimit.

The result of this will be one final table instead of 2.

As to removing the 2 additional fields from the final table you can add this line in the end:

DROP Fields MinLimit, MaxLimit;

Hope I was able to explain clearly this purpose .

I'd suggest you check also the help manual for IntervalMatch and its different uses. And you can also try different scenarios so you can fully experience the outcomes (using IntervalMatch with/without join, etc...).

• ###### Re: Best way to create a generated column in a load statement based on ranges

You can use IntervalMatch to get what you are looking for.  See attached file. Hope it helps!

• ###### Re: Best way to create a generated column in a load statement based on ranges

Unfortunately I can't open the file right now. Is it the same approach as the one from jp_bakhache?

• ###### Re: Best way to create a generated column in a load statement based on ranges

No problem.  My suggestion was similar but I dropped the mins and maxes, so you only show the labels.

Data:

Name, Price

Bob, 500

Alice, 234

Carl, 134

Jane, 40

]
;

Ranges:

Start, Stop, Label

0, 99, Low

100, 299, Medium

300, 1000, Quite High

]
;

IntervalMatch(Price)

LEFT JOIN (Ranges)

DROP Fields Start, Stop;

• ###### Re: Best way to create a generated column in a load statement based on ranges

Here's another approach using MAPPING:

Ranges:

Start+IterNo()-1,

Label

while Start+IterNo()-1 <= Stop

;

Start, Stop, Label

0, 99, Low

100, 299, Medium

300, 1000, Quite High

];

Map Label using Ranges;

Data:

LOAD *, Price as Label INLINE [

Name, Price

Bob, 500

Alice, 234

Carl, 134

Jane, 40

];

-Rob

http://robwunderlich.com

• ###### Re: Best way to create a generated column in a load statement based on ranges

Unfortunately my real world scenario includes values up to a billion, which would create a Ranges table with a billion rows, which of course is unfeasable. Both from a time and memory perspective.