Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
Hi,
I have an Age field and am trying to create an Age Group field based around it.
The Age group is
under 16
16-35
36 - 55
56 - 65
Over 65
Its best to create the group in the load script and then use this new field called Age_Group.
Try the below example
LOAD
if(age > 65, 'Over 65',
if(age <= 65 and age >= 56, '56 - 65',
if(age <= 55 and age >= 36, '36 - 55',
if(age <= 35 and age >= 16, '16 - 35',
if(age < 16, 'Under 16'))))) as Age_Group;
SQL Select *
FROM Table1;
That's pretty much what I'd do, but I'd take advantage of knowing we failed the previous IFs to simplify the expression a bit for performance:
if(age > 65, 'Over 65',
if(age >= 56, '56 - 65',
if(age >= 36, '36 - 55',
if(age >= 16, '16 - 35',
'Under 16')))) as Age_Group
If performance is an issue, you might also want to see if a mapping table is faster. I think it would look like this:
AgeGroups:
MAPPING LOAD
Age
,if(Age > 65, 'Over 65',
,if(Age >= 56, '56 - 65',
,if(Age >= 36, '36 - 55',
,if(Age >= 16, '16 - 35',
'Under 16')))) as AgeGroup
;
LOAD recno() as Age
AUTOGENERATE 200
;
LOAD *
,applymap('AgeGroups',Age) as AgeGroup
;
SQL SELECT *
FROM Table1
;
The idea is that you save time by only performing the IF once per age, not once per row. Or for that matter, just leave AgeGroups as a separate table linked to your main table by Age. That will probably load fastest, but may be slightly slower to display in charts. Just guessing, though.
I'm trying to implement your previous suggestion for my numeric field called Age which is located in a table I call Data. (The Data table I created from a .txt file.) I get the following error message:
ODBC connection failed
SQL Select *
FROM Data
I'm guessing that I can't use the code SQL Select * but I'm not sure what to substitute.
Any help would be appreciated!
The example I gave was merely an example. If you're loading from a text file instead of from an ODBC data source that uses SQL, you don't want to use an SQL select. You want to load from your text file the way you normally do.
AgeGroups:
MAPPING LOAD
Age
,if(Age > 65, 'Over 65',
,if(Age >= 56, '56 - 65',
,if(Age >= 36, '36 - 55',
,if(Age >= 16, '16 - 35',
'Under 16')))) as AgeGroup
;
LOAD recno() as Age
AUTOGENERATE 200
;
[Your table name goes here]:
LOAD *
,applymap('AgeGroups',Age) as AgeGroup
;
The same script you've always used to load the fields from your text file goes here.
Here's an alternative way to create the mapping table that avoids the Ifs (and paren-disease).
AgeGroups:
MAPPING LOAD
Low+(iterNo()-1) as Key,
Group
WHILE iterNo() <= (High - Low) + 1
;
LOAD * INLINE [
Low, High, Group
0,15,Under 16
16,35,16-35
36,55,36-55
56,65,56-65
]
;
TestData:
LOAD *,
applyMap('AgeGroups', Age, 'Over 65') as AgeGroup
;
LOAD * INLINE [
Age
1
3
15
16
30
55
99
]
;
Not a huge improvement, but may make it easier to maintain the range table.
-Rob
Yet another calculated solution:
if(Age<16, 'Under 16',
if(Age>65, 'Over 65',
text(16+(Div(Age-16,20)*20)) & ' - ' & text(35+(Div(Age-16,20)*20)))) as AgeGroup
But you always have to deal with the fringe groups..
- Ralf
The final touch is to add dual() function so that the groups can be sorted properly.
I think the fringe groups is not a drawback as it's often perceived by users this way even if it's not.
Thanx Rob, it is a big improvement for me. I have to handle 30+millions of records. Doing it with map instead of IF is faster.
Hi, I was wondering what's the best way to group customers by the Asset Value without using if condition
Cust ID Asset Value Segment
1 < 1million
1 1-2 million
If asset value is coming from another table other than the customer and it is split across months so need needs to be summed up