Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
I need to get the Top 2 Records from a Field for each Category.
Say the following is my Input data.
And I need to get the Top 2 (Maximum) from each Category, which is A should have 8 and 5 while B should have 9 and 8.
How should I use the Max() function in the Load Script? From what I have searched, I found that Max(Value, 2) should give the solution. But it is not working. I am getting the following output instead.
What is the best way to achieve this is in LOAD SCRIPT?
Try like:
temp:
Load * Inline [
Category, Value
A, 1
A, 9
A, 2
A, 4
A, 7
B, 11
B, 15
B, 10
B, 14
];
NoConcatenate
Final:
Load
*,
AutoNumber(Value, Category)
Resident temp Where AutoNumber(Value, Category)<3 Order By Category, Value desc;
DROP Table temp;
FIRSTSORTEDVALUE() should help here,
aircode might be
TopRanks:
LOAD
Category,
FIRSTSORTEDVALUE(Category, Value, 1) AS TopValue
RESIDENT
MyTable
GROUP BY
Category;
CONCATENATE (TopRanks) LOAD
Category,
FIRSTSORTEDVALUE(Category, Value, 2) AS TopValue
RESIDENT
MyTable
GROUP BY
Category;
Peter
Try like:
temp:
Load * Inline [
Category, Value
A, 1
A, 9
A, 2
A, 4
A, 7
B, 11
B, 15
B, 10
B, 14
];
NoConcatenate
Final:
Load
*,
AutoNumber(Value, Category)
Resident temp Where AutoNumber(Value, Category)<3 Order By Category, Value desc;
DROP Table temp;
You can try this as well with MAX() functions
TempTab:
Load
Category,
Value
From Source;
MainTab:
LOAD
Category,
Max(Value) AS MaxValue
RESIDENT TempTab
GROUP BY Category;
Concatenate(MainTab)
LOAD
Category,
Max(Value,2) AS MaxValue
RESIDENT TempTab
GROUP BY Category;
Drop Table TempTab;
So if it is top N, you have to just change here:
Where AutoNumber(Value, Category)<=N Order By Category, Value desc;
Thanks a lot, Anand. Understood that there is no single straight way to achieve the output. tresesco Solution worked in one Go. Still not sure about the Running Time when used with bigger Larger Data.
Hi Koushik,
It's better to avoid group by in the load script as it will consume more time while loading the data.
Try to do it in the visual layer.
=If(Rank(Sum(Distinct(Value)))<=2,Value,null())
Many Thanks
Karthik
Koushik, Thanks for the information but if you are working on the Load Script and using JOINs for some tables is good because JOINs take less lime to process in comparison to Mappings Or Group By Or Order By they take time to arrange the table with its all fields and grouped the data for those fields. It is preferred if you have large table go for the Joins rather than Group By but it is still the trick to do the modelling on the tables.
Hope this helps you.
Also please check is this your Out put from the table
Out put that you preferred.
A should have 8 and 5
B should have 9 and 8.