Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
Hi,
I need help to identify the row with the highest value from a excel list containing some duplicated "postnr".
The table below shows a sample from the excel list of swedish zip codes.
| postnr | postort | länskod | län | kommunkod | kommun | A-region | H-region | H-region info |
| 15593 | NYKVARN | 1 | STOCKHOLM | 140 | NYKVARN | 1 | H1 | Stockholm/Södertälje A-region |
| 15593 | NYKVARN | 4 | SÖDERMANLAND | 486 | STRÄNGNÄS | 7 | H4 | Mellanbygden |
| 15594 | NYKVARN | 1 | STOCKHOLM | 140 | NYKVARN | 1 | H1 | Stockholm/Södertälje A-region |
| 15594 | NYKVARN | 4 | SÖDERMANLAND | 461 | GNESTA | 5 | H3 | Större städer |
The column postnr is joined with the customer table identifying how many customers lives in a certain zip code, and what sort of H-region that zip code has.
As you can see from the list above one unique postnr could have many H-regions.
What I want is to - in the loading script - only load unique postnr with the lowest H-region (going from H1 -> H9).
Thanks ahead!
Hi,
You don't need to modify you data in the loading script. You can do that by creating a pivot table having postnr and H-region as dimensions. Create an expression having the of items per H-region count(fieldname).
Regards.
First load your excel file without any filtering. Then:
INNER JOIN (mytab)
LOAD postnr, minstring([H-region]) as [H-region]
RESIDENT mytab;
That will remove all but the lowest value h-region for each postnr.
-Rob
If you want to use script, the following are the steps (taking into account the solutions suggested by Rob and wgonzalez):
Data:
load postnr,postort,lanskod,lan,kommunkod,kommun,A_region,H_Region,H_region_info from (your source);
Data2:
load postnr,Min(H_region) as Min_H_Region resident Data group by postnr;
left join (Data) load * resident Data2;
drop table Data2;
Final:
Noconcatenate load * resident Data where H_Region = Min_H_Region;
drop table Data;
drop field Min_H_Region from Data2;
Krishna,
Are the extra steps needed? It seems to me that just the inner join is all that's necessary.
-Rob
Rob,
Thank you. I did not notice the INNER JOIN. Actually your script works and there is no need for the extra steps in my post and i verified it. The following script gives the desired result:
mytab:
load * Inline [
postnr,postort,länskod,län,kommunkod,kommun,A-region,H-region,"H-region info"
15593,NYKVARN,1,STOCKHOLM,140,NYKVARN,1,H1,Stockholm/Södertälje A-region
15593,NYKVARN,4,SÖDERMANLAND,486,STRÄNGNÄS,7,H4,Mellanbygden
15594,NYKVARN,1,STOCKHOLM,140,NYKVARN,1,H1,Stockholm/Södertälje A-region
15594,NYKVARN,4,SÖDERMANLAND,461,GNESTA,5,H3,Större städer
];
INNER JOIN (mytab)
LOAD postnr, minstring([H-region]) as [H-region] RESIDENT mytab Group By postnr;
Thanks for the confirmation. I thought I may have been missing something.
-Rob