Hi,
The relevant data attributes include three strings (name, type, subtype) as well as numeric field (value). There are actually 15 fields in all. For each NAME there are more than 1 TYPE. The tasks described above only apply to the TYPE = "Country".
The input is something like:
NAME TYPE SUBTYPE VALUE
ProductA Country France 0.9
ProductA Group GroupA 0.75
ProductA Country Canada 0.05
ProductA Country United States 0.05
ProductA Group GroupC 0.25
The output would be something like:
NAME TYPE SUBTYPE VALUE
ProductA Country France 0.9
ProductA Group GroupA 0.75
ProductA Country North America 0.1
ProductA Group GroupC 0.25
This data example is fairly small though there could be any number of countries.
The desired output dataset contains 10 records per NAME & TYPE=?Country? and possibly an 11th record with the subtype "Other" if necessary.
The consolidation rules have a few extra pieces above and beyond what has already been described as well but I'll stick with the part that was already posted.
The requirement was to have as little as possible in "Other" so two possible consolidations have been chosen. Overall it is desirable to have as few consolidations as is possible.
So for all products that have >10 countries we need to count how many countries fall into each of the two aggregations. Would one aggregation be enough to get it to 10 or less? If it does then do the one only else do the second as well. If the second is applied we could still be over 10 so then the smallest (ranked by VALUE) N records past 10 should be aggregated to "Other".
One thing of note is that if either one of the two consolodations would get the total number <=10 then use the one with the least number of Countries in it (i.e. that maximizes the number of rows (<=10)) should be chosen.
During this whole process TYPES other than "Country" are ignored but need to remain in the final output.
Is that enough information? Please let me know what else you need.
Many Thanks,
Jay