There is an additional complication. Corresponding to each vendors there are multiple transactions. So consider the following situation:
Rate # of transactions
USA 12 10
UK 13 20
Australia 14 61
Your solution will return 14 as the median value since Rate corresponding to 45th/46th transaction is 14. However, I want the median value as 13.
That still does not seem to solve my problem. I think I should rephrase my issue and mention the data fields present in the fact so that it becomes simpler.
The input file has following format:
Company Geography Rate Hours
IBM USA 11 1
IBM USA 12 1
IBM USA 13 1
I calculate the rate for "Company-Geography" combination at load time by grouping by "Company-Geography" combination. ( I have a "Company-Geography" concatenated field in the input file). Lets call this effective rate.
What I want to do is for a given company, I want to find the geography corresponding to median rates. There are multiple transactions for "Company-Geography" combination and effective rate can be same for any two "Company-Geography" combination (which is one of the reasons why i am concerned about using Distinct).