Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
I am doing a left join from file A to file B. For each unique identifier however in file B there are duplicate sales amounts. I only want to join where the sales amounts are unique for each identifier. How would I go about doing this ?
Try doing the left Join the other way.
Load *
TableA;
Left Join(A)
Load*
TableB;
Hi Sid,
Try use the synthase applyMap..
Hi Smith,
I suggest you to do like the following method:
Step1: In the table B, identify the identifiers for which we have unique amount
Step2: Do the Left Join
Detail of the script=>
TableB_TMP:
Load *
Where Nb_Amount=1; //We keep unique amount ligne
LOAD Id, count(Amount) as Nb_Amount, Sum(Amount) as Amount
From TableB
Group By Id;
TableA:
LOAD *
From TableA;
Left Join(TableA)
LOAD Id, Amount
Resident TableB_TMP;
DROP Table TableB_TMP;
Table A:
Sales_id, Sales_amount
91000, 100
91000, 50
Table B:
Sales_id, region, unit,
91000, Asia, AL
91000, Brazil, BR
91000, America, AR
91000, France, FR
91000, Spain, SP
When I load table A and then left join on table B the resultant table is 10 rows of data :
Sales_id, region, unit, Sales_amount
91000, asia, AL, 100
91000, brazil, BR, 100
91000, America, AR, 100
91000, France, FR, 100
91000, Spain, SP, 100
91000, asia, AL, 50
91000, brazil, BR, 50
91000, America, AR, 50
91000, France, FR, 50
91000, Spain, SP, 50
However this is incorrect as it is joining the Sales_id in table A to each instance in table B hence skewing the ‘Sales_amount’. Effectively the Sales_amount should equal to 150 where sales_id is 91000.
Hope this clarifies.
Hello Sid
I guess you need to Load first table as -
Load Sales_id, Sum(Sales_Amount) from Table1Group By Sales_id;
Left Join
Load Sales_id, region, unit from Table2;
This may help you to get desired output.
**Please not that sample data you have shown has only one Sales_id 91000. Accordingly it will show Sales_Amount 150 against each row.
no, I think it doesn't
I think you want
Sales_id, region, unit, Sales_amount
91000, Asia, AL
91000, Brazil, BR
91000, America, AR
91000, France, FR
91000, Spain, SP
But how to distribute 150 between 5 countries?
Random?
Thanks Prashant. I should have mentioned I have more fields than included in my original sample data. Would I need to add all of these in the group by clause as currently it returns an 'invalid expression' error.
Yes Sid,
All fields other than aggression must be listed in Group By clause.
So I have an update. There is another field in table B called quantity.
What I am being asked for is to join table A ( aggregating the Sales_amount) to table B joining on which ever single line has the maximum quantity value.
Is this even possible (?)