Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 
MEBG93
Creator
Creator

Group by max date on duplicates

Hello experts,

I'm having some struggle on cleaning some duplicates. I think a group by can solve this issue but I don't know how to approach. So I've got this data set, when duplicates are considered when all but "load date" are the same.

OPERATIONS:

load * Inline[

Load Date, Operation Date, Department, Product, Value

17/01/2020, 10/01/2020, Regulatory, Snickers, 100

15/01/2020, 10/01/2020, Regulatory, Snickers, 100

13/01/2020, 10/01/2020, Regulatory, Snickers, 100

16/01/2020, 10/01/2020, Finance, Kitkat, 100

15/01/2020, 10/01/2020, Finance, Kitkat, 100

13/01/2020, 10/01/2020, Finance, Kitkat, 100

];

So the result I need should be:

Load Date, Operation Date, Department, Product, Value

17/01/2020, 10/01/2020, Regulatory, Snickers, 100

16/01/2020, 10/01/2020, Finance, Kitkat, 100

I need the max load date for each of these duplicates.

Thanks in advance

Labels (4)
1 Solution

Accepted Solutions
Kushal_Chawda

OPERATIONS:
load * Inline [
Load Date, Operation Date, Department, Product, Value
17/01/2020, 10/01/2020, Regulatory, Snickers, 100
15/01/2020, 10/01/2020, Regulatory, Snickers, 100
13/01/2020, 10/01/2020, Regulatory, Snickers, 100
16/01/2020, 10/01/2020, Finance, Kitkat, 100
15/01/2020, 10/01/2020, Finance, Kitkat, 100
13/01/2020, 10/01/2020, Finance, Kitkat, 100
];

Final:
NoConcatenate
Load max([Load Date]) as [Load Date], 
     [Operation Date], 
     Department, 
     Product, 
     Value
Resident OPERATIONS
Group by 
    [Operation Date], 
     Department, 
     Product,
     Value;
     
Drop Table OPERATIONS;

View solution in original post

3 Replies
Kushal_Chawda

OPERATIONS:
load * Inline [
Load Date, Operation Date, Department, Product, Value
17/01/2020, 10/01/2020, Regulatory, Snickers, 100
15/01/2020, 10/01/2020, Regulatory, Snickers, 100
13/01/2020, 10/01/2020, Regulatory, Snickers, 100
16/01/2020, 10/01/2020, Finance, Kitkat, 100
15/01/2020, 10/01/2020, Finance, Kitkat, 100
13/01/2020, 10/01/2020, Finance, Kitkat, 100
];

Final:
NoConcatenate
Load max([Load Date]) as [Load Date], 
     [Operation Date], 
     Department, 
     Product, 
     Value
Resident OPERATIONS
Group by 
    [Operation Date], 
     Department, 
     Product,
     Value;
     
Drop Table OPERATIONS;
MEBG93
Creator
Creator
Author

Thanks Kush, it works!. Just one thing, do I need to group by every dimension of the Final table? Is this always the case?

 

Thanks

Kushal_Chawda

Yes. You need to include all the dimensions in group by which you have included in load statement.