Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 
Not applicable

Group by syntax

Hello,

I'm trying to eliminate duplicate invoices resulting from data entry mistakes in excel and I'm using firstvalue function and group by function. I keep getting invalid expression error. can anyone look at the script and spot the mistake.

Actuals:

LOAD

[Invoice No],

[Supplier ID]&'|'&[Start Date]&'|'&[End Date] as KEY,

[Start Date],

[End Date],

[Supplier ID],

[Supplier Name],

[Destination Company],

[Item Code],

Brand,

[Invoice Date],

InYearToDate([Invoice Date],Today(),Year([Invoice Date])-Year(today())) as PINYTD,

[Invoice status],

Currency, 

[Discount amount],

[Net amount],

[Other Charges],

FirstValue([Gross amount])

Resident Actual

GROUP BY

[Invoice No],

[Supplier ID]&'|'&[Start Date]&'|'&[End Date] as KEY,

[Start Date],

[End Date],

[Supplier ID],

[Supplier Name],

[Destination Company],

[Item Code],

Brand,

[Invoice Date],

InYearToDate([Invoice Date],Today(),Year([Invoice Date])-Year(today())) as PINYTD,

[Invoice status],

Currency, 

[Discount amount],

[Net amount],

[Other Charges];

Thanks

1 Solution

Accepted Solutions
Peter_Cammaert
Partner - Champion III
Partner - Champion III

Try this:

Actuals:

NOCONCATENATE

LOAD [Invoice No]

     Max([Invoice Date]) AS [Invoice Date]

RESIDENT Actual

GROUP BY [Invoice No];

LEFT JOIN (Actuals)

LOAD * RESIDENT Actual;

Does not work however if you have multiple invoices with identical IDs on the same date. But then your original data set is weird as well (multiple invoices with the same ID on different dates?)

View solution in original post

14 Replies
swuehl
MVP
MVP

Actuals:

LOAD

[Invoice No],

[Supplier ID]&'|'&[Start Date]&'|'&[End Date] as KEY,

[Start Date],

[End Date],

[Supplier ID],

[Supplier Name],

[Destination Company],

[Item Code],

Brand,

[Invoice Date],

InYearToDate([Invoice Date],Today(),Year([Invoice Date])-Year(today())) as PINYTD,

[Invoice status],

Currency,

[Discount amount],

[Net amount],

[Other Charges],

FirstValue([Gross amount])

Resident Actual

GROUP BY

[Invoice No],

[Start Date],

[End Date],

[Supplier ID],

[Supplier Name],

[Destination Company],

[Item Code],

Brand,

[Invoice Date],

[Invoice status],

Currency,

[Discount amount],

[Net amount],

[Other Charges];

Removed the ALIASed expressions in the GROUP BY.

Not applicable
Author

Thanks for reply,

it works but the duplicates are still there. any ideas

Thanks

maksim_senin
Partner - Creator III
Partner - Creator III

Hi,

Try adding DISTINCT after LOAD, i.e. LOAD DISTINCT...

But please keep in mind this since sometimes duplicates are not evil but just a fact.

Best regards,

Maxim

swuehl
MVP
MVP

Can you give an example of these duplicate lines and what you expect instead?

A Distinct Load might be a solution (probably without the GROUP BY then), as Maxim said, have you already tried it ?

Not applicable
Author

yes, I tried distinct load but distinct does not consider the record as duplicate if only one field is matching.

example of data is as follows

           

Supplier IDSupplier NameDestination CompanyInvoice NoItem CodeBrandInvoice DateInvoice statusCurrencyGross amountDiscount amountNet amountOther Charges
123opi supplierksa1opi1opi1-Dec-14clearedUSD20002001800300
456x supplieruae2onetech1onetech1-Jan-15in progressEUR30003002700200
456x supplieruae3REVLON1REVLON1-Jan-15in progressEUR40004003600100
123opi supplierksa4opi1opi1-Dec-13clearedUSD20002001800300
456x supplieruae5onetech1onetech1-Jan-14in progressEUR30003002700200
456x supplieruae6REVLON1REVLON1-Jan-14in progressEUR40004003600100
123opi supplierksa7opi1opi1-Dec-15clearedUSD20002001800300
456x supplieruae8onetech1onetech1-Jan-16in progressEUR30003002700200
789z supplieruae9REVLON1REVLON1-Jan-16in progressEUR40004003600100
1111z supplieruae9REVLON1REVLON12-Jul-16in progressEUR10004003600100
swuehl
MVP
MVP

For this sample data, what do you expect instead?

Peter_Cammaert
Partner - Champion III
Partner - Champion III

Your example only has DISTINCT rows.

DISTINCT works across all columns in a row, meaning that a row has to have identical values in all fields to be kicked out.

If you want to keep only one row for example per Supplier ID, use FirstValue() or FirstSortedValue() on all other fields and GROUP BY Suppleir ID.

Best,

Peter

Peter_Cammaert
Partner - Champion III
Partner - Champion III

Which one do you want to keep? The most recent one (latest Invoice Date)?

Peter_Cammaert
Partner - Champion III
Partner - Champion III

Try this:

Actuals:

NOCONCATENATE

LOAD [Invoice No]

     Max([Invoice Date]) AS [Invoice Date]

RESIDENT Actual

GROUP BY [Invoice No];

LEFT JOIN (Actuals)

LOAD * RESIDENT Actual;

Does not work however if you have multiple invoices with identical IDs on the same date. But then your original data set is weird as well (multiple invoices with the same ID on different dates?)