Discussion Board for collaboration related to QlikView App Development.
Hello,
I'm trying to eliminate duplicate invoices resulting from data entry mistakes in excel and I'm using firstvalue function and group by function. I keep getting invalid expression error. can anyone look at the script and spot the mistake.
Actuals:
LOAD
[Invoice No],
[Supplier ID]&'|'&[Start Date]&'|'&[End Date] as KEY,
[Start Date],
[End Date],
[Supplier ID],
[Supplier Name],
[Destination Company],
[Item Code],
Brand,
[Invoice Date],
InYearToDate([Invoice Date],Today(),Year([Invoice Date])-Year(today())) as PINYTD,
[Invoice status],
Currency,
[Discount amount],
[Net amount],
[Other Charges],
FirstValue([Gross amount])
Resident Actual
GROUP BY
[Invoice No],
[Supplier ID]&'|'&[Start Date]&'|'&[End Date] as KEY,
[Start Date],
[End Date],
[Supplier ID],
[Supplier Name],
[Destination Company],
[Item Code],
Brand,
[Invoice Date],
InYearToDate([Invoice Date],Today(),Year([Invoice Date])-Year(today())) as PINYTD,
[Invoice status],
Currency,
[Discount amount],
[Net amount],
[Other Charges];
Thanks
Try this:
Actuals:
NOCONCATENATE
LOAD [Invoice No]
Max([Invoice Date]) AS [Invoice Date]
RESIDENT Actual
GROUP BY [Invoice No];
LEFT JOIN (Actuals)
LOAD * RESIDENT Actual;
Does not work however if you have multiple invoices with identical IDs on the same date. But then your original data set is weird as well (multiple invoices with the same ID on different dates?)
Actuals:
LOAD
[Invoice No],
[Supplier ID]&'|'&[Start Date]&'|'&[End Date] as KEY,
[Start Date],
[End Date],
[Supplier ID],
[Supplier Name],
[Destination Company],
[Item Code],
Brand,
[Invoice Date],
InYearToDate([Invoice Date],Today(),Year([Invoice Date])-Year(today())) as PINYTD,
[Invoice status],
Currency,
[Discount amount],
[Net amount],
[Other Charges],
FirstValue([Gross amount])
Resident Actual
GROUP BY
[Invoice No],
[Start Date],
[End Date],
[Supplier ID],
[Supplier Name],
[Destination Company],
[Item Code],
Brand,
[Invoice Date],
[Invoice status],
Currency,
[Discount amount],
[Net amount],
[Other Charges];
Removed the ALIASed expressions in the GROUP BY.
Thanks for reply,
it works but the duplicates are still there. any ideas
Thanks
Hi,
Try adding DISTINCT after LOAD, i.e. LOAD DISTINCT...
But please keep in mind this since sometimes duplicates are not evil but just a fact.
Best regards,
Maxim
Can you give an example of these duplicate lines and what you expect instead?
A Distinct Load might be a solution (probably without the GROUP BY then), as Maxim said, have you already tried it ?
yes, I tried distinct load but distinct does not consider the record as duplicate if only one field is matching.
example of data is as follows
Supplier ID | Supplier Name | Destination Company | Invoice No | Item Code | Brand | Invoice Date | Invoice status | Currency | Gross amount | Discount amount | Net amount | Other Charges |
123 | opi supplier | ksa | 1 | opi1 | opi | 1-Dec-14 | cleared | USD | 2000 | 200 | 1800 | 300 |
456 | x supplier | uae | 2 | onetech1 | onetech | 1-Jan-15 | in progress | EUR | 3000 | 300 | 2700 | 200 |
456 | x supplier | uae | 3 | REVLON1 | REVLON | 1-Jan-15 | in progress | EUR | 4000 | 400 | 3600 | 100 |
123 | opi supplier | ksa | 4 | opi1 | opi | 1-Dec-13 | cleared | USD | 2000 | 200 | 1800 | 300 |
456 | x supplier | uae | 5 | onetech1 | onetech | 1-Jan-14 | in progress | EUR | 3000 | 300 | 2700 | 200 |
456 | x supplier | uae | 6 | REVLON1 | REVLON | 1-Jan-14 | in progress | EUR | 4000 | 400 | 3600 | 100 |
123 | opi supplier | ksa | 7 | opi1 | opi | 1-Dec-15 | cleared | USD | 2000 | 200 | 1800 | 300 |
456 | x supplier | uae | 8 | onetech1 | onetech | 1-Jan-16 | in progress | EUR | 3000 | 300 | 2700 | 200 |
789 | z supplier | uae | 9 | REVLON1 | REVLON | 1-Jan-16 | in progress | EUR | 4000 | 400 | 3600 | 100 |
1111 | z supplier | uae | 9 | REVLON1 | REVLON | 12-Jul-16 | in progress | EUR | 1000 | 400 | 3600 | 100 |
For this sample data, what do you expect instead?
Your example only has DISTINCT rows.
DISTINCT works across all columns in a row, meaning that a row has to have identical values in all fields to be kicked out.
If you want to keep only one row for example per Supplier ID, use FirstValue() or FirstSortedValue() on all other fields and GROUP BY Suppleir ID.
Best,
Peter
Which one do you want to keep? The most recent one (latest Invoice Date)?
Try this:
Actuals:
NOCONCATENATE
LOAD [Invoice No]
Max([Invoice Date]) AS [Invoice Date]
RESIDENT Actual
GROUP BY [Invoice No];
LEFT JOIN (Actuals)
LOAD * RESIDENT Actual;
Does not work however if you have multiple invoices with identical IDs on the same date. But then your original data set is weird as well (multiple invoices with the same ID on different dates?)