Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
Hi Guys,
I need little help.
Please look at the excel file
There are four columns that have a that are in some cases exactly the same:
The colums are:
invoiceID
supplierID
date
cost
IF the records are excactly the same then 1 else 0.
By doing that I can quickly identify wich invoices have been booked double or more.
I tried the previous function, but that did work really wel.
I made an composite key of those four keys. and doing a count on the key. But I dont know whether, a count is going to be memory consuming.
Please look at the attach file. The marked yelllow records are the records I would like to see in a straight table
Id | invoice | supplierId | date | cost |
10 | 147 | 456 | 1-1-2015 | 50 |
11 | 147 | 456 | 1-1-2015 | 100 |
12 | 147 | 455 | 1-1-2015 | 100 |
13 | 148 | 456 | 2-1-2015 | 50 |
14 | 149 | 457 | 2-1-2015 | 47 |
15 | 150 | 458 | 1-1-2015 | 150 |
16 | 150 | 458 | 1-1-2015 | 150 |
17 | 150 | 458 | 1-1-2015 | 150 |
18 | 151 | 459 | 2-1-2015 | 11 |
19 | 152 | 459 | 2-1-2015 | 11 |
Data:
Load
AutoNumber(invoice & supplierId & date & cost) as AutoID
,* Inline
[
Id, invoice,supplierId, date, cost
10, 147, 456, 01/01/2015, 50
11, 147, 456, 01/01/2015, 100
12, 147, 456, 01/01/2015, 100
13, 148, 456, 02/01/2015, 50
14, 149, 457, 02/01/2015, 47
15, 150, 458, 01/01/2015, 150
16, 150, 458, 01/01/2015, 150
17, 150, 458, 01/01/2015, 150
18, 151, 459, 02/01/2015, 11
19, 152, 459, 02/01/2015, 11
20, 153, 456, 01/01/2015, 100
21, 153, 456, 01/01/2015, 100
22, 153, 456, 01/01/2015, 20
];
Left Join
Load AutoID, Count(AutoID) as TotalAutoID Resident Data Group By AutoID;
Final:
Load Id, invoice, supplierId, date, cost, If(TotalAutoID > 1, 1, 0) as Flag_Duplicate Resident Data;
Drop Table Data;
What you want to do as a result? Remove those duplicate records or only identify them?
Data:
Load
AutoNumber(invoice & supplierId & date & cost) as AutoID
,* Inline
[
Id, invoice,supplierId, date, cost
10, 147, 456, 01/01/2015, 50
11, 147, 456, 01/01/2015, 100
12, 147, 456, 01/01/2015, 100
13, 148, 456, 02/01/2015, 50
14, 149, 457, 02/01/2015, 47
15, 150, 458, 01/01/2015, 150
16, 150, 458, 01/01/2015, 150
17, 150, 458, 01/01/2015, 150
18, 151, 459, 02/01/2015, 11
19, 152, 459, 02/01/2015, 11
20, 153, 456, 01/01/2015, 100
21, 153, 456, 01/01/2015, 100
22, 153, 456, 01/01/2015, 20
];
Left Join
Load AutoID, Count(AutoID) as TotalAutoID Resident Data Group By AutoID;
Final:
Load Id, invoice, supplierId, date, cost, If(TotalAutoID > 1, 1, 0) as Flag_Duplicate Resident Data;
Drop Table Data;
only identify them for business purpose only.
A:
LOAD Id, invoice, supplierId, date, cost
FROM tmp3.xlsx (ooxml, embedded labels, table is Blad1);
Left Join (A)
load *, if(cnt>1,1,0) as Duplicated;
load
invoice, supplierId, date, cost,
count(Id) as cnt
Resident A
Group By invoice, supplierId, date, cost;