Removing Duplicates at Load

dmac1971 · ‎2014-07-02

Can anyone give me some direction on removing the duplicate "Line_ID" records at load using the script below?

TIA.

LOAD ORDER_NO & '-' & LINE_NO As Line_ID,
     CUSTOMER,
     ORDER_NO,
     CUST_PO_NO,
     ORDER_TYPE,
     ITEM,
     DESCRIPTION,
     ORD_QTY,
FROM

[..\Sources\OpenOrdersxls]

(biff, embedded labels, table is $)

mambi · ‎2014-07-02

you can try this :

Load distinct ORDER_NO & '-' & LINE_NO As Line_ID,
     CUSTOMER,
     ORDER_NO,
     CUST_PO_NO,
     ORDER_TYPE,
     ITEM,
     DESCRIPTION,
     ORD_QTY,
FROM

[..\Sources\OpenOrdersxls]

(biff, embedded labels, table is $)

or this :

LOAD

ORDER_NO & '-' & LINE_NO As Line_ID,

FirstValue(CUSTOMER) AS CUSTOMER,

FirstValue(ORDER_NO) AS ORDER_NO, ..........

FROM [..\Sources\OpenOrdersxls]

GROUP BY Line_ID ;

dmac1971 · ‎2014-07-03

Can I check that the first solution only removes the duplicates where Line_ID matches?

tresB · ‎2014-07-03

No, rather try like:

LOAD ORDER_NO & '-' & LINE_NO As Line_ID,
     CUSTOMER,
     ORDER_NO,
     CUST_PO_NO,
     ORDER_TYPE,
     ITEM,
     DESCRIPTION,
     ORD_QTY,
FROM

[..\Sources\OpenOrdersxls]

(biff, embedded labels, table is $) Where Not Exists (Line_ID) ;

dmac1971 · ‎2014-07-09

Still cant get this to work, has anyone a clear and concise example to work from maybe? My understanding is that I firstly load a temporary table with my data including the key with the duplicates, then remove the duplicates while moving to a new table?

stig1984 · ‎2016-08-30

I have tried mambisecond option:

LOAD

ORDER_NO & '-' & LINE_NO As Line_ID,

FirstValue(CUSTOMER) AS CUSTOMER,

FirstValue(ORDER_NO) AS ORDER_NO, ..........

FROM [..\Sources\OpenOrdersxls]

GROUP BY Line_ID ;

and this worked for me. Certainly worth giving it a go.

I have a data table which has multiple unique lines (QA inspection report) but will have multiple duplicate entries when I strip out the data I do not want (which I could not clear using distinct or exists script functions).

I loaded the fields I required form the original table into a TEMPQA table, then used the principle above to create one line for each unique entry - this took me original 336,287 line down to 32,603 (correct number of records without duplicates).

I then dropped the TEMPQA table.

Full script extract below

TEMPQA:

SQL SELECT CONS as QACons,

"CONS/LINE/PALLET" as PalletUsed,

"DEL_DET_PAL_PALLET_QTY" as Inspected_Quantity,

concat (CONS,'/',LINE) as Consignment,

LINE as QALINE,

PALLET as QAPALLET,

"QA_STATUS_DESC" as RAG

FROM "***_***".dbo."EXCEL_QA_INSPECTION_REPORT";

QAInspection:

LOAD Distinct

PalletUsed,

FirstValue (Consignment) as Consignment,

FirstValue (Inspected_Quantity) as Inspected_Quantity,

FirstValue (RAG) as RAG

Resident TEMPQA

Group By PalletUsed;

DROP Table TEMPQA;

QlikView App Dev