Skip to main content
Announcements
Live today at 11 AM ET. Get your questions about Qlik Connect answered, or just listen in. SIGN UP NOW
cancel
Showing results for 
Search instead for 
Did you mean: 
dmac1971
Creator III
Creator III

Removing Duplicates at Load

Can anyone give me some direction on removing the duplicate "Line_ID" records at load using the script below?

TIA.



LOAD ORDER_NO & '-' & LINE_NO As Line_ID,
     CUSTOMER,
    
ORDER_NO,
    
CUST_PO_NO,
    
ORDER_TYPE,
    
ITEM,
    
DESCRIPTION,
    
ORD_QTY,
FROM

[..\Sources\OpenOrdersxls]

(
biff, embedded labels, table is $)

5 Replies
mambi
Creator III
Creator III

you can try this :

Load distinct ORDER_NO & '-' & LINE_NO As Line_ID,
     CUSTOMER,
    
ORDER_NO,
    
CUST_PO_NO,
    
ORDER_TYPE,
    
ITEM,
    
DESCRIPTION,
    
ORD_QTY,
FROM

[..\Sources\OpenOrdersxls]

(
biff, embedded labels, table is $)

or this :

LOAD

      ORDER_NO & '-' & LINE_NO As Line_ID,

     FirstValue(CUSTOMER) AS CUSTOMER,

     FirstValue(ORDER_NO) AS ORDER_NO, ..........

FROM [..\Sources\OpenOrdersxls]

GROUP BY Line_ID ;

dmac1971
Creator III
Creator III
Author

Can I check that the first solution only removes the duplicates where Line_ID matches?

tresesco
MVP
MVP

No, rather try like:

LOAD ORDER_NO & '-' & LINE_NO As Line_ID,
     CUSTOMER,
     ORDER_NO,
     CUST_PO_NO,
     ORDER_TYPE,
     ITEM,
     DESCRIPTION,
     ORD_QTY,
FROM

[..\Sources\OpenOrdersxls]

(biff, embedded labels, table is $) Where Not Exists (Line_ID) ;

dmac1971
Creator III
Creator III
Author

Still cant get this to work, has anyone a clear and concise example to work from maybe?  My understanding is that I firstly load a temporary table with my data including the key with the duplicates, then remove the duplicates while moving to a new table?

stig1984
Creator II
Creator II

I have tried mambisecond option:

LOAD

      ORDER_NO & '-' & LINE_NO As Line_ID,

     FirstValue(CUSTOMER) AS CUSTOMER,

     FirstValue(ORDER_NO) AS ORDER_NO, ..........

FROM [..\Sources\OpenOrdersxls]

GROUP BY Line_ID ;

and this worked for me.  Certainly worth giving it a go.

I have a data table which has multiple unique lines (QA inspection report) but will have multiple duplicate entries when I strip out the data I do not want (which I could not clear using distinct or exists script functions).

I loaded the fields I required form the original table into a TEMPQA table, then used the principle above to create one line for each unique entry - this took me original 336,287 line down to 32,603 (correct number of records without duplicates).

I then dropped the TEMPQA table.

Full script extract below

TEMPQA:

SQL SELECT CONS as QACons,

    "CONS/LINE/PALLET" as PalletUsed,

    "DEL_DET_PAL_PALLET_QTY" as Inspected_Quantity,

    concat (CONS,'/',LINE) as Consignment,

    LINE as QALINE,

    PALLET as QAPALLET,

    "QA_STATUS_DESC" as RAG

FROM "***_***".dbo."EXCEL_QA_INSPECTION_REPORT";

QAInspection:

LOAD  Distinct

  PalletUsed,

  FirstValue (Consignment) as Consignment,

  FirstValue (Inspected_Quantity) as Inspected_Quantity,

  FirstValue (RAG) as RAG

Resident TEMPQA

Group By PalletUsed;

DROP Table TEMPQA;