Skip to main content
Announcements
Qlik Connect 2025: 3 days of full immersion in data, analytics, and AI. May 13-15 | Orlando, FL: Learn More
cancel
Showing results for 
Search instead for 
Did you mean: 
dmac1971
Creator III
Creator III

Removing Duplicates at Load

Can anyone give me some direction on removing the duplicate "Line_ID" records at load using the script below?

TIA.



LOAD ORDER_NO & '-' & LINE_NO As Line_ID,
     CUSTOMER,
    
ORDER_NO,
    
CUST_PO_NO,
    
ORDER_TYPE,
    
ITEM,
    
DESCRIPTION,
    
ORD_QTY,
FROM

[..\Sources\OpenOrdersxls]

(
biff, embedded labels, table is $)

5 Replies
mambi
Creator III
Creator III

you can try this :

Load distinct ORDER_NO & '-' & LINE_NO As Line_ID,
     CUSTOMER,
    
ORDER_NO,
    
CUST_PO_NO,
    
ORDER_TYPE,
    
ITEM,
    
DESCRIPTION,
    
ORD_QTY,
FROM

[..\Sources\OpenOrdersxls]

(
biff, embedded labels, table is $)

or this :

LOAD

      ORDER_NO & '-' & LINE_NO As Line_ID,

     FirstValue(CUSTOMER) AS CUSTOMER,

     FirstValue(ORDER_NO) AS ORDER_NO, ..........

FROM [..\Sources\OpenOrdersxls]

GROUP BY Line_ID ;

dmac1971
Creator III
Creator III
Author

Can I check that the first solution only removes the duplicates where Line_ID matches?

tresesco
MVP
MVP

No, rather try like:

LOAD ORDER_NO & '-' & LINE_NO As Line_ID,
     CUSTOMER,
     ORDER_NO,
     CUST_PO_NO,
     ORDER_TYPE,
     ITEM,
     DESCRIPTION,
     ORD_QTY,
FROM

[..\Sources\OpenOrdersxls]

(biff, embedded labels, table is $) Where Not Exists (Line_ID) ;

dmac1971
Creator III
Creator III
Author

Still cant get this to work, has anyone a clear and concise example to work from maybe?  My understanding is that I firstly load a temporary table with my data including the key with the duplicates, then remove the duplicates while moving to a new table?

stig1984
Creator II
Creator II

I have tried mambisecond option:

LOAD

      ORDER_NO & '-' & LINE_NO As Line_ID,

     FirstValue(CUSTOMER) AS CUSTOMER,

     FirstValue(ORDER_NO) AS ORDER_NO, ..........

FROM [..\Sources\OpenOrdersxls]

GROUP BY Line_ID ;

and this worked for me.  Certainly worth giving it a go.

I have a data table which has multiple unique lines (QA inspection report) but will have multiple duplicate entries when I strip out the data I do not want (which I could not clear using distinct or exists script functions).

I loaded the fields I required form the original table into a TEMPQA table, then used the principle above to create one line for each unique entry - this took me original 336,287 line down to 32,603 (correct number of records without duplicates).

I then dropped the TEMPQA table.

Full script extract below

TEMPQA:

SQL SELECT CONS as QACons,

    "CONS/LINE/PALLET" as PalletUsed,

    "DEL_DET_PAL_PALLET_QTY" as Inspected_Quantity,

    concat (CONS,'/',LINE) as Consignment,

    LINE as QALINE,

    PALLET as QAPALLET,

    "QA_STATUS_DESC" as RAG

FROM "***_***".dbo."EXCEL_QA_INSPECTION_REPORT";

QAInspection:

LOAD  Distinct

  PalletUsed,

  FirstValue (Consignment) as Consignment,

  FirstValue (Inspected_Quantity) as Inspected_Quantity,

  FirstValue (RAG) as RAG

Resident TEMPQA

Group By PalletUsed;

DROP Table TEMPQA;