5 Replies Latest reply: Aug 30, 2016 6:10 AM by Stephen Tyrer RSS

    Removing Duplicates at Load

    Dermot mcgoldrick

      Can anyone give me some direction on removing the duplicate "Line_ID" records at load using the script below?

       

      TIA.



      LOAD ORDER_NO & '-' & LINE_NO As Line_ID,
           CUSTOMER,
          
      ORDER_NO,
          
      CUST_PO_NO,
          
      ORDER_TYPE,
          
      ITEM,
          
      DESCRIPTION,
          
      ORD_QTY,
      FROM

      [..\Sources\OpenOrdersxls]

      (
      biff, embedded labels, table is $)

       

        • Re: Removing Duplicates at Load
          Mambi Badi

          you can try this :

          Load distinct ORDER_NO & '-' & LINE_NO As Line_ID,
               CUSTOMER,
              
          ORDER_NO,
              
          CUST_PO_NO,
              
          ORDER_TYPE,
              
          ITEM,
              
          DESCRIPTION,
              
          ORD_QTY,
          FROM

          [..\Sources\OpenOrdersxls]

          (
          biff, embedded labels, table is $)

           

          or this :

           

          LOAD

                ORDER_NO & '-' & LINE_NO As Line_ID,

               FirstValue(CUSTOMER) AS CUSTOMER,

               FirstValue(ORDER_NO) AS ORDER_NO, ..........

          FROM [..\Sources\OpenOrdersxls]

          GROUP BY Line_ID ;

          • Re: Removing Duplicates at Load
            Dermot mcgoldrick

            Can I check that the first solution only removes the duplicates where Line_ID matches?

              • Re: Removing Duplicates at Load
                Tresesco B

                No, rather try like:

                 

                LOAD ORDER_NO & '-' & LINE_NO As Line_ID,
                     CUSTOMER,
                     ORDER_NO,
                     CUST_PO_NO,
                     ORDER_TYPE,
                     ITEM,
                     DESCRIPTION,
                     ORD_QTY,
                FROM

                [..\Sources\OpenOrdersxls]

                (biff, embedded labels, table is $) Where Not Exists (Line_ID) ;

                  • Re: Removing Duplicates at Load
                    Dermot mcgoldrick

                    Still cant get this to work, has anyone a clear and concise example to work from maybe?  My understanding is that I firstly load a temporary table with my data including the key with the duplicates, then remove the duplicates while moving to a new table?

                      • Re: Removing Duplicates at Load
                        Stephen Tyrer

                        I have tried mambi's second option:

                        LOAD

                              ORDER_NO & '-' & LINE_NO As Line_ID,

                             FirstValue(CUSTOMER) AS CUSTOMER,

                             FirstValue(ORDER_NO) AS ORDER_NO, ..........

                        FROM [..\Sources\OpenOrdersxls]

                        GROUP BY Line_ID ;

                         

                        and this worked for me.  Certainly worth giving it a go.

                         

                        I have a data table which has multiple unique lines (QA inspection report) but will have multiple duplicate entries when I strip out the data I do not want (which I could not clear using distinct or exists script functions).

                         

                        I loaded the fields I required form the original table into a TEMPQA table, then used the principle above to create one line for each unique entry - this took me original 336,287 line down to 32,603 (correct number of records without duplicates).

                         

                        I then dropped the TEMPQA table.

                         

                        Full script extract below

                         

                        TEMPQA:

                        SQL SELECT CONS as QACons,

                            "CONS/LINE/PALLET" as PalletUsed,

                            "DEL_DET_PAL_PALLET_QTY" as Inspected_Quantity,

                            concat (CONS,'/',LINE) as Consignment,

                            LINE as QALINE,

                            PALLET as QAPALLET,

                            "QA_STATUS_DESC" as RAG

                        FROM "***_***".dbo."EXCEL_QA_INSPECTION_REPORT";

                         

                         

                        QAInspection:

                        LOAD  Distinct

                          PalletUsed,

                          FirstValue (Consignment) as Consignment,

                          FirstValue (Inspected_Quantity) as Inspected_Quantity,

                          FirstValue (RAG) as RAG

                        Resident TEMPQA

                        Group By PalletUsed;

                         

                         

                        DROP Table TEMPQA;