19 Replies Latest reply: Aug 15, 2017 2:32 AM by UPALI WIJERATNE RSS

    identifying Duplicates

    Simon Withey

      As a Qlik virgin I need the help of you experts. I need to identify and set a flag against a duplicate record in a load script. Example ogf the table below. I need to check if the Id's are the same and if so i need to set a flag against the oldest date only and set the later date as OK. If the dates are equal i just need to set the first record as a duplicate. many thanks

       

        

      IdAuth Date
      C001833719-Jul-17
      C001833727-Jul-17
      CX21179502-Jun-17
      CX21179502-Jun-17
      CX21219808-Jun-17
      CX21219809-Jun-17
      N000248808-Jun-17
      N000248814-Jun-17
      P000074012-Jun-17
      P000074016-Jun-17
      R000024906-Jul-17
      R000024925-Jul-17
        • Re: identifying Duplicates
          Sunny Talwar

          May be this

           

          SET DateFormat='DD-MMM-YY';

           

          Table:

          LOAD * INLINE [

              Id, Auth Date

              C0018337, 19-Jul-17

              C0018337, 27-Jul-17

              CX211795, 02-Jun-17

              CX211795, 02-Jun-17

              CX212198, 08-Jun-17

              CX212198, 09-Jun-17

              N0002488, 08-Jun-17

              N0002488, 14-Jun-17

              P0000740, 12-Jun-17

              P0000740, 16-Jun-17

              R0000249, 06-Jul-17

              R0000249, 25-Jul-17

          ];

           

          FinalTable:

          LOAD Id,

          [Auth Date],

          If(Id = Previous(Id), 'Duplicate') as Flag

          Resident Table

          Order By Id, [Auth Date] desc;

           

          DROP Table Table;

           

          Capture.PNG

            • Re: identifying Duplicates
              Dan Sullivan

              This is good advice from Sunny.  you can also do a load distinct to just eliminate the duplicates where they have the same dates and Id.

              • Re: identifying Duplicates
                Simon Withey

                Thank you Sunny. Apologies for not making myself clear. I won't be typing the data in, it will be part of a non sorted data table/file.

                  • Re: identifying Duplicates
                    Sunny Talwar

                    We know that.... the above was just to show how it can be done.... in your case, you will do something like this

                     

                    SET DateFormat='DD-MMM-YY';

                     

                    Table:

                    LOAD *

                    FROM .....;

                     

                    FinalTable:

                    LOAD Id,

                    [Auth Date],

                    If(Id = Previous(Id), 'Duplicate') as Flag,

                    ....

                    Resident Table

                    Order By Id, [Auth Date] desc;

                     

                    DROP Table Table;

                     

                    Red part is where you will load the data from whatever you are loading it from

                      • Re: identifying Duplicates
                        Simon Withey

                        Thanks Sunny, I'm still having issues. I've tried copying your syntax but I'm obviously doing something wrong.

                        The table is already loaded (called REP). Can I ask you to spell out the red text in idiot terms for me please. As soon as I type FROM [REP] it doesn't like it. Sorry to be a pain

                          • Re: identifying Duplicates
                            Miguel Angel Baeyens de Arce

                            Where are you loading the data from: database, file?

                             

                            Simple example for database:

                             

                            CONNECT TO [Database]; // Use the button "Connect..." in the script editor for the wizard

                             

                            Table:

                            LOAD *;

                            // Put your sql statement here to retrieve the rows

                            SQL SELECT *

                            FROM Table;

                             

                            Simple example for file:

                             

                            Table:

                            LOAD *

                            FROM File.qvd (Qvd); // type of file and options can be different, use "Table Files" button in the script editor

                              • Re: identifying Duplicates
                                Simon Withey

                                The file is already loaded (see example below). It's saying that 'Table' doesn't exist. If you could fill in where and what I should be entering to make it work I'd be very grateful. Many thanks in advance.

                                 

                                 

                                [REP]:

                                 

                                LOAD

                                    "Assessment No",

                                    "Creator Name",

                                    "Vehicle Manufacturer Name",

                                    "Vehicle Model Name",

                                    "Creator Site Id",

                                    "Creator Terminal Id",

                                    "Vehicle Reg",

                                    "First Auth Date",

                                FROM [lib://VIA Motor Drive (Perf & Planning)/Motor Repair\MOTOR REPAIR 2017.xlsx]

                                (ooxml, embedded labels, table is REP);

                                 

                                Table:

                                Load

                                "Assessment No",

                                "First Auth Date";

                                 

                                FinalTable:

                                Load

                                "Assessment No",

                                "First Auth Date",

                                If("Assessment No"=Previous("Assessment No"),'DUP','')as [DUP Flag]

                                Resident Table

                                 

                                Order By "Assessment No",[First Auth Date]desc;

                                 

                                Drop Table Table;

                      • Re: identifying Duplicates
                        Andrew Walker

                        Hi Simon,

                         

                        Another way:

                         

                        Data:

                        LOAD * Inline [

                        Id, Auth Date

                        C0018337, 19-Jul-17

                        C0018337, 27-Jul-17

                        CX211795, 02-Jun-17

                        CX211795, 02-Jun-17

                        CX212198, 08-Jun-17

                        CX212198, 09-Jun-17

                        N0002488, 08-Jun-17

                        N0002488, 14-Jun-17

                        P0000740, 12-Jun-17

                        P0000740, 16-Jun-17

                        R0000249, 06-Jul-17

                        R0000249, 25-Jul-17

                        ];

                         

                         

                        Left Join(Data)

                        LOAD

                        Id,

                        min([Auth Date],2) as [Auth Date],

                        'OK' as Flag

                        Resident Data Group by Id;

                         

                        gives:

                         

                        Id Auth Date Flag
                        C001833727-Jul-17OK
                        C001833719-Jul-17 
                        CX21179502-Jun-17 
                        CX21219809-Jun-17OK
                        CX21219808-Jun-17 
                        N000248814-Jun-17OK
                        N000248808-Jun-17 
                        P000074016-Jun-17OK
                        P000074012-Jun-17 
                        R000024925-Jul-17OK
                        R000024906-Jul-17

                         

                        cheers

                         

                        Andrew

                        • Re: identifying Duplicates
                          Pankaj Thakur

                          Hi Simon,

                          PFA, Solution.

                           

                          Regards,

                          Pankaj