6 Replies Latest reply: Jun 11, 2015 4:48 AM by Brahim Zamouri RSS

    Most recent Data - Duplicates

      Hi,

       

      I have loaded some data (around 800.000 x 4) in my application Qlik but I have some duplicates for each person (example : Address_1 with validity from 01/01/14 - 31/12/14, Address_2 with validity from 01/01/15 - 31/12/99). In other to have unique data, I would like to select only the most recent data for each ID person. How I can do it ?

       

      Best.

       

      Brahim.

        • Re: Most recent Data - Duplicates
          Federico Velasco

          Hi,

           

          Try this:

          Data:

          LOAD

          *

          Inline [

          Person_id, Adress, Validity_Start, Validity_End

          1, Adress_1, 01/01/2014, 31/12/2014

          1, Adress_2, 01/01/2015, 31/12/9999

          ];

           

          Right Join(Data)

          LOAD

            Person_id,

            LastValue(Validity_Start) as "Validity_Start"

          Resident Data

          Group by Person_id

          Order by Validity_Start asc;

            • Re: Most recent Data - Duplicates
              David Sugito

              Whoops Federico,

               

              Great, now I know that QlikView has it!

              Never used LastValue() before!

              Thanks for sharing!

               

              And btw, I've attached your code too.


              David Sugito

              Mobile: + 62 878 0888 9871
              Phone: + 62 21 569 823 85 / 86
              Email: me@davidshuang.com
              Site: davidshuang.com

              • Re: Most recent Data - Duplicates

                Hello Frederico,

                 

                Thanks a lot for your answer...but what if I want to keep all data fields (as in the code) and keep only the max of the field date for each person id ?

                 

                I have tried a group by but this only work when I have this case:

                 

                Work Fine

                Adresses:

                LOAD [Business Partner Person],

                 

                Max(Adresses.VT) as MaxDate_VT_Adresses,

                 

                FROM
                Adresses.qvd
                (
                qvd)

                Group By [Business Partner Person]

                 

                Doesn't work

                Adresses:

                LOAD[Business Partner Person],
                [Adresses.Address Type],
                Adresses.Source,
                Adresses.VF,

                Max(Adresses.VT) as MaxDate_VT_Adresses,
                [Adresses.Conflict Code],
                [Adresses.Location Code],
                [Adresses.Number From],
                [Adresses.Number To],
                [Adresses.PO Box],
                Adresses.Validity_01_01_2015
                FROM
                Adresses.qvd
                (
                qvd)

                GroupBy[Business Partner Person],[Adresses.Address Type],
                Adresses.Source,
                Adresses.VF,
                [Adresses.Conflict Code],
                [Adresses.Location Code],
                [Adresses.Number From],
                [Adresses.Number To],
                [Adresses.PO Box],
                Adresses.Validity_01_01_2015
                ;

                 

                 

                Best Regards

                 

                 

                Brahim.

              • Re: Most recent Data - Duplicates
                David Sugito

                Hi Brahim,

                 

                You can always use aggregation, in this case MAX(StartDate) for the address.

                I will give you an example using Excel in a moment.

                 

                 

                Hope helps,

                David Sugito

                Mobile: + 62 878 0888 9871
                Phone: + 62 21 569 823 85 / 86
                Email: me@davidshuang.com
                Site: davidshuang.com