13 Replies Latest reply: Aug 14, 2013 8:00 AM by Peter Schulz RSS

    Remove Duplicates

    Hardik Gandhi



      I have a table which holds two key fields: MemberID and CustomerID.


      Before moving forward I want to check for any duplicates in table with a Key combination of MemberID + CustomerID. (Both reside in the same table)


      Any thoughts?




        • Re: Remove Duplicates
          Jason Michaelides

          Try aggregating the resident table and joining to itself:






             1 AS Count



          JOIN (Data)





             Sum(Count) AS Freq

          Resident Data

          Group by MemberID,CustomerID;


          That will show you the frequency of each row.


          Hope this helps,



            • Re: Remove Duplicates
              Hardik Gandhi

              I want to flag the duplicates and remove them from the table.

                • Re: Remove Duplicates
                  Jason Michaelides

                  I assume you want to leave one of each duplicate record behind? Will all other fields also be identical? If not, how do you decide which record to keep?

                    • Re: Remove Duplicates
                      Hardik Gandhi

                      Yes all other fields would be identical. So I can keep any one record of the two or more.

                      • Re: Remove Duplicates

                        Hi. I want to do the same thing but all other fields are not identical. For example, if i have 2 customer id's, i only want to take that ID which has a higher "worth". Here "worth" is just a field name. Any help?


                          • Re: Remove Duplicates
                            Peter Schulz

                            take a look at attached qvw...





                            LOAD * INLINE [

                                ID, worth

                                1, 1

                                1, 2

                                2, 1

                                2, 2

                                3, 1

                                4, 1

                                4, 2

                                4, 3

                                4, 4





                            Noconcatenate Load

                            ID, max(worth) as worth



                            RESIDENT TMP GROUP by ID;



                            DROP TABLE TMP;


                            I think Group by clause just works with data from resident table.


                            Hope it helps.




                            - PS -

                              • Re: Remove Duplicates

                                Thank you very much. Really helpful. I only have the personal edition so can't view your file. Please could you copy and paste the script into this discussion. Also, I have other variables not just id and worth, where do I put these variables in?


                                  • Re: Remove Duplicates
                                    Peter Schulz

                                    the script is pasted in my last post...?!


                                    With variables you mean fields(columns)? With more fields you get an error I think.


                                    I can imagine 2 possible ways to get the data for a full table(perhaps someone knows a better solution):


                                    1. Do as described an then do a join with the origin table to get the rest of the fields.But I'm not that confident with joins so I would do it like this:


                                    2. In short: Create a flag field in the origin table which fields should be loaded(load = 1 or 0)


                                    Create a merged field of ID and worth, like "ID|worth" while loading TEST(and origin table) and create a Mapping table with that data like:


                                    MAPPING LOAD

                                    ID|worth,1 as relevant

                                    RESIDENT TEST;


                                    Than load data(fields) from origin table(TMP) with:



                                    ApplyMap('RELEVANT_ID_MAP',ID|worth,'0') AS load,


                                    And finally load the data in a table with ... RESIDENT sometemptable where load=1;


                                    Look at it more as rough guide not a step by step tutorial...Sorry, I have not the time at the moment. Perhaps later I'll create a sample.


                                    I admit it looks a bit complicated but if you are used to it is really quick and for me it the safest way. Better than joining and not 100% sure what happens to my data, ending up in a mess.


                                    And finally I think there is sure somebody (more advanced) who can do this "better"/easier. But at the moment this works for me pretty well.




                                    - PS -