9 Replies Latest reply: Aug 29, 2013 8:40 AM by Friedrich Hofmann RSS

    How to find out names that exist twice?

    Friedrich Hofmann


      Hi,

       

      please help me - this must be relatively simple to do - I would have known how to do it in the software I worked with in my last job, just not in QlikView.

      - I have a masterdata table with all employees.

      - Employees who started as contract workers, but were eventually adopted, have two records: They get a different number, only the name stays the same.

      - I want to load (RESIDENT, for I have already loaded the masterdata table) only those records where one name has two records - all two of them per pair, if possible, together with one other field that I have to check also.

      - I have tried using the PEEK fct. inside a WHERE clause (after sorting) to get only those where the name equals that in the record above, that worked, but it returned 0 records.

      - I know I could load aggregated data using a COUNT or COUNT(DISTINCT), but that would not help me, would it?

       

      Thanks a lot!

      Best regards,

       

      DataNibbler

        • Re: How to find out names that exist twice?
          Abey Chandy

          Hi,

           

          Try

          load name,

          count(name) as name_Count

          resident x

          group by name;

           

          Then you can left join this with x using name

           

          And then you can do a resident load again with a field to flag off records with name_Count >= 2.

           

          Regards,

          Abey

            • Re: How to find out names that exist twice?
              Richard Pressanti

              Let's assume that your number field is named numeric_id.

               

              In the preceding answer, there is a typo error: you should replace count(name) by:

               

              count(numeric_id) as number_count

               

               

               

              You can do it in the dashboard with a pivot table whose first dimension is a calculated one:

               

              aggr(

                   count( DISTINCT numeric_id )

                   ,

                   name

              )

               

              and the following dimensions are: name , numeric_id.

              As expresssions of your pivot table, take

              count(DISTINCT name)

              and

              count( DISTINCT numeric_id)

               

              to obtain the volume of the pairs.

                • Re: How to find out names that exist twice?
                  Friedrich Hofmann

                  Hi Richard,

                   

                  thanks for the answer! I don't want to do this in the GUI, but rather in the script - so I will try Abey's solution.

                  I'm still thinking it through - I have to "pump up" that masterdata table, so that instead of

                  one record per employee with a start_date and an end_date, I'll have

                  one record per employee per day inbetween

                  but I guess that LEFT JOIN would not add a lot of data to that table, so I can still do that as I have it now.

                   

                  I will try that and I'll report back here in any case.

                   

                  Thanks a lot!

                  Best regards,

                   

                  DataNibbler

                    • Re: How to find out names that exist twice?
                      Friedrich Hofmann

                      Hi,

                       

                      many thanks, Abey! That worked just as I hoped it would. I am not there yet though ;-)

                      Now I have a table that has only pairs of records - it's always "one name, two records"

                      along with two different ID_numbers, two type_flags and two dates. Now I have to do the following:

                      - I have to make sure those two type_flags in every pair are different - that should always be the case, but I have    already seen an exception.

                      - Of the two different dates (entry_dates) in every pair, I need only the later one (the first one is the date an employee joined the company, the second one is the date he was adopted). I need to link this to my calendar to be able to display the nr. of employees adopted in a certain month, so I can have only one date per name. I have thought of the max() fct, but that would return the latest date of all records, wouldn't it? I'll check.

                       

                      It might get even more complex going forward, but those should be the next steps.

                      As always, I will get to work straight away and see what I can do myself.

                       

                      Thanks a lot!

                      Best regards,

                       

                      DataNibbler

                        • Re: Re: How to find out names that exist twice?

                          You could either use the max function to pull through only the maximum dates grouped by employee name or you could use a calendar link table to link both dates to your master calendar. You can create a calendar link table as follows

                           

                          CalendarLink:

                          Load

                              Field 1, (Primary Key from master table)

                              Field 2, (date type 1 from the master table) as Field 3 (date field in your master calendar)

                              0 as DateType (this flag will allow you to refer to the above date type in a set analysis expression

                          Resident master table;

                           

                          Concatenate (CalendarLink)

                          Load

                              Field 1 (same as above)

                              Field 4 (date type 2 from master table) as Field 3 (link to master calendar)

                              1 as DateType (same as above)

                          Resident master table

                           

                          An example expression might be sum({$<DateType = {0}>}[# Number of Employees])

                           

                          If you've done it right, it should look something like the attached.

                           

                          Andy

                          • Re: How to find out names that exist twice?
                            Abey Chandy

                            Hi,

                             

                            I am assuming you have the following columns in that table:

                            Name, ID_Number, Type, Etry_Date

                             

                            So as you pointed max(Entry_Date) with a "group by name" query should work fine.

                             

                            Temp_Adopt_Load:

                            noconcatenate load name,

                            max(Entry_Date) as Adopt_Date

                            resident x

                            group by name;

                             

                            Adopt_Load:

                            noconcatenate load

                            name,

                            Adopt_Date

                            name&Adopt_Date as Name_Date_Key

                            resident Temp_Adopt_Load;

                             

                            drop table Temp_Adopt_Load;

                             

                            left join load name&Entry_Date as Name_Date_Key,

                            ID_Number,

                            Type

                            resident x;

                             

                            Regards,

                            Abey

                              • Re: How to find out names that exist twice?
                                Friedrich Hofmann

                                Hi,

                                 

                                Abey, I have now used your method (the one you first proposed, with a LEFT JOIN and a consequent RESIDENT LOAD) twice one after the other, with one or two more RESIDENT LOADs inbetween ;-)

                                ... and finally I have nearly what I want: A list with 22 records of only those persons who have two entries with two different type_flags, along with the date they changed type_flags (the later one).

                                <=> Now I again have two records per person even though the other data is all correct.I would like only one.

                                Well, I guess I will make that, too. Maybe I can use a FIRST parameter or so?

                                 

                                Thanks a lot!

                                Best regards,

                                 

                                DataNibbler

                                 

                                Easy - onye you know ;-)

                                FIRST x and FIRSTVALUE() would not do - because I need that stupid GROUP BY - but now I simply add a row_ID and do one more RESIDENT LOAD with only the even or the uneven row_IDs ;-)