Skip to main content
Announcements
Global Transformation Awards! Applications are now open. Submit Entry
cancel
Showing results for 
Search instead for 
Did you mean: 
datanibbler
Champion
Champion

How to find out names that exist twice?


Hi,

please help me - this must be relatively simple to do - I would have known how to do it in the software I worked with in my last job, just not in QlikView.

- I have a masterdata table with all employees.

- Employees who started as contract workers, but were eventually adopted, have two records: They get a different number, only the name stays the same.

- I want to load (RESIDENT, for I have already loaded the masterdata table) only those records where one name has two records - all two of them per pair, if possible, together with one other field that I have to check also.

- I have tried using the PEEK fct. inside a WHERE clause (after sorting) to get only those where the name equals that in the record above, that worked, but it returned 0 records.

- I know I could load aggregated data using a COUNT or COUNT(DISTINCT), but that would not help me, would it?

Thanks a lot!

Best regards,

DataNibbler

9 Replies
abeyphilip
Creator II
Creator II

Hi,

Try

load name,

count(name) as name_Count

resident x

group by name;

Then you can left join this with x using name

And then you can do a resident load again with a field to flag off records with name_Count >= 2.

Regards,

Abey

rlp
Creator
Creator

Let's assume that your number field is named numeric_id.

In the preceding answer, there is a typo error: you should replace count(name) by:

count(numeric_id) as number_count

You can do it in the dashboard with a pivot table whose first dimension is a calculated one:

aggr(

     count( DISTINCT numeric_id )

     ,

     name

)

and the following dimensions are: name , numeric_id.

As expresssions of your pivot table, take

count(DISTINCT name)

and

count( DISTINCT numeric_id)

to obtain the volume of the pairs.

datanibbler
Champion
Champion
Author

Hi Richard,

thanks for the answer! I don't want to do this in the GUI, but rather in the script - so I will try Abey's solution.

I'm still thinking it through - I have to "pump up" that masterdata table, so that instead of

one record per employee with a start_date and an end_date, I'll have

one record per employee per day inbetween

but I guess that LEFT JOIN would not add a lot of data to that table, so I can still do that as I have it now.

I will try that and I'll report back here in any case.

Thanks a lot!

Best regards,

DataNibbler

datanibbler
Champion
Champion
Author

Hi,

many thanks, Abey! That worked just as I hoped it would. I am not there yet though 😉

Now I have a table that has only pairs of records - it's always "one name, two records"

along with two different ID_numbers, two type_flags and two dates. Now I have to do the following:

- I have to make sure those two type_flags in every pair are different - that should always be the case, but I have    already seen an exception.

- Of the two different dates (entry_dates) in every pair, I need only the later one (the first one is the date an employee joined the company, the second one is the date he was adopted). I need to link this to my calendar to be able to display the nr. of employees adopted in a certain month, so I can have only one date per name. I have thought of the max() fct, but that would return the latest date of all records, wouldn't it? I'll check.

It might get even more complex going forward, but those should be the next steps.

As always, I will get to work straight away and see what I can do myself.

Thanks a lot!

Best regards,

DataNibbler

Not applicable

You could either use the max function to pull through only the maximum dates grouped by employee name or you could use a calendar link table to link both dates to your master calendar. You can create a calendar link table as follows

CalendarLink:

Load

    Field 1, (Primary Key from master table)

    Field 2, (date type 1 from the master table) as Field 3 (date field in your master calendar)

    0 as DateType (this flag will allow you to refer to the above date type in a set analysis expression

Resident master table;

Concatenate (CalendarLink)

Load

    Field 1 (same as above)

    Field 4 (date type 2 from master table) as Field 3 (link to master calendar)

    1 as DateType (same as above)

Resident master table

An example expression might be sum({$<DateType = {0}>}[# Number of Employees])

If you've done it right, it should look something like the attached.

Andy

datanibbler
Champion
Champion
Author


Hi Andrew,

many thanks!

In the meantime, I have myself found out that the max() function works - I get the later date within each pair of records.

Also, i can use the COUNT(DISTINCT ) fct and another RESIDENT LOAD to get only those pairs of records where the type_flag actually changed.

I will now try to link this to my master_calendar before I get to the even more complex matters 😉

Thanks a lot!

Best regards,

DataNibbler

Not applicable

No worries.

It really depends on what you want to do. If you ever have data which has lots of different date types then the calendar link table is really useful. You can add as many date types as you like.

Andy

abeyphilip
Creator II
Creator II

Hi,

I am assuming you have the following columns in that table:

Name, ID_Number, Type, Etry_Date

So as you pointed max(Entry_Date) with a "group by name" query should work fine.

Temp_Adopt_Load:

noconcatenate load name,

max(Entry_Date) as Adopt_Date

resident x

group by name;

Adopt_Load:

noconcatenate load

name,

Adopt_Date

name&Adopt_Date as Name_Date_Key

resident Temp_Adopt_Load;

drop table Temp_Adopt_Load;

left join load name&Entry_Date as Name_Date_Key,

ID_Number,

Type

resident x;

Regards,

Abey

datanibbler
Champion
Champion
Author

Hi,

Abey, I have now used your method (the one you first proposed, with a LEFT JOIN and a consequent RESIDENT LOAD) twice one after the other, with one or two more RESIDENT LOADs inbetween 😉

... and finally I have nearly what I want: A list with 22 records of only those persons who have two entries with two different type_flags, along with the date they changed type_flags (the later one).

<=> Now I again have two records per person even though the other data is all correct.I would like only one.

Well, I guess I will make that, too. Maybe I can use a FIRST parameter or so?

Thanks a lot!

Best regards,

DataNibbler

Easy - onye you know 😉

FIRST x and FIRSTVALUE() would not do - because I need that stupid GROUP BY - but now I simply add a row_ID and do one more RESIDENT LOAD with only the even or the uneven row_IDs 😉