Skip to main content
Announcements
Join us at Qlik Connect for 3 magical days of learning, networking,and inspiration! REGISTER TODAY and save!
cancel
Showing results for 
Search instead for 
Did you mean: 
Not applicable

De-duplicating Simple Data Table in Load Script

Hello QV,

I have a very simple (but large) Staff table that gets updated more frequently than I would like it to be.  Because of this, I am trying to put in into QV to help automate some reports.

The issue: for most of my reports, I only care about my Employee ID Number (EIN) and the Manager that is associated with them for this duplicate identificaiton.  My list though includes the various Name variations we have on file for any given employee.  I want to recreate this list, but remove any duplicates.  Here is some basic sample.

   

EINNameLoginManager
1234A SmithAS34Manager1
1234Al SmithAS34Manager1
5678J CoopJC78Manager2
9123L ParkLP23Manager3

I thought I could create a "mapping" script to de-dup, but apparently I am still not good enough at this as none of the versions I have tried actually remove the duplicates.

Here is what I thought would work but does not as I still have duplicates:

Staff_Mapping:
Mapping
LOAD Distinct
EIN
,
Manager
FROM

(
ooxml, embedded labels, table is UpdatedUserList);

Staff_List:
LOAD
Login,
UserName,
Title,
Manager,,
EIN,

EIN as [Nurse Decode #],
ApplyMap('Staff_Mapping', EIN) as UniqueFlag
FROM

(
ooxml, embedded labels, table is UpdatedUserList);

Thank you in advance for your help in getting these duplicates removed!

1 Solution

Accepted Solutions
sunny_talwar

My bad, use RecNo() instead of RowNo()

Staff_List:

LOAD *

Where Flag = 1;
LOAD Login,
    UserName,
    Title,
    Manager,,
    EIN,

    AutoNumber(RecNo(), EIN) as Flag,
    EIN as [Nurse Decode #]
FROM

(ooxml, embedded labels, table is UpdatedUserList);

Sample attached

View solution in original post

8 Replies
sunny_talwar

Out of the 2 rows of data that is duplicating one will be selected?

Not applicable
Author

Yes, I only need one of the rows (don't really care which one), I just need a list with one unique EIN and the respective columns that go with it (again, don't care which one).  While this sample is simple, I have a list of hundreds of employees and there are a quite a few employees that have 5 or even 6 iterations of their names.

Digvijay_Singh

Try creating one composite field EIN&Manager as EINManager and use

Where not exists(EINManager) while loading.


You may need to do it in preceding/resident load in case it doesn't work while loading from database.

sunny_talwar

May be try this:

Staff_List:

LOAD *

Where Flag = 1;
LOAD Login,
    UserName,
    Title,
    Manager,,
    EIN,

    AutoNumber(RowNo(), EIN) as Flag,
    EIN as [Nurse Decode #]
FROM

(ooxml, embedded labels, table is UpdatedUserList);

Not applicable
Author

Thank you, but this does not work, I still get duplicates.

sunny_talwar

My bad, use RecNo() instead of RowNo()

Staff_List:

LOAD *

Where Flag = 1;
LOAD Login,
    UserName,
    Title,
    Manager,,
    EIN,

    AutoNumber(RecNo(), EIN) as Flag,
    EIN as [Nurse Decode #]
FROM

(ooxml, embedded labels, table is UpdatedUserList);

Sample attached

Not applicable
Author

Thank you, this did it.  Now I will go research what you just did and learn something new.

sunny_talwar

AutoNumber(RecNo, EIN) as Flag assigned 1, 2, 3 to each EIN. So for example, your data looks like this

EIN, Flag

123, 1

123, 2

123, 3

345, 1

345, 2

156, 1

156, 2

156, 3

156, 4

I then picked the first instance of each EIN in the preceding load using the Where Flag = 1 statement.

HTH

Best,

Sunny