Solved: De-duplicating Simple Data Table in Load Script - Qlik Community

Report Inappropriate Content · ‎2016-09-21

Hello QV,

I have a very simple (but large) Staff table that gets updated more frequently than I would like it to be. Because of this, I am trying to put in into QV to help automate some reports.

The issue: for most of my reports, I only care about my Employee ID Number (EIN) and the Manager that is associated with them for this duplicate identificaiton. My list though includes the various Name variations we have on file for any given employee. I want to recreate this list, but remove any duplicates. Here is some basic sample.

EIN	Name	Login	Manager
1234	A Smith	AS34	Manager1
1234	Al Smith	AS34	Manager1
5678	J Coop	JC78	Manager2
9123	L Park	LP23	Manager3

I thought I could create a "mapping" script to de-dup, but apparently I am still not good enough at this as none of the versions I have tried actually remove the duplicates.

Here is what I thought would work but does not as I still have duplicates:

Staff_Mapping:
Mapping
LOAD Distinct
EIN,
Manager
FROM

(ooxml, embedded labels, table is UpdatedUserList);

Staff_List:
LOAD
Login,
UserName,
Title,
Manager,,
EIN,
EIN as [Nurse Decode #],
ApplyMap('Staff_Mapping', EIN) as UniqueFlag
FROM

(ooxml, embedded labels, table is UpdatedUserList);

Thank you in advance for your help in getting these duplicates removed!

sunny_talwar · ‎2016-09-21

My bad, use RecNo() instead of RowNo()

Staff_List:

LOAD *

Where Flag = 1;
LOAD Login,
    UserName,
    Title,
    Manager,,
    EIN,

AutoNumber(RecNo(), EIN) as Flag,
EIN as [Nurse Decode #]
FROM

(ooxml, embedded labels, table is UpdatedUserList);

Sample attached

View solution in original post

sunny_talwar · ‎2016-09-21

Out of the 2 rows of data that is duplicating one will be selected?

Report Inappropriate Content · ‎2016-09-21

Yes, I only need one of the rows (don't really care which one), I just need a list with one unique EIN and the respective columns that go with it (again, don't care which one). While this sample is simple, I have a list of hundreds of employees and there are a quite a few employees that have 5 or even 6 iterations of their names.

Digvijay_Singh · ‎2016-09-21

Try creating one composite field EIN&Manager as EINManager and use

Where not exists(EINManager) while loading.

You may need to do it in preceding/resident load in case it doesn't work while loading from database.

sunny_talwar · ‎2016-09-21

May be try this:

Staff_List:

LOAD *

Where Flag = 1;
LOAD Login,
    UserName,
    Title,
    Manager,,
    EIN,

AutoNumber(RowNo(), EIN) as Flag,
EIN as [Nurse Decode #]
FROM

(ooxml, embedded labels, table is UpdatedUserList);

Report Inappropriate Content · ‎2016-09-21

Thank you, but this does not work, I still get duplicates.

sunny_talwar · ‎2016-09-21

My bad, use RecNo() instead of RowNo()

Staff_List:

LOAD *

Where Flag = 1;
LOAD Login,
    UserName,
    Title,
    Manager,,
    EIN,

AutoNumber(RecNo(), EIN) as Flag,
EIN as [Nurse Decode #]
FROM

(ooxml, embedded labels, table is UpdatedUserList);

Sample attached

Report Inappropriate Content · ‎2016-09-21

Thank you, this did it. Now I will go research what you just did and learn something new.

sunny_talwar · ‎2016-09-21

AutoNumber(RecNo, EIN) as Flag assigned 1, 2, 3 to each EIN. So for example, your data looks like this

EIN, Flag

123, 1

123, 2

123, 3

345, 1

345, 2

156, 1

156, 2

156, 3

156, 4

I then picked the first instance of each EIN in the preceding load using the Where Flag = 1 statement.

HTH

Best,

Sunny