Solved: Duplicate records within multiple fields - Qlik Community

Rayk · ‎2024-04-16

Hi,

I want to be able to find duplicated records within a table with around 100 fields.

I have a table with over 100 fields and i have a script that finds duplicated ID's but I want to know which field is responsible for the duplication.

ID	Mobile Number	Cost
10045022	07842479330	32,100
10045022	07982879541	32,100

Here is an example the ID is the same so as the cost but the number is different.

Any help ill be great.

rwunderlich · ‎2024-04-16

I may be overthinking it, but here is a generalized script to build a table of IDs and Fieldnames that have duplicate values within the ID.

Data:
LOAD * Inline [
ID, Fa, Fb, Fc, Fd
1, 34, hello, there, again
1, 34, is, a, duplicate
2, 34, not, a, duplicate
3, 99, 2, 3, 4
3, 00, 2, 3, 88
4, 11, 22, 33, 44
]
;
DupeIds:
Mapping
Load ID, Count(ID) as Count
Resident Data
Group By ID;

For idx = 1 to NoOfFields('Data')
Let vFname = FieldName($(idx), 'Data');
If '$(vFname)' <> 'ID' Then

  TempDistinct:
  // Make a Temp table instead of preceding because distinct
  // is applied at end of LOAD
  LOAD Distinct
    ID, [$(vFname)]
  Resident Data
  Where ApplyMap('DupeIds', ID) > 1
  ;
  DupeFields:
  Load
    ID, '$(vFname)' as DupeField
  Where Count < ApplyMap('DupeIds', ID)
  ;
  LOAD 
    ID, 
    '$(vFname)' as DupeField,
    Count([$(vFname)]) as Count
  Resident TempDistinct
  Group by ID
  ;
  Drop Table TempDistinct;
End if
Next idx

-Rob
http://www.easyqlik.com
http://masterssummit.com
http://qlikviewcookbook.com

View solution in original post

rwunderlich · ‎2024-04-16

I may be overthinking it, but here is a generalized script to build a table of IDs and Fieldnames that have duplicate values within the ID.

Data:
LOAD * Inline [
ID, Fa, Fb, Fc, Fd
1, 34, hello, there, again
1, 34, is, a, duplicate
2, 34, not, a, duplicate
3, 99, 2, 3, 4
3, 00, 2, 3, 88
4, 11, 22, 33, 44
]
;
DupeIds:
Mapping
Load ID, Count(ID) as Count
Resident Data
Group By ID;

For idx = 1 to NoOfFields('Data')
Let vFname = FieldName($(idx), 'Data');
If '$(vFname)' <> 'ID' Then

  TempDistinct:
  // Make a Temp table instead of preceding because distinct
  // is applied at end of LOAD
  LOAD Distinct
    ID, [$(vFname)]
  Resident Data
  Where ApplyMap('DupeIds', ID) > 1
  ;
  DupeFields:
  Load
    ID, '$(vFname)' as DupeField
  Where Count < ApplyMap('DupeIds', ID)
  ;
  LOAD 
    ID, 
    '$(vFname)' as DupeField,
    Count([$(vFname)]) as Count
  Resident TempDistinct
  Group by ID
  ;
  Drop Table TempDistinct;
End if
Next idx

-Rob
http://www.easyqlik.com
http://masterssummit.com
http://qlikviewcookbook.com

Rayk · ‎2024-04-17

Hi Rwunderlich,

It worked with a bit of of tweak, but I already have a table with a duplicated IDs and this table tells me which table it is in, but I need to do your script for each table in the list.

here is an example of the table:

Error	,PrimaryKeyID	,Table
PK Duplication	,10045022	,Order
PK Duplication	,10045037	,Order
PK Duplication	,10046679	,Order
PK Duplication	,ML_10036414/10109581	,Order
PK Duplication	,ML_10037063/10111962	,Order
PK Duplication	,ML_10038116/10114074	,MasterListCSVprovisional
PK Duplication	,ML_10053793/10139362	,Driver
PK Duplication	,ML_10053793/10139437	,Driver
PK Duplication	,ML_10053793/10142502	,Driver
PK Duplication	,ML_10111446/10111962	,Driver
PK Duplication	,ML_10111446/10114074	,Driver
PK Duplication	,ML_10139640/10109581	,Driver
PK Duplication	,ML_10139640/10113870	,Driver
PK Duplication	,ML_10139640/10113875	,Driver

So, for each table in the field table I want it to do your script.

Hope this make sense.

rwunderlich · ‎2024-04-19

So you'll just load from the Resident variable table name instead of Resident Data.

You can loop over the TableNames from in your dupes table like this:

For i = 1 to FieldValueCount('TableName')
Let vTableName = FieldValue('TableName', $(i);

// The field processing loop here using 
Resident [$(vTableName)];

Next i

-Rob

Duplicate records within multiple fields

Application Development