Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
Hi,
I want to be able to find duplicated records within a table with around 100 fields.
I have a table with over 100 fields and i have a script that finds duplicated ID's but I want to know which field is responsible for the duplication.
ID | Mobile Number | Cost |
10045022 | 07842479330 | 32,100 |
10045022 | 07982879541 | 32,100 |
Here is an example the ID is the same so as the cost but the number is different.
Any help ill be great.
I may be overthinking it, but here is a generalized script to build a table of IDs and Fieldnames that have duplicate values within the ID.
Data:
LOAD * Inline [
ID, Fa, Fb, Fc, Fd
1, 34, hello, there, again
1, 34, is, a, duplicate
2, 34, not, a, duplicate
3, 99, 2, 3, 4
3, 00, 2, 3, 88
4, 11, 22, 33, 44
]
;
DupeIds:
Mapping
Load ID, Count(ID) as Count
Resident Data
Group By ID;
For idx = 1 to NoOfFields('Data')
Let vFname = FieldName($(idx), 'Data');
If '$(vFname)' <> 'ID' Then
TempDistinct:
// Make a Temp table instead of preceding because distinct
// is applied at end of LOAD
LOAD Distinct
ID, [$(vFname)]
Resident Data
Where ApplyMap('DupeIds', ID) > 1
;
DupeFields:
Load
ID, '$(vFname)' as DupeField
Where Count < ApplyMap('DupeIds', ID)
;
LOAD
ID,
'$(vFname)' as DupeField,
Count([$(vFname)]) as Count
Resident TempDistinct
Group by ID
;
Drop Table TempDistinct;
End if
Next idx
-Rob
http://www.easyqlik.com
http://masterssummit.com
http://qlikviewcookbook.com
I may be overthinking it, but here is a generalized script to build a table of IDs and Fieldnames that have duplicate values within the ID.
Data:
LOAD * Inline [
ID, Fa, Fb, Fc, Fd
1, 34, hello, there, again
1, 34, is, a, duplicate
2, 34, not, a, duplicate
3, 99, 2, 3, 4
3, 00, 2, 3, 88
4, 11, 22, 33, 44
]
;
DupeIds:
Mapping
Load ID, Count(ID) as Count
Resident Data
Group By ID;
For idx = 1 to NoOfFields('Data')
Let vFname = FieldName($(idx), 'Data');
If '$(vFname)' <> 'ID' Then
TempDistinct:
// Make a Temp table instead of preceding because distinct
// is applied at end of LOAD
LOAD Distinct
ID, [$(vFname)]
Resident Data
Where ApplyMap('DupeIds', ID) > 1
;
DupeFields:
Load
ID, '$(vFname)' as DupeField
Where Count < ApplyMap('DupeIds', ID)
;
LOAD
ID,
'$(vFname)' as DupeField,
Count([$(vFname)]) as Count
Resident TempDistinct
Group by ID
;
Drop Table TempDistinct;
End if
Next idx
-Rob
http://www.easyqlik.com
http://masterssummit.com
http://qlikviewcookbook.com
Hi Rwunderlich,
It worked with a bit of of tweak, but I already have a table with a duplicated IDs and this table tells me which table it is in, but I need to do your script for each table in the list.
here is an example of the table:
Error | ,PrimaryKeyID | ,Table |
PK Duplication | ,10045022 | ,Order |
PK Duplication | ,10045037 | ,Order |
PK Duplication | ,10046679 | ,Order |
PK Duplication | ,ML_10036414/10109581 | ,Order |
PK Duplication | ,ML_10037063/10111962 | ,Order |
PK Duplication | ,ML_10038116/10114074 | ,MasterListCSVprovisional |
PK Duplication | ,ML_10053793/10139362 | ,Driver |
PK Duplication | ,ML_10053793/10139437 | ,Driver |
PK Duplication | ,ML_10053793/10142502 | ,Driver |
PK Duplication | ,ML_10111446/10111962 | ,Driver |
PK Duplication | ,ML_10111446/10114074 | ,Driver |
PK Duplication | ,ML_10139640/10109581 | ,Driver |
PK Duplication | ,ML_10139640/10113870 | ,Driver |
PK Duplication | ,ML_10139640/10113875 | ,Driver |
So, for each table in the field table I want it to do your script.
Hope this make sense.
So you'll just load from the Resident variable table name instead of Resident Data.
You can loop over the TableNames from in your dupes table like this:
For i = 1 to FieldValueCount('TableName')
Let vTableName = FieldValue('TableName', $(i);
// The field processing loop here using
Resident [$(vTableName)];
Next i
-Rob