Announcements
You can succeed best and quickest by helping others to succeed. Join the conversation.
cancel
Showing results for
Did you mean:
Creator II

## Load not distinct i.e duplicates only

Hello,

As part of a upstream data cleansing exercise, I am checking the records we are looking for duplicates.

Whilst I am aware of Load Distinct, I am not aware of its opposite

Does it exist?

In the meantime I am using the following script.

recno() as Recno,
*
FROM (qvd);

NoConcatenate
where DuplicateFlag=1;

*,
If(Recno=1, 0, if((Recno - previous(Recno))=1, 0,1))  as DuplicateFlag
order by Recno;

If a more elegant way exists then I am open to suggestions.

Many thanks,

Colin

1 Solution

Accepted Solutions
Partner - Champion III

Fields = ;

n = NoOfFields('TABLE1');

FOR i=1 TO n

Fields = Fields & '[' & FieldName(i,'TABLE1') & ']' & If( i=n , '', '& | &' );

NEXT

n=;i=;

.......

.......

AutoNumber( \$(Fields) ) AS T1#

.....

.....

.....

Fields=;

You might get better performance and less memory usage by using AutoNumberHash256 instead of AutoNumber:

AutoNumberHash256( Field1, Field2, Field3 .... )

Change the code above from using a '& | & ' to using a comma:

Fields = Fields & '[' & FieldName(i,'TABLE1') & ']' & If( i=n , '', ',' );

5 Replies
Partner - Creator II

try to create 2 tables then experiment with where exists and where not exists.

refer to this link: Where not Exists and Exits

best wishes.

Partner - Champion III

This is an example on how to do it with WHERE Exists:

```T1:
AutoNumber(X & '|' & Y) AS T1#
FROM [lib://DATA/duplicates.xlsx]
(ooxml, embedded labels, table is TABLE1);

T2:
X,
Y
FROM [lib://DATA/duplicates.xlsx]
(ooxml, embedded labels, table is TABLE2)
WHERE
Exists(T1#,AutoNumber(X & '|' & Y));

DROP TABLE T1;

```

Creator II
Author

Thank you both.

The only drawback is my dataset has a lot of fields in it.

Is there a quick way to automatically include all the field names inside the Autonumber?

he only method I can think of is to:

A For loop to create a new table with two columns - the original names and the names without spaces

Put all the names without spaces into a variable using concat

Put that variable into the autonumber when creating the new table for the where exists purposes

Applymap back the original names. The duplicates are in a final table

Is just seems like a lot of hard work.

Many thanks,

Colin

Partner - Champion III

Fields = ;

n = NoOfFields('TABLE1');

FOR i=1 TO n

Fields = Fields & '[' & FieldName(i,'TABLE1') & ']' & If( i=n , '', '& | &' );

NEXT

n=;i=;

.......

.......

AutoNumber( \$(Fields) ) AS T1#

.....

.....

.....

Fields=;

You might get better performance and less memory usage by using AutoNumberHash256 instead of AutoNumber:

AutoNumberHash256( Field1, Field2, Field3 .... )

Change the code above from using a '& | & ' to using a comma:

Fields = Fields & '[' & FieldName(i,'TABLE1') & ']' & If( i=n , '', ',' );

Creator II
Author

Petter,

Thank you very much for your expertise.

That has solved this conundrum nicely.

Tags
Community Browser