Skip to main content
Announcements
See what Drew Clarke has to say about the Qlik Talend Cloud launch! READ THE BLOG
cancel
Showing results for 
Search instead for 
Did you mean: 
colinodonnel
Creator II
Creator II

Load not distinct i.e duplicates only

Hello,

As part of a upstream data cleansing exercise, I am checking the records we are looking for duplicates.

Whilst I am aware of Load Distinct, I am not aware of its opposite

Does it exist?

In the meantime I am using the following script.

Load1:
LOAD
recno() as Recno,
*
FROM (qvd);

NoConcatenate
Load2:
Load *
where DuplicateFlag=1;

Load  Distinct
*,
If(Recno=1, 0, if((Recno - previous(Recno))=1, 0,1))  as DuplicateFlag
Resident Load1
order by Recno;

Drop table Load1;

If a more elegant way exists then I am open to suggestions.

Many thanks,

Colin

1 Solution

Accepted Solutions
petter
Partner - Champion III
Partner - Champion III

Fields = ;

n = NoOfFields('TABLE1');

FOR i=1 TO n

   Fields = Fields & '[' & FieldName(i,'TABLE1') & ']' & If( i=n , '', '& | &' );

NEXT

n=;i=;

.......

.......

        AutoNumber( $(Fields) ) AS T1#


.....

.....

.....


Fields=;

  

You might get better performance and less memory usage by using AutoNumberHash256 instead of AutoNumber:

       AutoNumberHash256( Field1, Field2, Field3 .... )

Change the code above from using a '& | & ' to using a comma:

   Fields = Fields & '[' & FieldName(i,'TABLE1') & ']' & If( i=n , '', ',' );

View solution in original post

5 Replies
wade12
Partner - Creator II
Partner - Creator II

try to create 2 tables then experiment with where exists and where not exists.

refer to this link: Where not Exists and Exits

best wishes.

petter
Partner - Champion III
Partner - Champion III

This is an example on how to do it with WHERE Exists:

T1:

LOAD

    AutoNumber(X & '|' & Y) AS T1#

FROM [lib://DATA/duplicates.xlsx]

(ooxml, embedded labels, table is TABLE1);

T2:

LOAD

    X,

    Y

FROM [lib://DATA/duplicates.xlsx]

(ooxml, embedded labels, table is TABLE2)

WHERE

  Exists(T1#,AutoNumber(X & '|' & Y));

 

DROP TABLE T1;



colinodonnel
Creator II
Creator II
Author

Thank you both.

The only drawback is my dataset has a lot of fields in it.

Is there a quick way to automatically include all the field names inside the Autonumber?

he only method I can think of is to:

A For loop to create a new table with two columns - the original names and the names without spaces

Put all the names without spaces into a variable using concat

Put that variable into the autonumber when creating the new table for the where exists purposes

Applymap back the original names. The duplicates are in a final table

Is just seems like a lot of hard work.

Many thanks,

Colin

petter
Partner - Champion III
Partner - Champion III

Fields = ;

n = NoOfFields('TABLE1');

FOR i=1 TO n

   Fields = Fields & '[' & FieldName(i,'TABLE1') & ']' & If( i=n , '', '& | &' );

NEXT

n=;i=;

.......

.......

        AutoNumber( $(Fields) ) AS T1#


.....

.....

.....


Fields=;

  

You might get better performance and less memory usage by using AutoNumberHash256 instead of AutoNumber:

       AutoNumberHash256( Field1, Field2, Field3 .... )

Change the code above from using a '& | & ' to using a comma:

   Fields = Fields & '[' & FieldName(i,'TABLE1') & ']' & If( i=n , '', ',' );

colinodonnel
Creator II
Creator II
Author

Petter,

Thank you very much for your expertise.

That has solved this conundrum nicely.