Skip to main content
Announcements
Live today at 11 AM ET. Get your questions about Qlik Connect answered, or just listen in. SIGN UP NOW
cancel
Showing results for 
Search instead for 
Did you mean: 
Anonymous
Not applicable

To remove duplicates in two column

Hi Everyone,

I have a requirement which says to remove the duplicates from the column - "Bill to Customer Name and Customer Part Number" and then load the data. Attach is the excel sheet.

Thanks in advance!

Thanks and Regards,

Drishti Goel

10 Replies
sunny_talwar

Can you give an example of which rows are duplicate and why?

Anonymous
Not applicable
Author

So, if you filter with Quote Number 21179637 you can see the duplicate data as this quote has been quoted multiple times but it was won only one time. So, Now I have remove the duplicates from the column Bill to Customer Name and Customer Part number in order to remove duplicate entry.

Thanks and Regards,

Drishti Goel

Anonymous
Not applicable
Author

Any luck ?

Anonymous
Not applicable
Author

I tried writing the below code, but its not working for me

[Sheet1]:

LOAD DISTINCT *

FROM [lib://Documents/ODS.SIS.MKT.00019 - Open Quote Details (1).xlsx]

(ooxml, embedded labels, table is Sheet1) WHERE Exists([Customer Part Number],[Bill To Customer Name]);

Lisa_P
Employee
Employee

Hi Drishti,

From your earlier comment, I filtered on the Quote Number 21179637 and noticed that in the Marketing Status field there is an indication of QLRC which could appears to be the one.  By filtering on this data, you reduce the rows from 55,384 to 7,395.

1507730.PNG

This is the load script:

[Sheet1]:

LOAD DISTINCT *

FROM [lib://Documents/ODS.SIS.MKT.00019 - Open Quote Details (1).xlsx]

(ooxml, embedded labels, table is Sheet1)

WHERE [Marketing Status]='QLRC';

Not sure if this is your answer or not.

The problem with your previous solution is that the where exists is looking for data already loaded in the script.

Anonymous
Not applicable
Author

Hi Lisa,

Thanks for your reply.

I guess that every row has different values and if I'm putting distinct then its gonna fetch all the 6 rows. Ideally I cannot take a filter on Marketing Status because from the business point of view we can won the status even if it is in 'Draft'. So here in this case, ideally I should not use distinct. But my actual goal is to load the data and put the filter on ([Customer Part Number],[Bill To Customer Name]) so that I can remove the duplicated data. How can I achieve this?

Thanks and Regards,

Drishti Goel

Anonymous
Not applicable
Author

Any luck ?

Lisa_P
Employee
Employee

Hi Drishti,

I still do not understand your data well enough to work this out.  Looking at the fields every row has a [Bill to Customer Name], but many rows do not have [Customer Part Number], and I can't see any relationship with what was sold.  There is still repetition.

Regards,

Lisa

simenkg
Specialist
Specialist

Table:

load * from

[lib://Documents/ODS.SIS.MKT.00019 - Open Quote Details (1).xlsx]

(ooxml, embedded labels, table is Sheet1)

where (not exists([Bill to Customer Name] &'|'& [Customer Part Number]));


//(not exists()) Will be true for the first row read, and then false for the second row read for each combination of the two fields.