Skip to main content
Announcements
Have questions about Qlik Connect? Join us live on April 10th, at 11 AM ET: SIGN UP NOW
cancel
Showing results for 
Search instead for 
Did you mean: 
pranaview
Creator III
Creator III

How to remove duplicate records while loading a table based on a column

Hi  Guys, 

Problem : I need to load a table from an excel but need to remove duplicate records based on a column for same Order_Numbers. For ex:

Below image shows the sample records in the Orders sheet that has unique Order_Number

Orders.PNG

Below image is of the MappingSheet in which I want to remove the duplicate records for each Order_Number based on the column "IssueType". So, MappingSheet can have multiple records for one Order_Number but what iIwant is to load only those Order_Numbers which have different IssueType.

OrderDuplication.PNG

 So, only highlighted records should be loaded into Qlik. Note that IssueType can be same for different orders as is the case for Order_Number 1 and 2. I have to check for the duplicate IssueType for each Order_Number and if there are more than one records with same IssueType for a single Order_Number then I have to pick only one with the oldest date.

Hope, I have made the requirement clear enough but if not please let me know. I need to come up with a solution real quick for this one so any suggestion would be of great help.

Thanks,

Pranav

 

 

 

Labels (4)
13 Replies
FelipeFuku
Partner - Contributor III
Partner - Contributor III

Hmm, try the last suggestion that i've made...

Without a composite the first Key, only :
max(Date_Auto) as Date_Auto,
count( DISTINCT Order_Number &' | '& IssueType) as %Key

And group by all other fields.
I think it should work... You don't even need to use FirstSortedValue() in this case..

Regards, Fuku.
FelipeFuku
Partner - Contributor III
Partner - Contributor III

Hello again, Pranav.

Did you find a solution? Or did our suggestions worked?

Let us know, please!

Best Regards, Fuku.
pranaview
Creator III
Creator III
Author

Hi Fuku,

I was not able to make it work with your latest suggestion. Although I made it work with a slight tweak in the solution provided by @madhudinesh. But Yesterday, the client again changed(surprise surprise) their mind and now they want the duplicate values to be displayed as well so that they can catch the issue in the source and directly fix it there.But anyway thanks a lot to you and everyone else for pitching in with valuable suggestions. I will close this thread now.

Pranav
bgerchikov
Partner - Creator III
Partner - Creator III

Try this:

TempData:
Load * Inline
[
Order_Number,Team,IssueType,Comment,Date_Auto
1,Team1,Issue1,Oldest Logged,43400
1,Team2,Issue1,Comment2,43401
1,Team3,Issue2,Comment3,43401
2,Team4,Issue1,Comment4,43402
2,Team5,Issue1,Comment5,43403
2,Team6,Issue4,Comment6,43404
]
;
MaxDate:
NoConcatenate Load
    Order_Number,
    IssueType,
    Min(Date_Auto) as Date_Auto
Resident
    TempData
Group By
    Order_Number,
    IssueType
;
Left Join(MaxDate)
Load * Resident TempData;

Drop Table TempData;