Skip to main content
Announcements
See what Drew Clarke has to say about the Qlik Talend Cloud launch! READ THE BLOG
cancel
Showing results for 
Search instead for 
Did you mean: 
pranaview
Creator III
Creator III

How to remove duplicate records while loading a table based on a column

Hi  Guys, 

Problem : I need to load a table from an excel but need to remove duplicate records based on a column for same Order_Numbers. For ex:

Below image shows the sample records in the Orders sheet that has unique Order_Number

Orders.PNG

Below image is of the MappingSheet in which I want to remove the duplicate records for each Order_Number based on the column "IssueType". So, MappingSheet can have multiple records for one Order_Number but what iIwant is to load only those Order_Numbers which have different IssueType.

OrderDuplication.PNG

 So, only highlighted records should be loaded into Qlik. Note that IssueType can be same for different orders as is the case for Order_Number 1 and 2. I have to check for the duplicate IssueType for each Order_Number and if there are more than one records with same IssueType for a single Order_Number then I have to pick only one with the oldest date.

Hope, I have made the requirement clear enough but if not please let me know. I need to come up with a solution real quick for this one so any suggestion would be of great help.

Thanks,

Pranav

 

 

 

Labels (4)
13 Replies
FelipeFuku
Partner - Contributor III
Partner - Contributor III

Hey Pranaview,

Ok, so first I think you need to make a stronger %Key, like:

Order_Number &' | '& IssueType as %Key

After doing that, you can try using count... Like...

If( count(%Key)>1 , max (Date_Auto) , Date_Auto)

I think it should work.. maybe not the best solution, but it's the first thing that came into my mind ...

Note that i didn't test it, so let me know if it works..

Hope that I've helped you.

Regards, Fuku.
pranaview
Creator III
Creator III
Author

Hi Fuku,
Thanks a lot for the suggestion.
I have a question. Where should i add the If condition? During the load? or in the charts? cos what i want is to just not load the duplicate records into Qlik.
madhudinesh
Partner - Contributor II
Partner - Contributor II

Try with the below

T:
LOAD [Order Number],
Team,
[Issue Type],
Comment,
Date
FROM
[z:\xxx.xlsx]
(ooxml, embedded labels, table is Sheet2);

T1:
Load *
Where Flag='Yes';

Load *,
if([Order Number]&[Issue Type]=Previous([Order Number]&[Issue Type]),'No','Yes') as Flag
Resident T order by [Order Number] asc,[Issue Type] asc;

DROP Table T;

FelipeFuku
Partner - Contributor III
Partner - Contributor III

Well, if you don't want to load the duplicate, you can add it to the script, but you'll need to "Group by" the other fields...
kjhertz
Partner - Creator
Partner - Creator

There are many ways to achieve this. You can construct a composit key and use firstsortedvalue on all the fields etc perhaps using an inner Join by RowNo would be faster. Be aware that you are using a transparent rule regarding which rows you keep and discard, in the example below we take the first value sorted by date. 

source:
load
*,
RowNo() as RowNo
;
load * inline [
OrderNo, IssueType, DateId
1,1,1
1,1,2
1,2,2
2,1,3
2,5,4
3,6,5
4,7,6
4,7,8
5,9,8
5,10,9
5,10,10
];

inner join (source)

Load
FirstSortedValue(RowNo,DateId) as RowNo
Resident source group by OrderNo, IssueType;

FelipeFuku
Partner - Contributor III
Partner - Contributor III

You can do something like this:

Load    *,  

             Max(Date_Auto)               as Date_Auto,

             count( DISTINCT  Order_Number &' | '& IssueType)                      as %Key,

 

From []  Group by ;

You need to put all other fields in Group , except by Key fields and Date;

      

Hope it helps Pranav.

Fuku.

pranaview
Creator III
Creator III
Author

Hi Fuku,
So, I tried doing what has suggested in your first post. I created a composite key of Order_Number and IssueType i.e. %Key while loading the data.
Then I tried to use firstsortedvalue() function to get unique records based on the composite key as below
Load %Key,
Date(Max(Date_Auto),'MM/DD/YYYY') as Date_Auto,
FirstSortedValue(Distinct Order_Number,-Date_Auto) as Order_Number,
FirstSortedValue(Distinct Team,-Date_Auto) as Team,
FirstSortedValue(Distinct IssueType,-Date_Auto) as IssueType,
FirstSortedValue(Distinct Comment,-Date_Auto) as Comment
Resident Delivery Group by %Key;

But in the Table view, i am just getting value in %Key column, rest are all null.

Pranav
pranaview
Creator III
Creator III
Author

Hi Jonas,
Thanks for the response.
I tried the first approach i.e. creating a composite key of Order Number and IssueType and then using firstsortedvalue() by Date_Auto field and Group By on the composite key but in the result I am not getting any values except in the composite key column.

I tried the script that you have pasted here but it is not showing any data in the Table View.
I am really new to QV. Hence the struggle.

Thanks,
Pranav
kjhertz
Partner - Creator
Partner - Creator

Can you post the script that you have tried here?