Re: How to remove duplicate records while loading ... - Qlik Community

pranaview · ‎2018-12-10

Hi Guys,

Problem : I need to load a table from an excel but need to remove duplicate records based on a column for same Order_Numbers. For ex:

Below image shows the sample records in the Orders sheet that has unique Order_Number

Below image is of the MappingSheet in which I want to remove the duplicate records for each Order_Number based on the column "IssueType". So, MappingSheet can have multiple records for one Order_Number but what iIwant is to load only those Order_Numbers which have different IssueType.

So, only highlighted records should be loaded into Qlik. Note that IssueType can be same for different orders as is the case for Order_Number 1 and 2. I have to check for the duplicate IssueType for each Order_Number and if there are more than one records with same IssueType for a single Order_Number then I have to pick only one with the oldest date.

Hope, I have made the requirement clear enough but if not please let me know. I need to come up with a solution real quick for this one so any suggestion would be of great help.

Thanks,

Pranav

FelipeFuku · ‎2018-12-10

Hey Pranaview,

Ok, so first I think you need to make a stronger %Key, like:

Order_Number &' | '& IssueType as %Key

After doing that, you can try using count... Like...

If( count(%Key)>1 , max (Date_Auto) , Date_Auto)

I think it should work.. maybe not the best solution, but it's the first thing that came into my mind ...

Note that i didn't test it, so let me know if it works..

Hope that I've helped you.

Regards, Fuku.

pranaview · ‎2018-12-10

Hi Fuku,
Thanks a lot for the suggestion.
I have a question. Where should i add the If condition? During the load? or in the charts? cos what i want is to just not load the duplicate records into Qlik.

madhudinesh · ‎2018-12-10

Try with the below

T:
LOAD [Order Number],
Team,
[Issue Type],
Comment,
Date
FROM
[z:\xxx.xlsx]
(ooxml, embedded labels, table is Sheet2);

T1:
Load *
Where Flag='Yes';

Load *,
if([Order Number]&[Issue Type]=Previous([Order Number]&[Issue Type]),'No','Yes') as Flag
Resident T order by [Order Number] asc,[Issue Type] asc;

DROP Table T;

FelipeFuku · ‎2018-12-10

Well, if you don't want to load the duplicate, you can add it to the script, but you'll need to "Group by" the other fields...

kjhertz · ‎2018-12-10

There are many ways to achieve this. You can construct a composit key and use firstsortedvalue on all the fields etc perhaps using an inner Join by RowNo would be faster. Be aware that you are using a transparent rule regarding which rows you keep and discard, in the example below we take the first value sorted by date.

source:
load
*,
RowNo() as RowNo
;
load * inline [
OrderNo, IssueType, DateId
1,1,1
1,1,2
1,2,2
2,1,3
2,5,4
3,6,5
4,7,6
4,7,8
5,9,8
5,10,9
5,10,10
];

inner join (source)

Load
FirstSortedValue(RowNo,DateId) as RowNo
Resident source group by OrderNo, IssueType;

FelipeFuku · ‎2018-12-10

You can do something like this:

Load *,

Max(Date_Auto) as Date_Auto,

count( DISTINCT Order_Number &' | '& IssueType) as %Key,

From [] Group by ;

You need to put all other fields in Group , except by Key fields and Date;

Hope it helps Pranav.

Fuku.

pranaview · ‎2018-12-10

Hi Fuku,
So, I tried doing what has suggested in your first post. I created a composite key of Order_Number and IssueType i.e. %Key while loading the data.
Then I tried to use firstsortedvalue() function to get unique records based on the composite key as below
Load %Key,
Date(Max(Date_Auto),'MM/DD/YYYY') as Date_Auto,
FirstSortedValue(Distinct Order_Number,-Date_Auto) as Order_Number,
FirstSortedValue(Distinct Team,-Date_Auto) as Team,
FirstSortedValue(Distinct IssueType,-Date_Auto) as IssueType,
FirstSortedValue(Distinct Comment,-Date_Auto) as Comment
Resident Delivery Group by %Key;

But in the Table view, i am just getting value in %Key column, rest are all null.

Pranav

pranaview · ‎2018-12-10

Hi Jonas,
Thanks for the response.
I tried the first approach i.e. creating a composite key of Order Number and IssueType and then using firstsortedvalue() by Date_Auto field and Group By on the composite key but in the result I am not getting any values except in the composite key column.

I tried the script that you have pasted here but it is not showing any data in the Table View.
I am really new to QV. Hence the struggle.

Thanks,
Pranav

kjhertz · ‎2018-12-10

Can you post the script that you have tried here?

How to remove duplicate records while loading a table based on a column

General Question

QlikView App Dev