Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
Hi Guys,
Problem : I need to load a table from an excel but need to remove duplicate records based on a column for same Order_Numbers. For ex:
Below image shows the sample records in the Orders sheet that has unique Order_Number
Below image is of the MappingSheet in which I want to remove the duplicate records for each Order_Number based on the column "IssueType". So, MappingSheet can have multiple records for one Order_Number but what iIwant is to load only those Order_Numbers which have different IssueType.
So, only highlighted records should be loaded into Qlik. Note that IssueType can be same for different orders as is the case for Order_Number 1 and 2. I have to check for the duplicate IssueType for each Order_Number and if there are more than one records with same IssueType for a single Order_Number then I have to pick only one with the oldest date.
Hope, I have made the requirement clear enough but if not please let me know. I need to come up with a solution real quick for this one so any suggestion would be of great help.
Thanks,
Pranav
Try with the below
T:
LOAD [Order Number],
Team,
[Issue Type],
Comment,
Date
FROM
[z:\xxx.xlsx]
(ooxml, embedded labels, table is Sheet2);
T1:
Load *
Where Flag='Yes';
Load *,
if([Order Number]&[Issue Type]=Previous([Order Number]&[Issue Type]),'No','Yes') as Flag
Resident T order by [Order Number] asc,[Issue Type] asc;
DROP Table T;
There are many ways to achieve this. You can construct a composit key and use firstsortedvalue on all the fields etc perhaps using an inner Join by RowNo would be faster. Be aware that you are using a transparent rule regarding which rows you keep and discard, in the example below we take the first value sorted by date.
source:
load
*,
RowNo() as RowNo
;
load * inline [
OrderNo, IssueType, DateId
1,1,1
1,1,2
1,2,2
2,1,3
2,5,4
3,6,5
4,7,6
4,7,8
5,9,8
5,10,9
5,10,10
];
inner join (source)
Load
FirstSortedValue(RowNo,DateId) as RowNo
Resident source group by OrderNo, IssueType;
You can do something like this:
Load *,
Max(Date_Auto) as Date_Auto,
count( DISTINCT Order_Number &' | '& IssueType) as %Key,
From [] Group by ;
You need to put all other fields in Group , except by Key fields and Date;
Hope it helps Pranav.
Fuku.
Can you post the script that you have tried here?