Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
Hi there,
I have a set of data like this
Product No | Date Created | Record No | Date Altered | Reason |
---|---|---|---|---|
1 | 4/8/2014 | RN0001 | 4/6/2014 | Damaged |
1 | 4/8/2014 | RN0025 | 4/6/2014 | AutoRepaired |
1 | 4/1/2015 | RN9090 | 4/11/2014 | Damaged |
2 | 5/9/2015 | RN0002 | 3/9/2015 | Creation |
3 | 6/8/2015 | RN0003 | 6/7/2015 | Damaged |
3 | 6/8/2015 | RN0004 | 6/7/2015 | AutoRepaired |
3 | 10/10/2015 | RN0006 | 9/9/2015 | Damaged |
I would like to remove duplicated based on product no, and get the latest value. Which should look like this
(This is to be done in load script)
Product No | Date Created | Record No | Date Altered | Reason |
---|---|---|---|---|
1 | 4/1/2015 | RN9090 | 4/11/2014 | Damaged |
2 | 5/9/2015 | RN0002 | 3/9/2015 | Creation |
3 | 10/10/2015 | RN0006 | 9/9/2015 | Damaged |
Basically for each product no, i would like to obtain the data that has the LATEST DateCreated,RecordNo,DateAltered.
Thank you.
I guess forget about QlikView for a second, if you pull out 100 different Product No from your whole database and have them in an excel file, how would you pick one row for each Product No? I think we can help you code, but deciding which unique row needs to be pulled is something that you or the users will have to decide.
Lance Jake wrote:
basically in SAS, it would be like group by product id,
and if last.datecreated and last.recordno and last.datealtered.
I am not very familiar with SAS coding. I understand the group by statement here, but can you elaborate the second statement (highlighted in red above) with may be an example or two? May be that's what we need to get this to work.
Hi sunnyt,
i means it will do this
Product No | Date Created | Record No | Date Altered | Reason |
---|---|---|---|---|
1 | 4/8/2015 | RN0001 | 4/6/2014 | Damaged |
1 | 4/8/2015 | RN0095 | 4/11/2014 | AutoRepaired |
1 | 4/8/2015 | RN0035 | 4/11/2014 | Damaged |
3 | 6/8/2015 | RN0008 | 6/7/2015 | Damaged |
3 | 6/8/2015 | RN0004 | 6/7/2015 | AutoRepaired |
3 | 6/8/2015 | RN0006 | 9/9/2015 | Damaged |
Lets say the data above, last.datecreated and last.recordno and last.datealtered,
First it will find the last date created for product no, which is all 4/8/2015. Then it will find the last record no, which is RN 0095, then it will find the latest date for date altered.
Which will yield,
Product no : 1
Date Created: 4/8/2015
Record No : RN0095
Date Altered 4/11/2014
For each product no, i will sort by the criterias i want, taking the largest for each.
So i would sort by date created first for product 1, then record down.
then i would sort record no, and pick the largest,
Then i would sort by date altered, and pick the largest.
Then find the record that fulfills these 3 criteria.
With peek and order by you always get only one record by product (the first Qlik reads using the order by).
Also when there are duplicated rows.
hI MAXGRO,
Thank you for all your prompt reply.
Your method works, but i just need 1 more small adjustment, If say for example for data like this
1, 4/8/2015, RN9090, 4/11/2014, Damaged
1, 4/8/2015, RN0025, 4/6/2014, AutoRepaired
1, 4/8/2015, RN0001, 4/6/2015, Damaged
As you can see, the product no 1, datecreated is all the same, However, the largest record no is at row 1 while the largest datealtered is at row 3, how do i write it such that, if this condition occurs, DONT pick any of them, means just skip this product no altogether.
Well thank you maxgro and sunnyt. You guys have been a great help.
You are very welcome and we are glad we were able to help
HI SunnyT and MaxGro, a problem appeared.
RECORDNUMBER | DATECREATED | TRANSNUMBER | ALTEREDDATE |
ZP0000200 | 30/05/2015 | ZA0000093124 | 29/05/2015 |
ZP0000200 | 17/09/2015 | ZA0000044083 | 29/03/2015 |
Maxgro, i used your peek script and order by this:
order by RECORDNUMBER, DATECREATED desc, ALTEREDDATE desc, TRANSNUMBER desc;
The result is the above table.
Hence using the peek function, it will pick the first record which is the wrong one.
As you can see, the importance is by recordnumber->datecreated->altereddate->transnumber.
In this case, record number is the same, however, date created sorted it wrongly. the correct row to pick was the 2nd row although the transnumber and altered date of the first record is larger.
from my 1 million list of data i have about 10 of these error which i tracked down.
How do i fix this?
Thanks
If you have the order by statement as you have specified above:
Order By RECORDNUMBER, DATECREATED desc, ALTEREDDATE desc, TRANSNUMBER desc;
I don't see why the order should be the like the one you have just posted because Order By statement orders first by the first field and then by the second and then third and so on... in case DATECREATED takes precedence over ALTEREDDATE AND TRANSNUMBER.
UPDATE:
Just test loaded those two rows and I am seeing the right output.
SET DateFormat='DD/MM/YYYY';
Table:
LOAD * INLINE [
Product No, RECORDNUMBER, DATECREATED, TRANSNUMBER, ALTEREDDATE
1, ZP0000200, 30/05/2015, ZA0000093124, 29/05/2015
1, ZP0000200, 17/09/2015, ZA0000044083, 29/03/2015
];
NewTable:
NoConcatenate
LOAD *
Resident Table
Where [Product No]<>Peek('Product No')
Order By RECORDNUMBER, DATECREATED desc, ALTEREDDATE desc, TRANSNUMBER desc;
DROP Table Table;