Skip to main content
Announcements
NEW: Seamless Public Data Sharing with Qlik's New Anonymous Access Capability: TELL ME MORE!
cancel
Showing results for 
Search instead for 
Did you mean: 
Not applicable

Removing duplicates, getting latest value via multiple criteria

Hi there,

I have a set of data like this

            

Product NoDate CreatedRecord NoDate AlteredReason
14/8/2014RN00014/6/2014Damaged
14/8/2014RN00254/6/2014AutoRepaired
14/1/2015RN90904/11/2014Damaged
25/9/2015RN00023/9/2015Creation
36/8/2015RN00036/7/2015Damaged
36/8/2015RN00046/7/2015AutoRepaired
310/10/2015RN00069/9/2015Damaged

I would like to remove duplicated based on product no, and get the latest value. Which should look like this

(This is to be done in load script)

Product NoDate CreatedRecord NoDate AlteredReason
14/1/2015RN90904/11/2014Damaged
25/9/2015RN00023/9/2015Creation
310/10/2015RN00069/9/2015Damaged

Basically for each product no, i would like to obtain the data that has the LATEST DateCreated,RecordNo,DateAltered.

Thank you.

32 Replies
sunny_talwar

I guess forget about QlikView for a second, if you pull out 100 different Product No from your whole database and have them in an excel file, how would you pick one row for each Product No? I think we can help you code, but deciding which unique row needs to be pulled is something that you or the users will have to decide.

sunny_talwar

Lance Jake wrote:

basically in SAS, it would be like group by product id,

and if last.datecreated and last.recordno and last.datealtered.

I am not very familiar with SAS coding. I understand the group by statement here, but can you elaborate the second statement (highlighted in red above) with may be an example or two? May be that's what we need to get this to work.

Not applicable
Author

Hi sunnyt,

i means it will do this

Product NoDate CreatedRecord NoDate AlteredReason
14/8/2015RN00014/6/2014Damaged
14/8/2015RN00954/11/2014AutoRepaired
14/8/2015RN00354/11/2014Damaged
36/8/2015RN00086/7/2015Damaged
36/8/2015RN00046/7/2015AutoRepaired
36/8/2015RN00069/9/2015Damaged

Lets say the data above, last.datecreated and last.recordno and last.datealtered,

First it will find the last date created for product no, which is all 4/8/2015. Then it will find the last record no, which is RN 0095, then it will find the latest date for date altered.

Which will yield,

Product no : 1

Date Created: 4/8/2015

Record No : RN0095

Date Altered 4/11/2014

Not applicable
Author

For each product no, i will sort by the criterias i want, taking the largest for each.

So i would sort by date created first for product 1, then record down.

then i would sort record no, and pick the largest,

Then i would sort by date altered, and pick the largest.

Then find the record that fulfills these 3 criteria.

maxgro
MVP
MVP

With peek and order by you always get only one record by product (the first Qlik reads using the order by).

Also when there are duplicated rows.

Not applicable
Author

hI MAXGRO,

Thank you for all your prompt reply.

Your method works, but i just need 1 more small adjustment, If say for example for data like this

    1, 4/8/2015, RN9090, 4/11/2014, Damaged

    1, 4/8/2015, RN0025, 4/6/2014, AutoRepaired

    1, 4/8/2015, RN0001, 4/6/2015, Damaged

As you can see, the product no 1, datecreated is all the same, However, the largest record no is at row 1 while the largest datealtered is at row 3, how do i write it such that, if this condition occurs, DONT pick any of them, means just skip this product no altogether.

Not applicable
Author

Well thank you maxgro and sunnyt. You guys have been a great help.

sunny_talwar

You are very welcome and we are glad we were able to help

Not applicable
Author

HI SunnyT and MaxGro, a problem appeared.

RECORDNUMBERDATECREATEDTRANSNUMBERALTEREDDATE
ZP000020030/05/2015ZA000009312429/05/2015
ZP000020017/09/2015ZA000004408329/03/2015

Maxgro, i used your peek script and order by this:

order by RECORDNUMBER, DATECREATED desc, ALTEREDDATE desc, TRANSNUMBER desc;

The result is the above table.

Hence using the peek function, it will pick the first record which is the wrong one.

As you can see, the importance is by recordnumber->datecreated->altereddate->transnumber.

In this case, record number is the same, however, date created sorted it wrongly. the correct row to pick was the 2nd row although the transnumber and altered date of the first record is larger.

from my 1 million list of data i have about 10 of these error which i tracked down.

How do i fix this?

Thanks

sunny_talwar

If you have the order by statement as you have specified above:

Order By RECORDNUMBER, DATECREATED desc, ALTEREDDATE desc, TRANSNUMBER desc;

I don't see why the order should be the like the one you have just posted because Order By statement orders first by the first field and then by the second and then third and so on... in case DATECREATED takes precedence over ALTEREDDATE AND TRANSNUMBER.

UPDATE:

Just test loaded those two rows and I am seeing the right output.

SET DateFormat='DD/MM/YYYY';

Table:

LOAD * INLINE [

Product No, RECORDNUMBER, DATECREATED, TRANSNUMBER, ALTEREDDATE

1, ZP0000200, 30/05/2015, ZA0000093124, 29/05/2015

1, ZP0000200, 17/09/2015, ZA0000044083, 29/03/2015

];

NewTable:

NoConcatenate

LOAD *

Resident Table

Where [Product No]<>Peek('Product No')

Order By RECORDNUMBER, DATECREATED desc, ALTEREDDATE desc, TRANSNUMBER desc;

DROP Table Table;

Capture.PNG