Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 
Not applicable

Removing duplicates, getting latest value via multiple criteria

Hi there,

I have a set of data like this

            

Product NoDate CreatedRecord NoDate AlteredReason
14/8/2014RN00014/6/2014Damaged
14/8/2014RN00254/6/2014AutoRepaired
14/1/2015RN90904/11/2014Damaged
25/9/2015RN00023/9/2015Creation
36/8/2015RN00036/7/2015Damaged
36/8/2015RN00046/7/2015AutoRepaired
310/10/2015RN00069/9/2015Damaged

I would like to remove duplicated based on product no, and get the latest value. Which should look like this

(This is to be done in load script)

Product NoDate CreatedRecord NoDate AlteredReason
14/1/2015RN90904/11/2014Damaged
25/9/2015RN00023/9/2015Creation
310/10/2015RN00069/9/2015Damaged

Basically for each product no, i would like to obtain the data that has the LATEST DateCreated,RecordNo,DateAltered.

Thank you.

32 Replies
sunny_talwar

Record No right now is a text field. You will need to convert into a number field, May be like this:

KeepChar([Record No], '0123456789') * 100/100

May be this code:

Table:

LOAD [Product No],

  FirstSortedValue([Date Created], -([Date Created]+Check)) as [Date Created],

  FirstSortedValue([Record No], -([Date Created]+Check)) as [Record No],

  FirstSortedValue([Date Altered], -([Date Created]+Check)) as [Date Altered],

  FirstSortedValue(Reason, -([Date Created]+Check)) as Reason

Group By [Product No];

LOAD *,

  KeepChar([Record No], '0123456789') * 100/100 as Check;

LOAD * INLINE [

    Product No, Date Created, Record No, Date Altered, Reason

    1, 4/8/2015, RN0001, 4/6/2014, Damaged //Was given as 4/8/2014

    1, 4/8/2015, RN0095, 4/6/2014, AutoRepaired //Was given as 4/8/2014

    1, 4/8/2015, RN0035, 4/11/2014, Damaged

    3, 6/8/2015, RN0008, 6/7/2015, Damaged

    3, 6/8/2015, RN0004, 6/7/2015, AutoRepaired

    3, 6/8/2015, RN0006, 9/9/2015, Damaged

];

Change the data above (in red) as it seems to me that they will be on the same dates. Let me know if that isn't true

Not applicable
Author

Hi SunnyT,

Thank you for the prompt reply.

As for whether they will be on the same dates, the thing is, this example is just like 6 records out of 1 million records that i have, and there are some records that have the same date created, but diff record no, some have same date created and diff date altered but of course diff record no. So i am trying to cover the possibilities here as i can't go through all 1 million records 1 by 1.

Some records no are like CNB1029-102, some are RN10203, so im looking for one formula that works for all, I thought of converting all to numbers and find the largest one, but i always fail to do so using Num().

Using keepchar might face some issue as mentioned records no format are not the same for each product, But however, for each product they will be in the same format.

For example if product 1 its record No is RNxxxx, it will always be like that.

For product say n, if its CNB1029-202, it will always be in that format, just the number changing.

Just an idea, remember the previous solution you posted? the one using RecNo()?
What if i sort my table first using loadscript, by datecreated, recordno, then date altered. That way, will it ever go wrong?

Thank you so much for your assistance, been trying to solve this problem for quite a while now.

sunny_talwar

If the data can be sorted, I would encourage you to do that and then use RecNo() function.

maxgro
MVP
MVP

Maybe with a where peek; you can change the order by as you want (ascending/descending on 2, 3, n fields)

Or I'm missing something?

Table:

LOAD * INLINE [

    Product No, Date Created, Record No, Date Altered, Reason

  1, 4/8/2014, RN0001, 4/6/2014, Damaged

    1, 4/8/2014, RN0025, 4/6/2014, AutoRepaired

    1, 4/8/2015, RN9090, 4/11/2014, Damaged

    2, 5/9/2015, RN0002, 3/9/2015, Creation

    3, 6/8/2015, RN0003, 6/7/2015, Damaged

    3, 6/8/2015, RN0004, 6/7/2015, AutoRepaired

    3, 10/10/2015, RN0006, 9/9/2015, Damaged

];

NewTable:

NoConcatenate load *

Resident Table

Where [Product No]<>Peek('Product No')

order by [Product No], [Date Created] desc, [Record No] desc, [Date Altered] desc;

DROP Table Table;

Not applicable
Author

Hi SunnyT,

It is a tricky situation considering different variables affecting this.

As i rmb, RecNo() takes data from the sorted loaded table or the original table?
if from the original table then there is no point sorting my data in a loadscript section beforehand?

basically in SAS, it would be like group by product id,

and if last.datecreated and last.recordno and last.datealtered.

How do i do the same here. hmmmmmmmmmmmmmm

Not applicable
Author

Hi maxgro, do you mind explaining this further?

especially the Where [Product No]<>Peek('Product No')


thank you.

maxgro
MVP
MVP

Peek('Product No') is the last record read (I copied the QlikView help, below).

With

Where [Product No]<>Peek('Product No')

you get only the records with a different product no, one record for product no 1, one for 2, one for 3.

The order by is needed to choose the first record by product. Change the order by and you have a different first record. The product no must always be the first field in the order by.

The resident is needed because in Qlik you can only sort in a resident load.

The noconcatenate is needed because the fields are the same and I don't want an automatic concatenate.

You can play removing the where peek and adding the peek(....) as a new field.

peek(fieldname [ , row [ , tablename ] ] )

Returns the contents of the fieldname in the record specified by row in the internal table tablename. Data are fetched from the associative QlikView database.

Fieldname must be given as a string (e.g. a quoted literal).

Row must be an integer. 0 denotes the first record, 1 the second and so on. Negative numbers indicate order from the end of the table. -1 denotes the last record read.

If no row is stated, -1 is assumed.

Tablename is a table label, see Table Labels, without the ending colon. If no tablename is stated, the current table is assumed. If used outside the load statement or referring to another table, the tablename must be included.

Examples:

peek( 'Sales' )
returns the value of Sales in the previous record read ( equivalent to previous(Sales) ).

peek( 'Sales', 2 )
returns the value of Sales from the third record read from the current internal table.

peek( 'Sales', -2 )
returns the value of Sales from the second last record read into the current internal table.

peek( 'Sales', 0, 'Tab1' )
returns the value of Sales from the first record read into the input table labeled Tab1.

Load A, B, numsum( B, peek( 'Bsum' ) ) as Bsum...;
creates an accumulation of B in Bsum.

Not applicable
Author

Hi maxgro, thanks for your explaination.

If thats the case, its true i will only get 1 record per product no, but how do i know which one to choose if say i have 3 product no 1, then diff date created, diff record no and diff date altered similar to the table of data above, Do note i have around 1 million records.

Sorry im relatively new to qlikview, do you mind providing a short example?
There are problems i highlighted to Sunnyindia in the above post as well.

So the idea here is that, you sort desc, and pick the first record for each product no right?

How do i get it to read the first record? Since i don't know how many records there are per productno, there can be 1, 2 ,...n

Thank you

maxgro
MVP
MVP

You choose the record with the order by

With this table

Table:

LOAD * INLINE [

    Product No, Date Created, Record No, Date Altered, Reason

    1, 4/8/2014, RN0001, 4/6/2014, Damaged

    1, 4/8/2014, RN0025, 4/6/2014, AutoRepaired

    1, 4/1/2015, RN9090, 4/11/2014, Damaged

    2, 5/9/2015, RN0002, 3/9/2015, Creation

    3, 6/8/2015, RN0003, 6/7/2015, Damaged

    3, 6/8/2015, RN0004, 6/7/2015, AutoRepaired

    3, 10/10/2015, RN0006, 9/9/2015, Damaged

];

if you want the last (bigger) product by DateCreated,RecordNo,DateAltered.

add an order by product, date created desc, record no desc, date altered desc

NewTable:

NoConcatenate load *

Resident Table

Where [Product No]<>Peek('Product No')

order by [Product No], [Date Created] desc, [Record No] desc, [Date Altered] desc;

DROP Table Table;

Qlik will read the record in this order

The records with product no <> previous product no (the where clause) are the records in bold

    1, 4/1/2015, RN9090, 4/11/2014, Damaged

    1, 4/8/2014, RN0025, 4/6/2014, AutoRepaired

    1, 4/8/2014, RN0001, 4/6/2014, Damaged

  2, 5/9/2015, RN0002, 3/9/2015, Creation

    3, 10/10/2015, RN0006, 9/9/2015, Damaged

    3, 6/8/2015, RN0004, 6/7/2015, AutoRepaired

    3, 6/8/2015, RN0003, 6/7/2015, Damaged


1.png

Do you agree?


If you have different data and / or questions, post them.

maxgro
MVP
MVP

another example, in this one I want the oldest date altered, first rec no, ; you can see I changed the fields order in order by and replace descending with ascending (asc is the default, no need to specify)

NewTable:

NoConcatenate load *

Resident Table

Where [Product No]<>Peek('Product No')

order by [Product No], [Date Altered], [Record No], [Date Created];

1.png