Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
Hi there,
I have a set of data like this
Product No | Date Created | Record No | Date Altered | Reason |
---|---|---|---|---|
1 | 4/8/2014 | RN0001 | 4/6/2014 | Damaged |
1 | 4/8/2014 | RN0025 | 4/6/2014 | AutoRepaired |
1 | 4/1/2015 | RN9090 | 4/11/2014 | Damaged |
2 | 5/9/2015 | RN0002 | 3/9/2015 | Creation |
3 | 6/8/2015 | RN0003 | 6/7/2015 | Damaged |
3 | 6/8/2015 | RN0004 | 6/7/2015 | AutoRepaired |
3 | 10/10/2015 | RN0006 | 9/9/2015 | Damaged |
I would like to remove duplicated based on product no, and get the latest value. Which should look like this
(This is to be done in load script)
Product No | Date Created | Record No | Date Altered | Reason |
---|---|---|---|---|
1 | 4/1/2015 | RN9090 | 4/11/2014 | Damaged |
2 | 5/9/2015 | RN0002 | 3/9/2015 | Creation |
3 | 10/10/2015 | RN0006 | 9/9/2015 | Damaged |
Basically for each product no, i would like to obtain the data that has the LATEST DateCreated,RecordNo,DateAltered.
Thank you.
another example, in this one I want the oldest date altered, first rec no, ; you can see I changed the fields order in order by and replace descending with ascending (asc is the default, no need to specify)
NewTable:
NoConcatenate load *
Resident Table
Where [Product No]<>Peek('Product No')
order by [Product No], [Date Altered], [Record No], [Date Created];
Try this:
Table:
LOAD [Product No],
FirstSortedValue([Date Created], -[Date Created]) as [Date Created],
FirstSortedValue([Record No], -[Date Created]) as [Record No],
FirstSortedValue([Date Altered], -[Date Created]) as [Date Altered],
FirstSortedValue(Reason, -[Date Created]) as Reason
Group By [Product No];
LOAD * INLINE [
Product No, Date Created, Record No, Date Altered, Reason
1, 4/8/2014, RN0001, 4/6/2014, Damaged
1, 4/8/2014, RN0025, 4/6/2014, AutoRepaired
1, 4/1/2015, RN9090, 4/11/2014, Damaged
2, 5/9/2015, RN0002, 3/9/2015, Creation
3, 6/8/2015, RN0003, 6/7/2015, Damaged
3, 6/8/2015, RN0004, 6/7/2015, AutoRepaired
3, 10/10/2015, RN0006, 9/9/2015, Damaged
];
Hi i used resident table, and your code, the result is that it removed all records that have duplicate, leaving only product code 2 behind when i tried searching for it.
This data set is only a small portion of my data, my real data has around 100k+ but the concept is exactly the same.
---------------------------
I tried your code, i know why it disappears, i was looking at a set of products, that are of all the same values for every field except for recordno.
Is it possible for it to remove all duplicates, and seeing if say we have this
This is the example set of data that disappeared.
Product No, Date Created, Record No, Date Altered, Reason
1, 4/8/2014, RN0001, 4/6/2014, Damaged
1, 4/8/2014, RN0025, 4/6/2014, AutoRepaired
1, 4/8/2014, RN9090, 4/6/2014, Damaged
Because the code locks it to datecreated, hence i can understand why it would disappear as all the datecreated are the same.
Is there a way for it to look at all 3 fields(datecreated, recordno and date altered)? For example
If all the 3 entries above for Date created are the same, it will look at record no, and pick the last record no.
If even the record no are all the same, then it will look at date altered and find the latest one.
in short is it possible for it to pick, the latest date created & last record no & latest date altered?
Try this code:
Table:
LOAD [Product No],
FirstSortedValue([Date Created], -([Date Created]+RecNo())) as [Date Created],
FirstSortedValue([Record No], -([Date Created]+RecNo())) as [Record No],
FirstSortedValue([Date Altered], -([Date Created]+RecNo())) as [Date Altered],
FirstSortedValue(Reason, -([Date Created]+RecNo())) as Reason
Group By [Product No];
LOAD * INLINE [
Product No, Date Created, Record No, Date Altered, Reason
1, 4/8/2014, RN0001, 4/6/2014, Damaged
1, 4/8/2014, RN0025, 4/6/2014, AutoRepaired
1, 4/8/2015, RN9090, 4/11/2014, Damaged
2, 5/9/2015, RN0002, 3/9/2015, Creation
3, 6/8/2015, RN0003, 6/7/2015, Damaged
3, 6/8/2015, RN0004, 6/7/2015, AutoRepaired
3, 10/10/2015, RN0006, 9/9/2015, Damaged
];
Hi do you mind explaining how does adding recno() help? Thanks so much.
Is there another way to solve this, i have another part coming up doing the same thing, And this time the field will be mixture of text and numbers like the field in RECORD NO. Will it be confused if a larger set of say 1 million rows (not sorted) involved?
Thank you.
You can try it,
ProdMaxDate:
LOAD * INLINE [
Product No, Date Created, Record No, Date Altered, Reason
1, 4/8/2014, RN0001, 4/6/2014, Damaged
1, 4/8/2014, RN0025, 4/6/2014, AutoRepaired
1, 4/1/2015, RN9090, 4/11/2014, Damaged
2, 5/9/2015, RN0002, 3/9/2015, Creation
3, 6/8/2015, RN0003, 6/7/2015, Damaged
3, 6/8/2015, RN0004, 6/7/2015, AutoRepaired
3, 10/10/2015, RN0006, 9/9/2015, Damaged
];
LatestDate:
inner join
Load [Product No],
Date(max([Date Created])) as [Date Created]
Resident ProdMaxDate
group by [Product No];
Hi as from previous top replies, this wont work because if
3, 6/8/2015, RN0003, 6/7/2015, Damaged
3, 6/8/2015, RN0004, 6/7/2015, AutoRepaired
3, 6/8/2015, RN0006, 9/9/2015, Damaged
This happens, then it wont get the correct one cause all the date created are the same.
Do you have one that takes max of datecreated, max of recordNo and max of date altered?
For example for cases like this,
3, 6/8/2015, RN0003, 6/7/2015, Damaged
3, 6/8/2015, RN0004, 6/7/2015, AutoRepaired
3, 6/8/2015, RN0006, 6/7/2015, Damaged
So is there a script that takes into account all possible scenarios?
Did you try this already?
Table:
LOAD [Product No],
FirstSortedValue([Date Created], -([Date Created]+RecNo())) as [Date Created],
FirstSortedValue([Record No], -([Date Created]+RecNo())) as [Record No],
FirstSortedValue([Date Altered], -([Date Created]+RecNo())) as [Date Altered],
FirstSortedValue(Reason, -([Date Created]+RecNo())) as Reason
Group By [Product No];
LOAD * INLINE [
Product No, Date Created, Record No, Date Altered, Reason
1, 4/8/2014, RN0001, 4/6/2014, Damaged
1, 4/8/2014, RN0025, 4/6/2014, AutoRepaired
1, 4/8/2015, RN9090, 4/11/2014, Damaged
2, 5/9/2015, RN0002, 3/9/2015, Creation
3, 6/8/2015, RN0003, 6/7/2015, Damaged
3, 6/8/2015, RN0004, 6/7/2015, AutoRepaired
3, 10/10/2015, RN0006, 9/9/2015, Damaged
];
RecNo() is the order of your rows. So if you see it in table form, you will see this:
Product No | Date Created | Record No | Date Altered | Reason | RecNo |
---|---|---|---|---|---|
1 | 4/8/2014 | RN0001 | 4/6/2014 | Damaged | 1 |
1 | 4/8/2014 | RN0025 | 4/6/2014 | AutoRepaired | 2 |
1 | 4/1/2015 | RN9090 | 4/11/2014 | Damaged | 3 |
2 | 5/9/2015 | RN0002 | 3/9/2015 | Creation | 4 |
3 | 6/8/2015 | RN0003 | 6/7/2015 | Damaged | 5 |
3 | 6/8/2015 | RN0004 | 6/7/2015 | AutoRepaired | 6 |
3 | 10/10/2015 | RN0006 | 9/9/2015 | Damaged | 7 |
So I am sorting by [Date Created] + RecNo(). So when they are all the same the one with the highest RecNo() row will be selected.
Hi SunnyT,
Yes i did try on a small set of my data and it works, Not sure if it is the same on larger scale,
I just would like to understand something. I understand what you explained.
Product No | Date Created | Record No | Date Altered | Reason | RecNo |
---|---|---|---|---|---|
1 | 4/8/2014 | RN0001 | 4/6/2014 | Damaged | 1 |
1 | 4/8/2014 | RN0025 | 4/6/2014 | AutoRepaired | 2 |
1 | 4/1/2015 | RN9090 | 4/11/2014 | Damaged | 3 |
2 | 5/9/2015 | RN0002 | 3/9/2015 | Creation | 4 |
3 | 6/8/2015 | RN0003 | 6/7/2015 | Damaged | 5 |
3 | 6/8/2015 | RN0004 | 6/7/2015 | AutoRepaired | 6 |
3 | 10/10/2015 | RN0006 | 9/9/2015 | Damaged | 7 |
The data above is already pretty much sorted.
What happens if the data is like this?
Product No | Date Created | Record No | Date Altered | Reason |
---|---|---|---|---|
1 | 4/8/2014 | RN0001 | 4/6/2014 | Damaged |
1 | 4/8/2014 | RN0095 | 4/6/2014 | AutoRepaired |
1 | 4/8/2015 | RN0035 | 4/11/2014 | Damaged |
3 | 6/8/2015 | RN0008 | 6/7/2015 | Damaged |
3 | 6/8/2015 | RN0004 | 6/7/2015 | AutoRepaired |
3 | 6/8/2015 | RN0006 | 9/9/2015 | Damaged |
This set of data is unsorted, as you can see the latest record no is the 2nd row for product no 1, and for product no 3, the latest record no is the 4th row.
How would your script handle this as we didnt mention how the data of record no and date altered will be sorted.
Does firstsortedvalue sort unspecified columns(since recordno and date altered are not mentioned)
Thank you so much SunnyT, Appreciate all your replies as it really improved my learning.
Oh another ques, Is it possible to do [Date Created]+[Record No]+[Date Altered] instead of [Date Created] + RecNo() ?
I tried it, it didnt work of course.