Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
Hi There,
Been battling with this issue for a while and hoping someone can help. I have the below script I'm using to pull data from a spreadsheet. Within this data, there's a "Cache_Reference_Number. What I'm trying to do is ignore rows where the Cache Reference number is duplicated.
MYDATA:
LOAD Cache_Id,
Cache_Crt_Date,
Cache_Broker_Code,
Cache_Reference_Number,
Cache_Incoming
From
C:\Users\Nathan\Documents\ExternalData.xlsx
(ooxml, embedded labels, table is Sheet1);
For example, the below are some Cache Reference Numbers from the file. I'd want to ignore all the "18312102" rows except for the last one.
18312100 |
18312102 |
18312151 |
18312102 |
18312102 |
18312103 |
Any help on this would be appreciated
If you say 'except for the last one' you are referring to the input order of MYDATA or maybe the Cache_Crt_Date, right?
Either way, you should be able to resident load your data with your records ordered descending, and use a where not exists() clause to only load the 'last' reference number. Something along these lines:
INPUT:
LOAD Cache_Reference_Number as Cache_Reference_Number_Full,
RecNo() as ID
INLINE [
Cache_Reference_Number
18312100
18312102
18312151
18312102
18312102
18312103
];
RESULT:
LOAD Cache_Reference_Number_Full as Cache_Reference_Number,
ID
Resident INPUT
where not exists( Cache_Reference_Number, Cache_Reference_Number_Full)
order by ID desc ;
drop table INPUT;
If you say 'except for the last one' you are referring to the input order of MYDATA or maybe the Cache_Crt_Date, right?
Either way, you should be able to resident load your data with your records ordered descending, and use a where not exists() clause to only load the 'last' reference number. Something along these lines:
INPUT:
LOAD Cache_Reference_Number as Cache_Reference_Number_Full,
RecNo() as ID
INLINE [
Cache_Reference_Number
18312100
18312102
18312151
18312102
18312102
18312103
];
RESULT:
LOAD Cache_Reference_Number_Full as Cache_Reference_Number,
ID
Resident INPUT
where not exists( Cache_Reference_Number, Cache_Reference_Number_Full)
order by ID desc ;
drop table INPUT;
Hey there, thanks so much for the response. I ended up doing the below, ordering the Cache_Reference_Number and then using Peek, but I think I prefer your implementation. Thanks for the help.
NoConcatenate
MYNEWDATA:
Load *
Resident MYDATA Order by Cache_Reference_Number;
Drop table MYDATA;
NoConcatenate
DuplicatesRemoved:
Load *
Resident MYNEWDATA
Where Peek(Cache_Reference_Number)<>Cache_Reference_Number;
drop table MYNEWDATA;