Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
I am dealing with a table with running records where one person should only appear once on a day. There are records where one person appear twice on a day. I need to remove these "duplicates" from the table before I can do any analysis. The other fields of the records might not be the same. I plan to keep only the records that have the latest Transaction ID. Eg. if a person has two records on the same day, one Transaction ID is 9011 the other is 9064, I want to keep 9064.
How can I do it? Can I do it in the LOAD step or in expressions?
Thanks in advance!
Try this:
//First load the table like a normal load script
Stage1:
LOAD * FROM [lib://Folder/File.qvd];
//Then do a order by something like this
Noconcatenate
Stage2:
LOAD * Resident Stage1
order by PersonID, Date, TransID Desc; //Descending because you want TransID with higher value
Drop Table Stage1;
//Here you flag your undesired records
Noconcatenate
Stage3:
LOAD
*,
if(peek(PersonID)=PersonID and peek(Date)=Date,1,0) as Flag
Resident Stage2;
Drop Table Stage2:
//Now remove those undesired records
Noconcatenate
Stage4:
LAOD * Resident Stage3
where Flag<>1;
Drop Table Stage3;
Drop Field Flag;
Bonjour Jade,
Si une de ces réponse est correcte, veuillez mettre cette question comme résolue pour aider les autres membres de la communauté qui auraient le même problème que vous.
Merci