I have a QVD file with duplicate records in it as a result of a an incremental load appending to it each day. Its a supply chain app, with data coming from a third party daily. Each time they make changes in their system, the record is flagged to be included in the extract we get the next day.
The loading of the old data in the QVD and the appending on the new daily file is working ok, and if possible I don't want to change this process, what I would like to do, is create a new process to read the QVD with duplicate records and produce a QVD without duplicates.
Right now the following 5 fields define a unqiue record:
FIlename (which contains the date of that file load)
I have read the advice on this board where a NOT EXIST in PK idea is used, but I don't think this would work for me.
I thought, I could build a unique key out of the concatenation of these fields, then sort the QVD in descending order (making the last date loaded be at the top of the sort), and then taking the first unique record. If you think this would work, could you please help me with the syntax.
Well you could do it that way, but it would probably be easier to do a GROUP BY rather than an ORDER BY. You would then use either max or maxtring to take only unique values. Example:
LOAD [PO Order#], [Season], [Style], [Color], [Size], max(field1) as field1, //for numeric fields, maxstring(field2) as field2 //for text fields RESIDENT data GROUP BY [PO Order#],[Season],[Style],[Color],[Size];