Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
I have some code that seems not to be behaving correctly and I'd love a 2nd opinion please.
This is an incremental load where during the load, associated dates are being set based on the load. It works - kind of - and in most cases (this is one of various entity tables), but I'm finding discrepancies where the resulting load differs from the data being ingested.
The intended result is 1 table, with 1 row per unique entry, with accurate dates in the VerifiedDate, AddedDate and RemovedDate fields.
Thanks in advance!
//Load New Data
VM:
LOAD
vm.id,
1 as vm.counter,
'$(vLoad)' as vm.VerifiedDate, //set VerifiedDate to the current load date (var set in load)
date(null()) as vm.RemovedDate //set RemovedDate to Null - initial load
FROM [lib://datafiles/vm\$(vFile)] ;
VM_Target:
LOAD
vm.id,
vm.id as orig,
vm.counter,
vm.VerifiedDate as vm.AddedDate, //new row, create AddedDate and as current load
vm.VerifiedDate, //Added and Verified Dates Match
vm.RemovedDate //still Null()
From [lib://VCP qvd/vm.qvd](qvd)
Where exists(vm.id, vm.id);
Join (VM_Target)
Load
vm.id,
vm.counter,
vm.AddedDate,
vm.VerifiedDate,
'$(vLoad)' as vm.RemovedDate //if row is not present in new data and does not already contain a RemovedDate, update RemovedDate to current load
From [lib://VCP qvd/vm.qvd](qvd)
Where not exists(orig, vm.id) and
isnull(vm.RemovedDate)
Join (VM_Target)
Load * //bring data in from qvd as-is if row is not present in new data
From [lib://VCP qvd/vm.qvd](qvd)
Where not exists(orig.id, vm.id);
Join (VM_Target)
Load
vm.id,
vm.counter,
vm.VerifiedDate as vm.AddedDate, //if new row is not present in historic data, it's new. load and added dates match
vm.VerifiedDate,
vm.RemovedDate
Resident VM
Where not exists(orig, vm.id);
Drop field orig;
DROP TABLE VM;
RENAME TABLE VM_Target TO VM;
Store [VM] INTO [lib://VCP qvd/vm.qvd] (qvd);
I believe I have it figured out.
adding 'distinct' to each load helped, also, in the 3rd join statement, I changed
Join (VM_Target)
Load * //bring data in from qvd as-is if row is not present in new data
From [lib://VCP qvd/vm.qvd](qvd)
Where not exists(orig.id, vm.id);
to
Join (VM_Target)
Load distinct //bring data in from qvd as-is if row is not present in new data
vm.id,
vm.id as orig,
vm.counter,
vm.AddedDate,
vm.VerifiedDate,
as vm.RemovedDate
From [lib://VCP qvd/vm.qvd](qvd)
Where not exists(orig.id, vm.id);
explicitly adding the 'orig' field so that it joined as well for the next join statement to reference.
Also, in the 3rd load, I added the 'orig' field as well.
I believe I have it figured out.
adding 'distinct' to each load helped, also, in the 3rd join statement, I changed
Join (VM_Target)
Load * //bring data in from qvd as-is if row is not present in new data
From [lib://VCP qvd/vm.qvd](qvd)
Where not exists(orig.id, vm.id);
to
Join (VM_Target)
Load distinct //bring data in from qvd as-is if row is not present in new data
vm.id,
vm.id as orig,
vm.counter,
vm.AddedDate,
vm.VerifiedDate,
as vm.RemovedDate
From [lib://VCP qvd/vm.qvd](qvd)
Where not exists(orig.id, vm.id);
explicitly adding the 'orig' field so that it joined as well for the next join statement to reference.
Also, in the 3rd load, I added the 'orig' field as well.