Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 
joey_lutes
Partner - Creator
Partner - Creator

Incremental Load Join Tables

I have some code that seems not to be behaving correctly and I'd love a 2nd opinion please.

This is an incremental load where during the load, associated dates are being set based on the load.  It works - kind of - and in most cases (this is one of various entity tables), but I'm finding discrepancies where the resulting load differs from the data being ingested.

The intended result is 1 table, with 1 row per unique entry, with accurate dates in the VerifiedDate, AddedDate and RemovedDate fields.

Thanks in advance!

//Load New Data

VM:

  LOAD

    vm.id,

    1 as vm.counter,

    '$(vLoad)' as vm.VerifiedDate,            //set VerifiedDate to the current load date (var set in load)

    date(null()) as vm.RemovedDate        //set RemovedDate to Null - initial load

FROM [lib://datafiles/vm\$(vFile)] ;

VM_Target:

    LOAD

        vm.id,

        vm.id as orig,

        vm.counter,

        vm.VerifiedDate as vm.AddedDate, //new row, create AddedDate and as current load

        vm.VerifiedDate,                         //Added and Verified Dates Match

        vm.RemovedDate                       //still Null()

From [lib://VCP qvd/vm.qvd](qvd)

  Where exists(vm.id, vm.id);

Join (VM_Target)

    Load

        vm.id,

        vm.counter,

        vm.AddedDate,

         vm.VerifiedDate,

        '$(vLoad)' as vm.RemovedDate    //if row is not present in new data and does not already contain a RemovedDate, update RemovedDate to current load

From [lib://VCP qvd/vm.qvd](qvd)

  Where not exists(orig, vm.id) and

    isnull(vm.RemovedDate)

Join (VM_Target)

    Load *                                              //bring data in from qvd as-is if row is not present in new data

  From [lib://VCP qvd/vm.qvd](qvd)

  Where not exists(orig.id, vm.id);

Join (VM_Target)

  Load

        vm.id,

        vm.counter,

        vm.VerifiedDate as vm.AddedDate, //if new row is not present in historic data, it's new. load and added dates match

        vm.VerifiedDate,

        vm.RemovedDate

    Resident VM

    Where not exists(orig, vm.id);

    Drop field orig;

DROP TABLE VM;

RENAME TABLE VM_Target TO VM;

Store [VM] INTO [lib://VCP qvd/vm.qvd] (qvd);

1 Solution

Accepted Solutions
joey_lutes
Partner - Creator
Partner - Creator
Author

I believe I have it figured out.

adding 'distinct' to each load helped, also, in the 3rd join statement, I changed

Join (VM_Target)

    Load *                                              //bring data in from qvd as-is if row is not present in new data

  From [lib://VCP qvd/vm.qvd](qvd)

  Where not exists(orig.id, vm.id);

to

Join (VM_Target)

    Load  distinct                                   //bring data in from qvd as-is if row is not present in new data

        vm.id,

        vm.id as orig,

        vm.counter,

        vm.AddedDate,

         vm.VerifiedDate,

        as vm.RemovedDate

  From [lib://VCP qvd/vm.qvd](qvd)

  Where not exists(orig.id, vm.id);

explicitly adding the 'orig' field so that it joined as well for the next join statement to reference.

Also, in the 3rd load, I added the 'orig' field as well.

View solution in original post

1 Reply
joey_lutes
Partner - Creator
Partner - Creator
Author

I believe I have it figured out.

adding 'distinct' to each load helped, also, in the 3rd join statement, I changed

Join (VM_Target)

    Load *                                              //bring data in from qvd as-is if row is not present in new data

  From [lib://VCP qvd/vm.qvd](qvd)

  Where not exists(orig.id, vm.id);

to

Join (VM_Target)

    Load  distinct                                   //bring data in from qvd as-is if row is not present in new data

        vm.id,

        vm.id as orig,

        vm.counter,

        vm.AddedDate,

         vm.VerifiedDate,

        as vm.RemovedDate

  From [lib://VCP qvd/vm.qvd](qvd)

  Where not exists(orig.id, vm.id);

explicitly adding the 'orig' field so that it joined as well for the next join statement to reference.

Also, in the 3rd load, I added the 'orig' field as well.