1 Reply Latest reply: Dec 14, 2016 5:56 PM by Joey Lutes RSS

    Incremental Load Join Tables

    Joey Lutes

      I have some code that seems not to be behaving correctly and I'd love a 2nd opinion please.

      This is an incremental load where during the load, associated dates are being set based on the load.  It works - kind of - and in most cases (this is one of various entity tables), but I'm finding discrepancies where the resulting load differs from the data being ingested.

       

      The intended result is 1 table, with 1 row per unique entry, with accurate dates in the VerifiedDate, AddedDate and RemovedDate fields.

       

      Thanks in advance!

       

      //Load New Data

      VM:

        LOAD

          vm.id,

          1 as vm.counter,

          '$(vLoad)' as vm.VerifiedDate,            //set VerifiedDate to the current load date (var set in load)

          date(null()) as vm.RemovedDate        //set RemovedDate to Null - initial load

      FROM [lib://datafiles/vm\$(vFile)] ;

       

      VM_Target:

          LOAD

              vm.id,

              vm.id as orig,

              vm.counter,

              vm.VerifiedDate as vm.AddedDate, //new row, create AddedDate and as current load

              vm.VerifiedDate,                         //Added and Verified Dates Match

              vm.RemovedDate                       //still Null()

      From [lib://VCP qvd/vm.qvd](qvd)

        Where exists(vm.id, vm.id);

       

      Join (VM_Target)

          Load

              vm.id,

              vm.counter,

              vm.AddedDate,

               vm.VerifiedDate,

              '$(vLoad)' as vm.RemovedDate    //if row is not present in new data and does not already contain a RemovedDate, update RemovedDate to current load

      From [lib://VCP qvd/vm.qvd](qvd)

        Where not exists(orig, vm.id) and

          isnull(vm.RemovedDate)

       

      Join (VM_Target)

          Load *                                              //bring data in from qvd as-is if row is not present in new data

        From [lib://VCP qvd/vm.qvd](qvd)

        Where not exists(orig.id, vm.id);

       

      Join (VM_Target)

        Load

              vm.id,

              vm.counter,

              vm.VerifiedDate as vm.AddedDate, //if new row is not present in historic data, it's new. load and added dates match

              vm.VerifiedDate,

              vm.RemovedDate

          Resident VM

          Where not exists(orig, vm.id);

          Drop field orig;

       

      DROP TABLE VM;

      RENAME TABLE VM_Target TO VM;

      Store [VM] INTO [lib://VCP qvd/vm.qvd] (qvd);

        • Re: Incremental Load Join Tables
          Joey Lutes

          I believe I have it figured out.

          adding 'distinct' to each load helped, also, in the 3rd join statement, I changed

          Join (VM_Target)

              Load *                                              //bring data in from qvd as-is if row is not present in new data

            From [lib://VCP qvd/vm.qvd](qvd)

            Where not exists(orig.id, vm.id);

           

          to

           

          Join (VM_Target)

              Load  distinct                                   //bring data in from qvd as-is if row is not present in new data

                  vm.id,

                  vm.id as orig,

                  vm.counter,

                  vm.AddedDate,

                   vm.VerifiedDate,

                  as vm.RemovedDate

            From [lib://VCP qvd/vm.qvd](qvd)

            Where not exists(orig.id, vm.id);

           

          explicitly adding the 'orig' field so that it joined as well for the next join statement to reference.

           

          Also, in the 3rd load, I added the 'orig' field as well.