Skip to main content
Announcements
See what Drew Clarke has to say about the Qlik Talend Cloud launch! READ THE BLOG
cancel
Showing results for 
Search instead for 
Did you mean: 
joey_lutes
Partner - Creator
Partner - Creator

Incremental Load Join Tables

I have some code that seems not to be behaving correctly and I'd love a 2nd opinion please.

This is an incremental load where during the load, associated dates are being set based on the load.  It works - kind of - and in most cases (this is one of various entity tables), but I'm finding discrepancies where the resulting load differs from the data being ingested.

The intended result is 1 table, with 1 row per unique entry, with accurate dates in the VerifiedDate, AddedDate and RemovedDate fields.

Thanks in advance!

//Load New Data

VM:

  LOAD

    vm.id,

    1 as vm.counter,

    '$(vLoad)' as vm.VerifiedDate,            //set VerifiedDate to the current load date (var set in load)

    date(null()) as vm.RemovedDate        //set RemovedDate to Null - initial load

FROM [lib://datafiles/vm\$(vFile)] ;

VM_Target:

    LOAD

        vm.id,

        vm.id as orig,

        vm.counter,

        vm.VerifiedDate as vm.AddedDate, //new row, create AddedDate and as current load

        vm.VerifiedDate,                         //Added and Verified Dates Match

        vm.RemovedDate                       //still Null()

From [lib://VCP qvd/vm.qvd](qvd)

  Where exists(vm.id, vm.id);

Join (VM_Target)

    Load

        vm.id,

        vm.counter,

        vm.AddedDate,

         vm.VerifiedDate,

        '$(vLoad)' as vm.RemovedDate    //if row is not present in new data and does not already contain a RemovedDate, update RemovedDate to current load

From [lib://VCP qvd/vm.qvd](qvd)

  Where not exists(orig, vm.id) and

    isnull(vm.RemovedDate)

Join (VM_Target)

    Load *                                              //bring data in from qvd as-is if row is not present in new data

  From [lib://VCP qvd/vm.qvd](qvd)

  Where not exists(orig.id, vm.id);

Join (VM_Target)

  Load

        vm.id,

        vm.counter,

        vm.VerifiedDate as vm.AddedDate, //if new row is not present in historic data, it's new. load and added dates match

        vm.VerifiedDate,

        vm.RemovedDate

    Resident VM

    Where not exists(orig, vm.id);

    Drop field orig;

DROP TABLE VM;

RENAME TABLE VM_Target TO VM;

Store [VM] INTO [lib://VCP qvd/vm.qvd] (qvd);

1 Solution

Accepted Solutions
joey_lutes
Partner - Creator
Partner - Creator
Author

I believe I have it figured out.

adding 'distinct' to each load helped, also, in the 3rd join statement, I changed

Join (VM_Target)

    Load *                                              //bring data in from qvd as-is if row is not present in new data

  From [lib://VCP qvd/vm.qvd](qvd)

  Where not exists(orig.id, vm.id);

to

Join (VM_Target)

    Load  distinct                                   //bring data in from qvd as-is if row is not present in new data

        vm.id,

        vm.id as orig,

        vm.counter,

        vm.AddedDate,

         vm.VerifiedDate,

        as vm.RemovedDate

  From [lib://VCP qvd/vm.qvd](qvd)

  Where not exists(orig.id, vm.id);

explicitly adding the 'orig' field so that it joined as well for the next join statement to reference.

Also, in the 3rd load, I added the 'orig' field as well.

View solution in original post

1 Reply
joey_lutes
Partner - Creator
Partner - Creator
Author

I believe I have it figured out.

adding 'distinct' to each load helped, also, in the 3rd join statement, I changed

Join (VM_Target)

    Load *                                              //bring data in from qvd as-is if row is not present in new data

  From [lib://VCP qvd/vm.qvd](qvd)

  Where not exists(orig.id, vm.id);

to

Join (VM_Target)

    Load  distinct                                   //bring data in from qvd as-is if row is not present in new data

        vm.id,

        vm.id as orig,

        vm.counter,

        vm.AddedDate,

         vm.VerifiedDate,

        as vm.RemovedDate

  From [lib://VCP qvd/vm.qvd](qvd)

  Where not exists(orig.id, vm.id);

explicitly adding the 'orig' field so that it joined as well for the next join statement to reference.

Also, in the 3rd load, I added the 'orig' field as well.