7 Replies Latest reply: Mar 4, 2015 11:14 AM by Michael Solomovich RSS

    INCREMENTAL LOAD with UPDATE and NEW rows without a unique identifier

    Daniel Wardzynski

      Dear ladies and gentlemen,

       

      I am having an issue with my incremental load script, and I am not 100% sure how I can solve this issue in Qlikview,

      My first file nov2014 includes data from 2013-01-01 and my second file dec2014 includes partly updated data since 2013-01-01 including an additional month of dec2014 that the previous file does not have.

       

      Some content from nov2014 file:

      Year     Month     Customer     Cost Amount

      2013     01          AB               100

      2013     01          CD               50

      2014     02          AB               90

      2014     02          CD               80

       

      UPDATED and NEW content from dec2014 file:

      Year     Month     Customer     Cost Amount

      2013     01          ABC             100          // Here we have a new customer name

      2013     01          CD               45            // Here we have a different cost amount

      2014     02          AB               90            // Same as in nov2014

      2014     02          CD               80            // Same as in nov2014

      2014     12          EF               70             // NEW rows

      2014     12          GH               60            // NEW rows

       

      Data file analysis:

      There are NO unique row identifier or primarykey. This is basically all we have.

       

      My question:

      How can I create an INCREMENTAL LOAD in Qlikview that UPDATES rows as well as ADD new rows into a .QVD file.

      I am not sure if this can be done, even if I use a composite key using [Year &'-'& Month as %YearMonthKey] when concatenating the 2 files(tables).

      Adding new additional rows works great using the WHERE NOT EXISTS (%YearMonthKey), but what about updating rows?

      We only have Year and Month columns that are constant, but since the Customer name can change as well, I can not use it in the Composite key.

       

      Do any of you have any good ideas or some advice on how I could do this?

       

      Maybe I could use a composite key that includes the Customer name: [Year &'-'& Month &'-'& Customer as %Key] and update the [Cost Amount] field first, and then use a different composite key to update the name of the customers. I am not sure!

      Is it even possible to do what I want?

       

      Appreciate all your help and advice!

       

      Sincerely,
      Daniel Wardzynski

        • Re: INCREMENTAL LOAD with UPDATE and NEW rows without a unique identifier
          Jonathan Dienst

          Hi

           

          The point of an incremental load is to retrieve only new and/or changed records from the database to Qlikview. Normally this is done with a monotonically increasing key or date, or a date created or last updated field in the database. Then a simple where clause in your SQL query will fetch only the relevant data from the database.

           

          If you need to create a key and use WHERE EXISTS in Qlikview, then you are first fetching all the data from the database and then filtering it in Qlikview, which kind of negates the benefits of an incremental load.

           

          So you will need to pass something into the SQL SELECT that allows filtering there.

           

          HTH

          Jonathan

          • Re: INCREMENTAL LOAD with UPDATE and NEW rows without a unique identifier
            Daniel Wardzynski

            Thank you for your reply Jonathan.

            [Where Not Exists()] is a vital part of an Incremental load (Insert and Update) scenario.

            Take a look at this thread: Incremental Load in QlikView – Part2 | Learn QlikView

             

            It seems that if I have 4 rows in my nov2014 file that share the same YEAR and MONTH as 2 rows in my dec2014 file. For instance:

            nov2014 file contains:

            Year     Month     Customer     Cost Amount

            2013     9             AB               100

            2013     9             AB               200

            2013     9             AB               300

             

            dec2014 file contains:

            Year     Month     Customer     Cost Amount

            2013     9             AB              999


            Then the outcome is only:

            Year     Month     Customer     Cost Amount

            2013     9             AB              999


            This happens when I am using WHERE NOT EXISTs (%YearMonthKey);

            // Year &'-'& Month &'-'& as %YearMonthKey


            So it basically deletes and replaces ALL OLD rows with the same year and month, with NEW rows with the same year and month, even if I have fewer rows. At the same time it adds all new rows for the new month: december 2014, that did not exist in the previous file: nov2014.


            Am I on the right track folks? Is this correct?


            In that case, I can use this kind of logic, and later figure out a better way to update the [Customer] names in order to concatenate different customer names into 1 name.


            • Re: INCREMENTAL LOAD with UPDATE and NEW rows without a unique identifier
              Michael Solomovich

              Hi Daniel,

               

              The problem is apparently in the fact that the customer name can be changed.  If not for your comment, it is impossible to tell that AB and ABC is the same customer.  But you should be able to know it somehow to add this comment.  Is there something like "customer  id" that never changes?  If yes, you could've mapped to it and use in the key.

               

              Regards,

              Michael

                • Re: INCREMENTAL LOAD with UPDATE and NEW rows without a unique identifier
                  Daniel Wardzynski

                  Hi Michael,

                   

                  Unfortunately there are no unique identifiers in these files, so no unique customer number. What I am thinking is creating a change-map that maps all new and changed Customer names that are connected to a particular customer name, that way we can maintain consistency with the customer names.

                  What do you think about this?

                   

                  What about my WHERE NOT EXISTs incremental load logics above. Is my thinking regarding the actual row updates correct that I wrote above?

                    • Re: INCREMENTAL LOAD with UPDATE and NEW rows without a unique identifier
                      Michael Solomovich

                      The change map sounds good, in fact you're making customer name a reliable identifier,  So, the key combining Year, Month, and Customer is a way to go.

                      Yes, WHERE NOT EXISTS is good to use here.  Typically I create the key in the first table only, e.g.

                       

                      Year & Month &Customer as %Key

                      And when loading the second table, use this:

                      WHERE NOT EXISTS (%Key, Year & Month &Customer)

                       

                        • Re: INCREMENTAL LOAD with UPDATE and NEW rows without a unique identifier
                          Daniel Wardzynski

                          Micheal,

                           

                          Thank you for your reply and clarification. With the code below I am basically replacing all the OLD rows that have the Year &'-'& Month as %Key, with NEW rows, even if there is only 1 new row, it completely replaces all OLD rows with the same %Key (deletes all the old rows and replaces with 1 new), and this is exactly what I want. Then it also adds(concatenates) NEW rows that DOES NOT exist (do not have the same %Key) since before.


                          Do you agree with me Micheal?

                          With the Customer map we discussed I should do it before I run the Incremental load, am I correct?

                           

                          This is my code in 2 parts, first the initial load and then the incremental load:

                          -----------------------------------------------

                          // PART 1

                          // Initial Load

                          Facttable:

                          LOAD

                            Year &'-'& Month as %Key,

                            Year,

                             Month,

                             Customer,

                              [Cost amount]

                          FROM ..2014nov.xlsx

                          STORE Facttable into Extract_Facts.qvd;

                          DROP Table facttable;

                           

                          // PART 2

                          // Incremental Load

                          Facttable:

                          LOAD

                            Year &'-'& Month as %Key,

                            Year,

                             Month,

                             Customer,

                              [Cost amount]

                          FROM ..2014dec.xlsx

                           

                          Concatenate

                          Facttable:

                          LOAD

                            %Key,

                            Year,

                             Month,

                             Customer,

                              [Cost amount]

                          FROM ..Extract_Facts.qvd (qvd)

                          WHERE NOT EXISTs (%Key);

                          STORE Facttable into Extract_Facts.qvd;

                            • Re: INCREMENTAL LOAD with UPDATE and NEW rows without a unique identifier
                              Michael Solomovich

                              Close...  Assuming you have QVD with the previous data, and a file with the additional/changed data, here is what I see:
                              ______________________________________________
                              CustomerChangeMap:
                              MAPPING LOAD DISTINCT
                              OldName,
                              NewName
                              FROM ..CustomerChangeMap.xlsx;

                               

                              //------------------------

                              // loading new data, in your case Dec2014
                              Facttable:
                              LOAD
                                autonumberhash128(Year & Month & Customer) as %Key,   // autonumber to keep key small
                                Year,
                                Month,
                                Customer,
                                [Cost amount]
                              FROM ..NewData.xlsx;

                               

                              // Loading QVD, converting Customer name; load will be non-optimized because of functions
                              PrevData:
                              LOAD
                                autonumberhash128(Year & Month & applymap('CustomerChangeMap',Customer)) as %QVDKey,
                                Year,
                                Month,
                                applymap('CustomerChangeMap',Customer) as Customer     // changing Customer name
                                [Cost amount]
                              FROM ..Extract_Facts.qvd (qvd);

                               

                              // adding QVD data to facts except if the key exists in new data
                              CONCATENATE (Facttable) LOAD
                                %QVDKey as %Key,
                                Year,
                                Month,
                                Customer,
                                [Cost amount]
                              RESIDENT PrevData
                              WHERE not exists(%Key, %QVDKey);

                               

                              DROP TABLE PrevData;

                               

                              STORE Facttable into Extract_Facts.qvd;

                              ____________________________________________

                               

                              Hope I didn't miss anything essential...