8 Replies Latest reply: Dec 12, 2011 10:26 AM by Jonathan Shaltz RSS

    How Can I Delete Data *After* LOADing?

    Jonathan Shaltz

      I have a fairly complicated script which loads a set of fact tables, then loads multiple copies of a dates table, renaming fields such that the key field in each Dates table maps to a field in the fact tables.  E.g., if my fact table has and order date and a ship date, I get two copies of Dates, Order Dates and Ship Dates, and the date key field is renamed to %OrderDateKey and %ShipDateKey, respectively (mad props to Witherspoon for most of the code).  This works, however I get every date in each copy of the dates table.  I'd like to delete records from each of the Dates tables if there are no matches to the fact table.

       

      If I was simply joining table A to table B, I'd use the INNER keyword, easy as that.  Unfortunately this is not an option due to the complexity of the process for loading and relabeling the dates data.  Ideally I'd run something like "DELETE FROM [Order date] WHERE %OrderKey IS NULL", however based on what I've seen, this does not seem to be an option.  An equivalent action in the GUI would be to select %OrderKey (or some other field which has a value for every record in the fact tables) and then use Reduce Data | Keep Possible Values.

       

      If that's unclear, here's a simplified outline:

       

      Fact Table

      %OrderKey%OrderDateKey
      1232011/09/01
      2342011/11/01
      3452011/12/01

       

      Order Dates Table

      %OrderDateKeyYear and quarter
      2011/09/012011.Q3
      2011/10/012011.Q4
      2011/11/012011.Q4
      2011/12/012011.Q4
      2012/01/012012.Q1

       

      A List Based on Order Date Looks Like This

      Year and quarter
      2011.Q3
      2011.Q4
      2012.Q1 - a future date; very confusing for order date

       

      I'd like to discard the last record from the Order Dates table, as there are no orders with an order date in 2012.  I don't really have a preference as to whether or not the date 2011/10/01, which also has no orders but is not a future date, stays or goes.

       

      I've read the doc pages on KEEP and EXISTS(), but they seem to be limited to modifying the LOAD command, and do not let you discard data after loading.  Am I missing a command which would achive this?  Something like "DELETE FROM [Order Dates] WHERE Date > Today()" would work, but only if I could apply it after the load, to specific tables, as some date tables should include future dates (e.g., expiration date).

       

      Thank you.

        • How Can I Delete Data *After* LOADing?
          Michael Solomovich

          Try using INNER JOIN

          • How Can I Delete Data *After* LOADing?
            jagan mohan rao appala

            Hi,

             

            Try filtering using where condition like below

             

            Test:

            LOAD

                *

               

            WHERE OrderKey <=Today();

            LOAD * INLINE [

                OrderKey, YearQuarter

                2011/09/01,    2011.Q3

            2011/10/01,    2011.Q4

            2011/11/01,    2011.Q4

            2011/12/01,    2011.Q4

            2012/01/01,    2012.Q1

            ];

             

            Hope this helps you.

             

            Regards,
            jagan.

            • Re: How Can I Delete Data *After* LOADing?
              Rob Wunderlich

              A KEEP should do what you want.

               

              LEFT KEEP([Fact Table]) LOAD %OrderDateKey RESIDENT [Order Dates Table];

               

              KEEP doesn't join tables. It only reconciles the two tables.

               

              -Rob

              • Re: How Can I Delete Data *After* LOADing?
                Jonathan Shaltz

                There are a number of ways to filter data while loading.  What I need to do is discard data that has already been loaded, because the filter will not apply until I've renamed fields, which I am doing after the table has been loaded.

                 

                I think Flipside's suggestion is close.  I believe you need the RESIDENT keyword, in this situation, since the goal is to clone a table in memory rather than load one from a file.  What I haven't worked out is a WHERE clause that specifies "the value in field X must appear in field Y in table Z."  This is what I have so far:

                 

                // Load the fact table, including the fields %OrderDateKey and %ShipDateKey
                
                // Load the dates table as [Order dates], renaming the PK field to %OrderDateKey
                
                // Create a new list of order dates which only incldues those that actually have orders:
                [Order dates (trimmed)]:
                NOCONCATENATE LOAD * RESIDENT [Order dates] WHERE EXISTS (%OrderDateKey);
                DROP TABLE [Order dates];  // Discard the old table
                
                // Load the dates table again, this time as [Ship dates], renaming the PK field to %ShipDateKey, and trim it as done with [Order dates]
                

                 

                However, the EXISTS does not do what I need for this problem, as all date key values do exist, in [Order dates].  So what I'd like to do is something like this:

                 

                NOCONCATENATE LOAD * RESIDENT [Order dates] WHERE EXISTS (%OrderDateKey IN Orders);
                

                 

                As a compromise, I could use "WHERE Date(SQLDate) <= Now()".  This would limit the dates table to past dates, but it would leave in any dates upon which no orders were placed.  This could be good or bad, but at least it's simple, easily explained to the consumers.  Using this workaround, I could more simply include the filter when loading the dates table each time, and would not need to LOAD RESIDENT.

                  • Re: How Can I Delete Data *After* LOADing?
                    Rob Wunderlich

                    I think you are overlooking the easy solution with KEEP. After your tables have been loaded, renamed, etc, the complete code to then remove the extra dates is:

                     

                    [Order Dates Table2]:

                    LEFT KEEP([Fact Table]) LOAD * RESIDENT [Order Dates Table];

                    DROP TABLE [Order Dates Table];

                     

                    Am I misunderstanding the requirement?

                     

                    -Rob

                      • Re: How Can I Delete Data *After* LOADing?
                        Jonathan Shaltz

                        My impression was that KEEP would not work, because it can only be used while loading the data for the first time.  After all, if I LOAD X from X.qvd, and then KEEP LOAD X2 from RESIDENT X, every record in X2 will be kept, because every record in X2 has a matching record in X.  It has to, since it's a clone of X.

                         

                        However, your sample code suggests that you can specify a second table, presumably meaning "keep all records in X where there's a natural match to Y."  That would certainly change things.  And it works!  Thank you!

                         

                        I think I'll stop bothering with checking the QlikView built-in help, it leaves out so much it's less than useless.  The doc page on Keep doesn't mention this capability, or give any code examples.  It is described under the "Left" page, which is hardly the keyword I'd use to look up how to delete records.