Skip to main content
Announcements
Join us at Qlik Connect for 3 magical days of learning, networking,and inspiration! REGISTER TODAY and save!
cancel
Showing results for 
Search instead for 
Did you mean: 
Not applicable

How Can I Delete Data *After* LOADing?

I have a fairly complicated script which loads a set of fact tables, then loads multiple copies of a dates table, renaming fields such that the key field in each Dates table maps to a field in the fact tables.  E.g., if my fact table has and order date and a ship date, I get two copies of Dates, Order Dates and Ship Dates, and the date key field is renamed to %OrderDateKey and %ShipDateKey, respectively (mad props to Witherspoon for most of the code).  This works, however I get every date in each copy of the dates table.  I'd like to delete records from each of the Dates tables if there are no matches to the fact table.

If I was simply joining table A to table B, I'd use the INNER keyword, easy as that.  Unfortunately this is not an option due to the complexity of the process for loading and relabeling the dates data.  Ideally I'd run something like "DELETE FROM [Order date] WHERE %OrderKey IS NULL", however based on what I've seen, this does not seem to be an option.  An equivalent action in the GUI would be to select %OrderKey (or some other field which has a value for every record in the fact tables) and then use Reduce Data | Keep Possible Values.

If that's unclear, here's a simplified outline:

Fact Table

%OrderKey%OrderDateKey
1232011/09/01
2342011/11/01
3452011/12/01

Order Dates Table

%OrderDateKeyYear and quarter
2011/09/012011.Q3
2011/10/012011.Q4
2011/11/012011.Q4
2011/12/012011.Q4
2012/01/012012.Q1

A List Based on Order Date Looks Like This

Year and quarter
2011.Q3
2011.Q4
2012.Q1 - a future date; very confusing for order date

I'd like to discard the last record from the Order Dates table, as there are no orders with an order date in 2012.  I don't really have a preference as to whether or not the date 2011/10/01, which also has no orders but is not a future date, stays or goes.

I've read the doc pages on KEEP and EXISTS(), but they seem to be limited to modifying the LOAD command, and do not let you discard data after loading.  Am I missing a command which would achive this?  Something like "DELETE FROM [Order Dates] WHERE Date > Today()" would work, but only if I could apply it after the load, to specific tables, as some date tables should include future dates (e.g., expiration date).

Thank you.

1 Solution

Accepted Solutions
rwunderlich
Partner Ambassador/MVP
Partner Ambassador/MVP

I think you are overlooking the easy solution with KEEP. After your tables have been loaded, renamed, etc, the complete code to then remove the extra dates is:

[Order Dates Table2]:

LEFT KEEP([Fact Table]) LOAD * RESIDENT [Order Dates Table];

DROP TABLE [Order Dates Table];

Am I misunderstanding the requirement?

-Rob

View solution in original post

8 Replies
Anonymous
Not applicable
Author

Try using INNER JOIN

Not applicable
Author

Yes, as mentioned in the second paragraph, I could use that if I was simply joining one table to another.  However, I am building the second table in the script, and must rename its fields before the correct join logic exists, so the deletion has to happen after the load.

flipside
Partner - Specialist II
Partner - Specialist II

Something like this will do it ...

     tmpTableA:

     Noconcatenate Load * from TableA where {condition...};

     DROP TABLE TableA;

     RENAME TABLE tmpTableA to TableA;

jagan
Luminary Alumni
Luminary Alumni

Hi,

Try filtering using where condition like below

Test:

LOAD

    *

   

WHERE OrderKey <=Today();

LOAD * INLINE [

    OrderKey, YearQuarter

    2011/09/01,    2011.Q3

2011/10/01,    2011.Q4

2011/11/01,    2011.Q4

2011/12/01,    2011.Q4

2012/01/01,    2012.Q1

];

Hope this helps you.

Regards,
jagan.

rwunderlich
Partner Ambassador/MVP
Partner Ambassador/MVP

A KEEP should do what you want.

LEFT KEEP([Fact Table]) LOAD %OrderDateKey RESIDENT [Order Dates Table];

KEEP doesn't join tables. It only reconciles the two tables.

-Rob

Not applicable
Author

There are a number of ways to filter data while loading.  What I need to do is discard data that has already been loaded, because the filter will not apply until I've renamed fields, which I am doing after the table has been loaded.

I think Flipside's suggestion is close.  I believe you need the RESIDENT keyword, in this situation, since the goal is to clone a table in memory rather than load one from a file.  What I haven't worked out is a WHERE clause that specifies "the value in field X must appear in field Y in table Z."  This is what I have so far:

// Load the fact table, including the fields %OrderDateKey and %ShipDateKey

// Load the dates table as [Order dates], renaming the PK field to %OrderDateKey

// Create a new list of order dates which only incldues those that actually have orders:

[Order dates (trimmed)]:

NOCONCATENATE LOAD * RESIDENT [Order dates] WHERE EXISTS (%OrderDateKey);

DROP TABLE [Order dates];  // Discard the old table

// Load the dates table again, this time as [Ship dates], renaming the PK field to %ShipDateKey, and trim it as done with [Order dates]

However, the EXISTS does not do what I need for this problem, as all date key values do exist, in [Order dates].  So what I'd like to do is something like this:

NOCONCATENATE LOAD * RESIDENT [Order dates] WHERE EXISTS (%OrderDateKey IN Orders);

As a compromise, I could use "WHERE Date(SQLDate) <= Now()".  This would limit the dates table to past dates, but it would leave in any dates upon which no orders were placed.  This could be good or bad, but at least it's simple, easily explained to the consumers.  Using this workaround, I could more simply include the filter when loading the dates table each time, and would not need to LOAD RESIDENT.

rwunderlich
Partner Ambassador/MVP
Partner Ambassador/MVP

I think you are overlooking the easy solution with KEEP. After your tables have been loaded, renamed, etc, the complete code to then remove the extra dates is:

[Order Dates Table2]:

LEFT KEEP([Fact Table]) LOAD * RESIDENT [Order Dates Table];

DROP TABLE [Order Dates Table];

Am I misunderstanding the requirement?

-Rob

Not applicable
Author

My impression was that KEEP would not work, because it can only be used while loading the data for the first time.  After all, if I LOAD X from X.qvd, and then KEEP LOAD X2 from RESIDENT X, every record in X2 will be kept, because every record in X2 has a matching record in X.  It has to, since it's a clone of X.

However, your sample code suggests that you can specify a second table, presumably meaning "keep all records in X where there's a natural match to Y."  That would certainly change things.  And it works!  Thank you!

I think I'll stop bothering with checking the QlikView built-in help, it leaves out so much it's less than useless.  The doc page on Keep doesn't mention this capability, or give any code examples.  It is described under the "Left" page, which is hardly the keyword I'd use to look up how to delete records.