Re: Best Practices Re: a Calendar Table and Multip... - Page 2 - Qlik Community

Report Inappropriate Content · ‎2011-11-28

Howdy. I've searched through the forum, but not found a post which quite addresses this question. This one was close, but represented the opposite problem. The "similar posts" tool found more results han my searching: this post, this one, and this one cover some solutions, but don't discuss which is best, or more generally what advantages and disadvantages exist. What's easiest? What's most commonly used? Has anyone used one of these solutions for 12 months or so and has some feedback on how well it's worked?

A common practice in data warehouses is to have a Dates table which gives values such as year, quarter, and month for each day. It looks like this is common in QlikView too, and the name "calendar table" seems to be widely-used. If this calendar table has a Date field as its key, any queries which also contain a Date field will automagically join to it. Groovy!

But... what do you do when you have multiple fact tables, with different date fields, which should not join together? For example, say there's a Customers.Date field representing the date the customer was created, and an Orders.Date field representing the date an order was placed. A report which contains both fields will need to rename at least one of them, breaking the join.

Also, a fact table may contain multiple date dimensions. In my case, the Orders table has OrderDate, DatePaid, and ShipDate fields. I could create three more calendar tables, with key fields of OrderDate, DatePaid, and ShipDate, respectively. This seems cumbersome; is this the best practice, or is there a simpler option?

In SQL, I'd write three JOINs, to the same table each time but specifying a different field from the first table each time. But QlikView seems to be restricted to natural joins, which as a DBA makes me wince.

Instead of using one or more calendar tables, you could just import the date values as dates, and use QlikView date functions to populate your date hierarchies. Is this what most users end up using?

I appreciate any input. Thank you!

Report Inappropriate Content · ‎2011-12-01

I'm not sure I'm following you. If we're agreed that multiple Date.qvd files is not optimal for this scenario, then it follows that we'll have just one, right? Are you suggesting that there be multiple QVDs, generated automatically with the various prefixes? That would eliminate the need to have code for auto-generating Dates tables in QVW files, but would instead entail multiple joins, once for each date field used in the QVW. My Dates.qvd file is 146 KB, for the 7,600-odd days from 2000/01/01 to 2020/12/31, so the space required for multiple files shouldn't have an impact, except for slow connections like VPNs or some home DSLs.

Or are you suggesting putting all the date fields in each fact table's QVD? That was the approach I started with, but it definitely bloats things to add 20 date fields for each of four date fields in a fact table x 10 million Orders rows. Cutting this out has helped me a lot; my BI server is out-of-state and the bandwidth is not what it could be. So that's a third option, though it entails larger fact tables.

The script-based date table multiplication pattern which your code implements has the advantage of making all date fields available, all the time. And it requires a minimal amount of disk space. I'm having trouble with one line, but I'll post it here when I have it pinned down.

johnw · ‎2011-12-01

Right, I was "suggesting putting all the date fields in each fact table's QVD". All fields are then available in one QVD, and the user applications load the ones they need (or load * if you like). I understand your point about QVD bloat, though, and it's certainly one reason I hesitate doing this myself.

If you can nail down the script code, you can include it rather than cutting and pasting. So then user applications would load all their data, and then just have a little include to build all the associated calendars. But then the USER applications have the bloat, because that's almost certainly more fields than they really need. It probably isn't that big a deal in practice, though. How many rows X columns can you have from a few calendar tables? Probably much less data there than your fact tables most of the time.

Report Inappropriate Content · ‎2011-12-02

Gotcha.

Regarding the bloat, I'd argue that it goes the other way. With the multiple-calendar-files approach or the one-file-loaded-multiple-times approach, each QVW loads the whole dates QVD once per date field among the tables loaded. Assuming ten date keys, that's about 1.5 MB: 145 KB for Order Date, 145 KB for Date Paid, 145 KB for Ship Date, etc. I'm using my Dates.qvd file for reference here; it works out to 19 bytes per daily record, 145 KB total / 7,761 days from 2000/01/01 to 2020/12/31.

However, putting these date fields in the fact tables' QVDs means adding those 19 bytes once for each date field in the fact table, for each row in the fact table. If our main fact table has six date keys and one million rows, that's 19 bytes x 6 x 1.0E6 = 109 MB. And if we have ten million rows in our fact table, we'll hit the 1 GB mark. I had to give up on this approach when I first took a stab at translating my data warehouse to QlikView, because the load times were just too painful.

If your calendar file is narrower, this would be less of an issue. However, most of my 22 date fields are Yes/No, which are probably being compressed to bits. So even if you just have the usual suspects like Year, Month, Week, Day, and Day of week, I bet that would use around three-quarters as much space.

If load times are the top concern, I guess you'd load a single integer date and use date functions to expand that in the QVW? Your consumers would need to be comfortable with some fairly complex expressions.

Report Inappropriate Content · ‎2011-12-02

It's things like this that drive me crazy trying to work with QlikView. I've spent hours going over my script, trying to see why it wasn't working, and then today I see a forum post. It turns out that my script is correct, but there's a bug/undocumented limitation such that you're not allowed to rename a field once it's been used as a key (but you can't force the system to NOT use a field as key!). I'd be OK with this if QlikView gave me a little message "you can't rename keys, bub," but instead it SILENTLY SWALLOWS THE ERROR! No warning! No message! It just ignores the RENAME FIELD command. What sadist thought that was a good idea?!

johnw · ‎2011-12-02

JonathanShaltz wrote:
It's things like this that drive me crazy trying to work with QlikView. I've spent hours going over my script, trying to see why it wasn't working, and then today I see a forum post. It turns out that my script is correct, but there's a bug/undocumented limitation such that you're not allowed to rename a field once it's been used as a key (but you can't force the system to NOT use a field as key!). I'd be OK with this if QlikView gave me a little message "you can't rename keys, bub," but instead it SILENTLY SWALLOWS THE ERROR! No warning! No message! It just ignores the RENAME FIELD command. What sadist thought that was a good idea?!

I entirely agree with you. First, it's ridiculous that we can't rename to a field that already exists. I'm sure they're just trying to protect us from ourselves, but to me, it's an obvious and straightforward way to create associations. Why shouldn't I be able to load a table, manipulate it all I want, and THEN rename a field to create an association? Not allowed. And second, yes, it's even more ridiculous that it returns no error, and simply ignores the rename. QlikView's error reporting sometimes leaves a lot to be desired. Third, I almost guarantee that if you sent this to QlikTech as a bug, they'd say "Working as Designed" and close your ticket with no further action. You would have to submit it as a feature request. I've pretty much stopped reporting bugs after getting that response a few times. I like QlikView and QlikTech in general, but there are definite areas for improvement, areas for improvement that seem rather surprising to me for a mature product and mature company.

Anyway, that's one reason why my script was as tortured as it was - I had to make sure the date field itself got the right name up front rather than fixing it after the fact in the main rename loop. I perhaps should have mentioned that, as fixing that "mistake" seems an obvious thing to do when presented with my script.

johnw · ‎2011-12-02

I'm unclear if we're in agreement or not on the bloat. While there are advantages to just putting all the date fields in the main fact table QVD, I agree that adding 100 MB or 1 GB to our main fact table was bloat. That was in the first paragraph.

What I meant by user application bloat (second paragraph) was that if we loaded ALL calendar fields for ALL dates in a user application, this was likely to be a lot more fields than the user application needed. User applications would ideally only load fields they need. So whether we use a calendar script, or LOAD * from OrderDateCalendar.qvd (QVD), we'd be causing user application bloat. But also as I said, "It probably isn't that big a deal in practice". As long as they're on separate tables, it won't take much space. It's really only a problem if they're then joined with the main fact table.

Date are already stored as integers, as the number of days since December 30, 1899. And as long as you have separate calendar tables, they won't take up much space. This is also what you'd do if load times are the top concern. In that case, you wouldn't put all the date fields in the fact table. You'd either have one calendar QVD and rename as you read in, or multiple calendar QVDs. You would not join these to the fact table, but just let the association by field name take care of it.

So now I'm thinking your best answer is to have one calendar QVD, nail down the script code, and use it as an include to automatically generate calendars for every date field in the user application. You get more fields in user applications than you want, but they're on small tables, so it won't affect overall size or performance very much. You get to keep your fact tables as small as possible. Performance, file size, and RAM used should all be pretty good. You maintain your calendar fields in one place. You maintain your calendar script in one place.

I need to look up how to include script, though. I've never done it. Looks like it would be like this?

$(Include=GenerateCalendars.qvs)

I need to fiddle with this a little, because this is starting to sound like something I should consider doing myself.

Wait... we may be reinventing the wheel, or at least simultaneously inventing the wheel. I bet there's a calendar in Rob Wunderlich's QlikView Components open source project (now in beta). I'm off to have a look...

http://code.google.com/p/qlikview-components/

Report Inappropriate Content · ‎2011-12-02

OK, I'm glad to hear it's not just me. Part of why it's frustrating is that there's so much good stuff in QlikView, if they'd smooth out some of these rough spots (instead of chasing fads with mobile BI, hmm?) they could have such a kick-ass product.

Here's what worked. I'd thought I could simply rename the key field rather than duplicating it and dropping it, but since that's not possible (or documented!), my code is almost exactly the same as yours, above. I added a lookup table, but mostly I've just commented every other line.

SET HidePrefix = '%';
Orders:
FIRST 100000 LOAD * FROM "Raw Orders.qvd" (qvd);

Customers:JOIN FIRST 100000 LOAD * FROM "Raw Customers.qvd" (qvd);
DatePrefixes:
LOAD * INLINE
[
%KeyFieldName, %Prefix
%CancellationDateKey, Cancellation
%ConversionDateKey, Conversion
%ExpirationDateKey, Expiration
%OrderDateKey, Order
%DatePaidKey, Paid
%HistoryDateKey, History
%SetUpDateKey, Set up
%ShipDateKey, Shipped
];
FOR _tableNum = 1 TO NoOfTables()  // For each table in our file...
    LET _tableName = TableName(_tableNum - 1);
    
    FOR _fieldNum = 1 TO NoOfFields('$(_tableName)')  // For each field in this table...
        LET _dateFieldName = FieldName(_fieldNum, '$(_tableName)');
        
        IF WildMatch('$(_dateFieldName)', '%*Date*Key') THEN  // This field indicates a date key; import the Dates table, with a prefix to maintain uniqueness
            // The name of the master table's key field determines the human-readable prefix
            LET _prefix = Lookup('%Prefix', '%KeyFieldName', _dateFieldName, 'DatePrefixes');
            
            // Load the Dates file, with the key aliased to match the field in the master table to which we're going to join
            LET _dateTableName = _prefix & ' date';
            [$(_dateTableName)]:
            LOAD *, "%DateKey" AS "$(_dateFieldName)" FROM "Raw Dates.qvd" (QVD);
            
            // Discard the old key field; note that we can't simply rename it, we have to duplicate it and then drop the old field, because QlikView does not permit renaming keys
            DROP FIELD "%DateKey";
            
            // Rename every field (except the key) in the newly-loaded dates table
            FOR _dateFieldNum = 1 TO NoOfFields('$(_dateTableName)')
                LET _oldDateFieldName = FieldName(_dateFieldNum, '$(_dateTableName)');
                IF _oldDateFieldName <> _dateFieldName THEN
                    LET _newDateFieldName = _prefix & ' ' & _oldDateFieldName;
                    RENAME FIELD "$(_oldDateFieldName)" TO "$(_newDateFieldName)";
                ENDIF  // If this is not the key
            NEXT  // Next date table field
        ENDIF  // If the field name indicates a date key
    NEXT  // Next field in the master table
NEXT  // Next master table

I wouldn't call it tortured, but yeah now I see why you had what seemed like an odd construct in your original script. Despite this painful detour, your code got me exactly the solution I wanted, so thank you.

johnw · ‎2011-12-02

John Witherspoon wrote:
Wait... we may be reinventing the wheel, or at least simultaneously inventing the wheel. I bet there's a calendar in Rob Wunderlich's QlikView Components open source project (now in beta). I'm off to have a look...
http://code.google.com/p/qlikview-components/

OK, Rob's QlikView Components project does indeed include calendar script. You call a script function with a min date, max date, table name, and field prefix. You can establish the min and max date automatically from your data using another call. So I think it looks like this:

CALL Qvc.GetFieldValues('vDate','Order Date');
CALL Qvc.Calendar(vDate.Min,vDate.Max,'Order Date Calendar','Order');

Then repeat for each date field. And I suspect we could wrap it in our loop that detects date fields in our model if desired.

What seems to be a significant limitation is that it is autogenerating the calendar in the included script, and of course this included script is part of the open source project. It's just a script file, so you can override it to put in all your own field names and definitions, but I think you'd have to store that off somewhere and copy it back in every time you updated the project, while making sure you weren't breaking anything.

I definitely need to look into it more carefully.

Report Inappropriate Content · ‎2011-12-02

Agreed on all that re: bloat.

Interesting code from Wunderlich. Too bad I didn't find it in my Googling. I've had a hard time finding answers to QlikView questions via plain ol' web search, maybe I'm just not using the right key words.

Your syntax is correct for the include command. It gave me some trouble until I discovered that, unlike the SET and LET commands, and unlike what the help file shows, ($Include) will break if you put a space before the equals sign. I suspect it's a conspiracy to make me lose the rest of my hair. But it works, so I'm happy! Now I'll go do something fun, like wrestle a rabid crocodile in a phone booth.

johnpaul · ‎2012-01-11

Am I missing something or do we need to do this simply because QV isn't very friendly with the way it handles dates?

All this for only a drill up drill down?

I mean the effort required to create a separate calendar for each date field if you use Rob's script or John's isn't too much, but essentially all you are creating is a series of calendars which allow you to drill up and down date ranges.

Should be easier

If you look at the way Tableau deals with dates, it seems simpler. What happens is that Tableau sees a date field and automatically makes it a drill up and down field. So you drop in a date, say Order Date and then you automatically get Order Day / Week / Month / Qtr / Year. Now if you have different Fiscal years, this might need some tweeking.

Each Calendar could be different

I would agree that creating a new calendar for each field may be useful. In the example of Ordered / Shipped / Invoiced, it might be the case that different periods are used for these business processes. This could easily be the case in a large multi-national environment where invoice date conforms to a Fiscal period at head office in the USA and the order date falls into a different period, so you may have the date fields linking to a totally separate calendar table.

A final question - do you stick with Set Analysis or Mix it Up?

Let's say I created a document which tracks Orders / Shipped / Invoiced. That lends itself nicely to a central concatenated FACT table as the columns for each process are similar. In this case, we can also use a central DateKey which would correspond to the transaction date for each process.

In this case, you can easily create a few listboxes linking to the central calendar for Year/Month/Day. When a user clicks on a day, you will get all the Orders, Shipments and Invoices for that day. Remember, we did a concatenation and set the DateKey to be the transaction date for each transaction type.

However, back to John's comment in a previous part of the thread:

Do your users want to be able to select December 1, 2011, and see all orders placed OR shipped on that date? Then create a single generic calendar linked by date type.
Do your users want to be able to see orders that were both placed on November 15, 2011 and then shipped on November 20, 2011? Then create separate calendars.

The user may be wanting to see all orders for one day , then the corresponding shipments and invoices. Set analysis will work best for that.

Personally, I am leaning towards using set analysis everywhere, but it does make things more complex mainly, I think for the reason that you have to anticipate what the user is going to do much more.

Best Practices Re: a Calendar Table and Multiple Date Fields