johnw
Champion III

Show 0 for Missing Dates

We often need to create a line chart by date, and to have the line go to 0 when there is no data for that date. When tracking sales for instance, if you have sales on January 10 and January 12, but none on January 11, you do NOT want the line to go directly across the screen from January 10 to January 12. You want it to go down to 0 on January 11 to represent the fact that there were no sales on January 11.

I suspect that most people solve this problem by using a date island. The first chart in the attached file demonstrates that approach. However, the poor performance of this approach is often unacceptable.
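
In rough outline, that date island approach looks something like this (a simplified sketch, not the exact script from the attached file; Date and Sales are the field names I use later in this thread):

// Date island: a copy of the calendar fields with no link to the rest of the data model
QUALIFY *;
[Island]: LOAD Date RESIDENT [Calendar];
UNQUALIFY *;

// Chart dimension: Island.Date
// Chart expression - the row-by-row sum(if()) is what makes it slow:
// sum(if(Island.Date = Date, Sales))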

The second chart solves the performance problem by using a continuous axis with a static min and max. As long as you have a January 11 date in the system (such as from a calendar table), the line will go to 0 on that date. However, I then lose control over the formatting. For example, I have cases where I want to split the day into three "Turns" (labeled A, B and C) of 8 hours each. The continuous axis will of course not display the turns. Or I may want to group dates by week. The continuous axis will continue to display legends for days that may or may not fall on my week boundaries. These display problems have caused a user complaint.

The third chart represents another failed attempt to get this to work. I used "Show All Values" to force the January 11 date to display. It does in fact display with the line going to 0 properly. However, the chart of course shows ALL dates, not just the range I've selected.

I need to come up with a solution to this problem that behaves properly (goes to 0, correct legends) and performs well. So far I have had no luck.

Has anyone else solved this problem in a high-performance way?

Note that I'm currently on version 9 SR6. If there's a version-10-only solution, I'd like to hear about it because we'll get there eventually. But if possible I'd like to solve it before then. Thanks!

Edit: Minor bug fix. The min and max on the continuous axis were ignoring the DateTurn field because it is also on the RealData table. I had to exclude it from the concat explicitly, and thus INCLUDE it in the fields that the min and max are sensitive to.

Edit 2: I just tried bringing it up in version 10 instead of 9, and charts 2 and 3 aren't displaying lines, only points. That's bad. Chart 3 also seems to be ignoring "Show All Values" and is ONLY showing me points with data. That would also make it useless as a solution. These problems may also prevent most of you from being able to help, though if you can get it working in 10, hopefully it will also work in 9.

Edit 3: Changed attached file to add a Customer field and expand on the text explanation. Customer 'Carla' made all the purchases in the selected date range, so selecting her shouldn't affect the charts at all. It doesn't for the example charts, but this feature must be preserved in any solution.


11 Replies
Anonymous
Not applicable

John,
I'll be unable to give you a more detailed answer with an example before the weekend, sorry. The short answer is an appropriate data model with a full Calendar table that is not a data island. It works for me regardless of QV version, starting with QV 4.x.
(A long time ago I uploaded an example for exactly this question from Mark; maybe you can find it.)

johnw
Champion III
Author

Well, I didn't find the example, but I think I've guessed what it would look like. To go to 0 using the data model, you have to generate all of the possible data points. Normally, when we're talking about date information, that simply means generating a separate calendar, which is something we normally want anyway.

But in MY case, I've also added the Item dimension to my graph. That means that to use that kind of data model solution, I must generate a table with all combinations of date and item. Well, in my case, DateTurn and Item, since that's the lowest level in my calendar that I can display in the graph. This new data could be right joined back to the main table if I decide I don't want a synthetic key. I'm not thrilled about needing this extra data either way, but it doesn't seem like much of a practical issue, just an aesthetic one. I can definitely live with it.
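
In load script terms, I'm picturing something like this (a sketch only, not the exact script in the attachment; the table and field names are the ones I've been using in this thread):

// Build every DateTurn/Item combination, then RIGHT JOIN it back onto the
// real data so that missing combinations become rows with nulls.
[AllCombinations]:
LOAD DISTINCT DateTurn RESIDENT [Calendar];
JOIN ([AllCombinations])
LOAD DISTINCT Item RESIDENT [RealData];  // no common field, so this is a Cartesian join

RIGHT JOIN ([RealData])
LOAD DateTurn, Item RESIDENT [AllCombinations];
DROP TABLE [AllCombinations];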

The attached example demonstrates the technique. As far as I can tell, it's behaving exactly as I want, and works in both version 9 and version 10.

Thanks!

johnw
Champion III
Author

Well, I tried to apply that to a real application, and it failed badly. The problem happens when you make selections in OTHER fields. Select a value in any real field, and all our new rows are excluded, because the other fields are null on those rows.

To demonstrate the problem, I added a Customer field to the attached file. Carla placed all of the orders in the January 10 through January 12 range. However, if you select her, all the 0 points go away. To preserve them, you'd need to create 0 points for Carla for those items on those dates. Similarly, you'd need to create 0 points for Bob on those items on those dates.

But the problem exists for EVERY other field. All of them connect only to real data. And so, to make sure that we get our 0 points no matter what the user selects, I believe you need to fill in the 0 points for every missing date for every row on your main table.

Now, let's say your main table has 10 million rows, and maybe one million combinations of values if you ignore the date field. If you have about three thousand dates, you need to add almost three BILLION rows to the table to get all of your 0 points no matter what the user selects.

That's obviously not practical, and in most cases is probably impossible.

So if I got that right, then the solution I attached is really no solution at all.

So I'm hoping your solution was something different from that, Michael. I'll keep looking for your post.

johnw
Champion III
Author

Ah! I think I found your post. You solve it with a date linkage table that connects the 'All' key to all dates. Then you duplicate all of the data in your real table with 'All' as the date key instead of the actual date, nulling out whatever quantity fields you're dealing with.
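
If I've understood the idea correctly, the script amounts to something like this (a sketch with an assumed field layout; DateLink, DateTurnKey, Customer, Item, Sales and RealData are the names used elsewhere in this thread):

// The real table keeps a DateTurnKey instead of linking straight to the calendar.
// DateLink maps each real key to its date, and also maps the literal key 'All'
// to every date, so the 'All' rows associate with every point on the axis.
[DateLink]:
LOAD DateTurn as DateTurnKey, DateTurn RESIDENT [Calendar];
CONCATENATE ([DateLink])
LOAD 'All' as DateTurnKey, DateTurn RESIDENT [Calendar];

// Duplicate the real data once with 'All' as the key and the quantities nulled,
// so every date has a zero-valued row for every real combination of other fields.
CONCATENATE ([RealData])
LOAD 'All' as DateTurnKey, Customer, Item, null() as Sales RESIDENT [RealData];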

Looks like it works for my example data. See attached. I'm going to wait until I have it working in one of my real applications before calling it solved, but so far so good.

Edit: Link to the thread I think Michael was talking about - http://community.qlik.com/forums/p/17740/69387.aspx

johnw
Champion III
Author

Maybe I shouldn't keep posting every time I think of something or hit a snag, but it helps me to crystallize my thinking.

I'm running into troubles in the real application. The previous example (testShowMissing9) demonstrates one of the problems, and I didn't notice it when I posted. I had selected January 10 and January 12. Only Carla purchased items on those dates. The customer list box, however, shows both Bob and Carla in white. The reason, of course, is that ALL dates are now associated with ALL customers and items. So selecting dates no longer narrows down these list boxes. That significantly reduces the utility of QlikView's associative logic. In general, I'd like to be able to select a date, and see values that aren't in the REAL data for those dates get grayed out. In the real application, for instance, I have a rather long list of defects. They might select a period of only a few days, and perhaps we only had a few defects during those days. It is important that those few defects be sorted to the top of the list and be in white, while everything else is in gray. Otherwise, it makes things more difficult on the users.

I may have a fix (see attached file). I selected and locked every DateTurnKey EXCEPT for 'All'. That broke the chart but made everything else work properly. Then to make the chart work again, I just added DateTurnKey+={'All'} to my set analysis.

I may also want to abandon the continuous range idea. Why shouldn't my users be able to compare October 2006 to March 2011 without having to see every month in between? So the attached application now shows the discrete selections instead of the range. The continuous range expression is still there, just disabled. I also hate how complicated it is, so there's another point in favor of discrete values.

Edit: Modified example slightly. I decided it would be nicer to have a "ShowZeroPoints?" flag on the DateLink table. I lock it to 'N', then use set analysis to override to 'Y' for charts where I want to show the zero points. I didn't like the idea of selecting and locking potentially thousands of values. Selecting and locking a single value makes me more comfortable, even if it's functionally the same.
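
For the record, the flag version boils down to something like this (my sketch; the exact field names and expressions in the attachment may differ):

// DateLink rows that point at real dates get 'N'; the 'All' rows get 'Y'.
[DateLink]:
LOAD DateTurn as DateTurnKey, DateTurn, 'N' as [ShowZeroPoints?] RESIDENT [Calendar];
CONCATENATE ([DateLink])
LOAD 'All' as DateTurnKey, DateTurn, 'Y' as [ShowZeroPoints?] RESIDENT [Calendar];

// In the document: select [ShowZeroPoints?] = 'N' and lock it, so normal objects
// ignore the 'All' rows. Charts that should show the zero points add 'Y' back in:
// sum({<[ShowZeroPoints?] += {'Y'}>} Sales)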

johnw
Champion III
Author

OK, I got it working in one of my real applications, and I'm not currently seeing any bugs or even potential user-satisfaction problems, so I'm calling it solved.

The real application was considerably more complex than the example, as is usually the case for such things. I had three date fields to apply the technique to. I had to move fields around on tables to prevent loops with the new data and avoid other problems. But in the end, it worked, and my charts look correct and have decent performance. Compression means that the application size has barely increased.

Something to watch out for when using this technique is any chart that uses {1} as its set, or a set modified from {1}. Those charts will of course ignore the locked setting for the new flag, causing them to render incorrectly. I had only one chart like that, and I simply added the 'N' value for the flag to its set.
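
In other words, for that one chart the expression changed along these lines (an illustration rather than the exact expression from the application):

Before (ignores the locked flag, so the duplicated 'All' rows leak in):
sum({1} Sales)
After (keeps only the real-date link rows explicitly):
sum({1<[ShowZeroPoints?] = {'N'}>} Sales)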

Here's a picture of the final data model with all three linkage tables and the flag, for the curious:

Anonymous
Not applicable

Sorry I'm a whole week too late (it's been a really busy time). Anyway, it looks like you found the solution.
I too was thinking recently about data islands combined with set analysis, but it may not be good for front-end performance. I haven't had a chance to try it yet.
As for the duplicated data for the dates, I'm not much concerned, given that the number of rows in a calendar is usually in the hundreds, rarely in the thousands.

johnw
Champion III
Author


Michael Solomovich wrote: I too was thinking recently about data islands combined with set analysis, but it may not be good for front-end performance. I haven't had a chance to try it yet.


Actually, now that you mention it, a data island combined with Gordon Savage's approach to evaluating sets in the context of a dimension...

http://community.qlik.com/wikis/qlikview-wiki/evaluating-sets-in-the-context-of-a-dimension.aspx

...just might be a high-performance chart-only alternative. I doubt it would be AS fast as the data model solution, but I don't think it would be too far off the mark. It seems like it should work. Hopefully I'll have time today to make an example using that approach, particularly since the data model changes are looking difficult in some places. I'm not sure why I didn't even think of that approach earlier, since I spent days discussing it with Gordon. For me, it's been a solution in search of a problem. This might well be that problem.


Michael Solomovich wrote: As for the duplicated data for the dates, I'm not much concerned, given that the number of rows in a calendar is usually in the hundreds, rarely in the thousands.


Well, it's really the main table getting duplicated. I duplicated about one million rows. I thought QlikView's compression was on top of it at first, but looking more closely, it's using about 35% more memory. I really didn't expect that. Not that big a deal in this particular application, but the others are significantly larger, and I probably can't afford to increase their footprint by that much. And actually, in this application, it is NOT the main table I'm duplicating, just a table that has the key to the main table. Duplicating the main table might be even worse. All the more reason to fiddle with the other approach, though maybe I just did something wrong.

johnw
Champion III
Author

Well, I finished the example using a date island with conditional set analysis. It seems to work.

It is, however, ridiculously complicated. Probably as a result of that complexity, the calculation time is over half a second on my PC for even this trivial data set with 8 actual data points. The data model solution, by way of comparison, only took 0.016 seconds. Even a normal date island solution with a sum(if()) only took 0.031 seconds. So this appears to be the worst possible performance for this size of a data set.

My THEORY, though, is that the calculation time of the new approach corresponds to the number of DISTINCT dimension values. I suspect that at some point, with enough data, it would pull ahead of the sum(if()).

But even if it does, it's still ridiculously complicated, fairly slow, and very inflexible. You can't, say, break it down by customer instead of item without a serious overhaul to the script. And like any date island solution, it's confusing having the extra set of date fields, and they won't grey out other fields.

So I think it's a horrible solution. I'm posting it anyway for the curious. And below is the relevant part of the script if you don't want to open it to see what a mess it is:

// Create a date island

QUALIFY *;
[Island]: LOAD * RESIDENT [Calendar];
UNQUALIFY *;

// Conditional set analysis: build one chart expression string (vExpression)
// that picks a pre-built sum variable based on the Item and on which period
// field (Week, Date or Turn) is currently active.

[Weeks]: LOAD DISTINCT Week RESIDENT [Calendar];
[Dates]: LOAD DISTINCT Date RESIDENT [Calendar];
[Turns]: LOAD DISTINCT Turn RESIDENT [Calendar];
[Items]:
LOAD text(fieldvalue('Item',iterno())) as Item
AUTOGENERATE 1
WHILE len(fieldvalue('Item',iterno()))
;

// Outer pick(match(Item, ...)): one branch per distinct Item value
LET vExpression = 'pick(match(Item';

FOR I = 1 TO noofrows('Items')
LET vExpression = vExpression & ',' & chr(39) & fieldvalue('Item',$(I)) & chr(39);
NEXT

// For each Item: close the previous branch, then pick based on which field of
// the Island.Period group is active (Island.Week / Island.Date / Island.Turn),
// where each period value maps to its own pre-computed sum variable
// (variables named vi<ItemNo>w<WeekNo>, vi<ItemNo>d<DateNo>, vi<ItemNo>t<TurnNo>)
FOR I = 1 TO noofrows('Items')
LET vExpression = vExpression & ')
,pick(match(getcurrentfield(Island.Period)'
& ',' & chr(39) & 'Island.Week' & chr(39)
& ',' & chr(39) & 'Island.Date' & chr(39)
& ',' & chr(39) & 'Island.Turn' & chr(39) & ')
,pick(round((Island.Week-$(vMinDate))/7+1)
';
FOR W = 1 TO noofrows('Weeks')
LET vExpression = vExpression & ',vi' & $(I) & 'w' & $(W);
LET vi$(I)w$(W) = '=sum({<Item={' & chr(39) & fieldvalue('Item',$(I)) & chr(39) & '}'
& ',Week={' & chr(39) & fieldvalue('Island.Week',$(W)) & chr(39) & '}>} Sales)';
NEXT
LET vExpression = vExpression & ')
,pick(Island.Date-$(vMinDate)+1
';
FOR D = 1 TO noofrows('Dates')
LET vExpression = vExpression & ',vi' & $(I) & 'd' & $(D);
LET vi$(I)d$(D) = '=sum({<Item={' & chr(39) & fieldvalue('Item',$(I)) & chr(39) & '}'
& ',Date={' & chr(39) & fieldvalue('Island.Date',$(D)) & chr(39) & '}>} Sales)';
NEXT
LET vExpression = vExpression & ')
,pick(round((Island.Turn-$(vMinDate))*3+1)
';
FOR T = 1 TO noofrows('Turns')
LET vExpression = vExpression & ',vi' & $(I) & 't' & $(T);
LET vi$(I)t$(T) = '=sum({<Item={' & chr(39) & fieldvalue('Item',$(I)) & chr(39) & '}'
& ',Turn={' & chr(39) & fieldvalue('Island.Turn',$(T)) & chr(39) & '}>} Sales)';
NEXT
LET vExpression = vExpression & ')';
NEXT
LET vExpression = vExpression & '))';

DROP TABLES
[Weeks]
,[Dates]
,[Turns]
,[Items]
;