Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
I have simplified my problem with this following example:
I have a Calendar (as above) and a file (called Usage here) that has records that contain a StartDate an EndDate and a SubscriberKey. The 2 tables need to be disjointed (remember this is a small part of a very large application).
I need to see a distribution (by Month-Year) of active subscribers ( i.e. count(distint (SubscriberKey))) - the way that I ascertain that a subscriber is active for a specific month is to see whether they have a StartDate that is either inside or before the month and an EndDate that is no less than the beginning of the month.
As an example, active subscribers for the month of "May 2014" are those that have a StartDate that is less than or equal 31/5/2014 and an EndDate that is greater or equal 1/5/2014. I have already ensured that the EndDate is greater than the StartDate.
I attach the 2 QVDs that make up the data set for the above.
This is not a classic intervalmatch scenario (I don't think) as I am not charting by the Start/EndDate intervals but merely using these fields to establish inclusion.
Any assistance would be much appreciated.
Alexis
In your data I would:
Matches:
IntervalMatch (Date)
LOAD StartDate, EndDate RESIDENT TestUsage;
Then you can create a chart with dimension=MonthYear and Expression= count(DISTINCT SubscriberKey).
With your sample data, you get a 48M row Matches table. This gives you a lot of flexibility, but depending on the size of your final data you may want to roll up that data into some monthly figures and drop the detail.
Example attached.
-Rob
Hi Alexis,
Can you give a try for the below:
In your calendar load part, add those 2 fields:
Floor(MonthStart(DateID)) as StartOfMonth,
Floor(MonthEnd(DateID)) as EndOfMonth
Then the expression would be:
count(distinct if(StartDateInt <= EndOfMonth and EndDateInt >= StartOfMonth, SubscriberKey))
Hope this gives you the desired results.
Thank you for that - that was very helpful.
There is in fact another criterion (the Status flag) which I have added and it seems to give the right results.
The formula now looks as follows:
= count(distinct if(StartDateInt <= EndOfMonth and EndDateInt >= StartOfMonth and Status = 3, SubscriberKey))
Comments on this approach:
a) Seems to be quite slow in recalculating when selections are made
b) Is there a SET ANALYSIS version of this that you can think of that might work. This would be useful for using the "1" and "$" notation to ignore selections etc..
I tried without success:the below:
= count(distinct {$<[StartDateInt] = {"<=[EndOfMonth]"}, [EndDateInt] = {">=[StartOfMonth]"}, [Status] = {3} >}
SubscriberKey)
Anybody spot anything wrong!
Alexis
"This is not a classic intervalmatch scenario (I don't think) as I am not charting by the Start/EndDate intervals but merely using these fields to establish inclusion."
In my mind, it is a classic IntervalMatch application. IntervalMatch will connect each Subscriber to Date rows in the Calendar table. Those are all the Dates the Subscriber was active. Then you can count(DISTINCT SubscriberKey) across any of the dimensions in the Calendar table.
For a similar example, see
Qlikview Cookbook: Expand a Pricing Date Table http://qlikviewcookbook.com/recipes/download-info/expand-a-pricing-date-table/
-Rob
Hi Rob
Thank you for taking the time to reply.I am a big fan of many of your postings and articles.
I am not sure how your example relates to the example that I posted - can you be so kind to look at my example and show me how to resolve my issue.
Instead of loading my calendar.qvd you can generate the calendar using the code as suggested ealrier on this thread:
Calendar:
Load
TempDate As DateID,
Date(TempDate) As Date,
Year(TempDate) As Year,
Month(TempDate) As Month,
Day(TempDate) As Day,
WeekDay(TempDate) As WeekDay,
Week(TempDate) As Week,
'Q' & Ceil(Month(TempDate)/3) As Quarter,
Date(MonthStart(TempDate), 'MMM-YYYY') As MonthYear,
Floor(MonthStart(TempDate)) As StartOfMonth,
Floor(MonthEnd(TempDate)) As EndOfMonth
;
Load
RecNo() -1 + $(vStartDate) As TempDate
AutoGenerate ($(vNumDays));
My chart is using the expression:
= count(distinct if(StartDateInt <= EndOfMonth and EndDateInt >= StartOfMonth and Status = 3, SubscriberKey))
Thanks
Alexis
Hi again,
Based on your suggestion it made me realise that the expression can be further simplified in order to enable you to report by ANY of the calendar dimensions not just Month-Year:
= count(distinct if(StartDateInt <= DateID and EndDateInt >= DateID and Status = 3, SubscriberKey))
The only problem is that it is very slow to calculate every time a selection is made or the level of granularity is changed (say from Month to week) - I have a feeling SET ANALYSIS will help. Can someone help redefine the above removing the "If" statement?
Thanks again
Alexis
In your data I would:
Matches:
IntervalMatch (Date)
LOAD StartDate, EndDate RESIDENT TestUsage;
Then you can create a chart with dimension=MonthYear and Expression= count(DISTINCT SubscriberKey).
With your sample data, you get a 48M row Matches table. This gives you a lot of flexibility, but depending on the size of your final data you may want to roll up that data into some monthly figures and drop the detail.
Example attached.
-Rob
Thanks Rob
This works perfectly although the presence of the synthetic keys is a little messy and off-putting.
Is there any way of removing the synthetic keys?
Thanks again for helping out
Alexis
Henric Cronström says this in his blog post IntervalMatch about the synthetic key that is created by the intervalmatch:
Further, the data model contains a composite key (the FromDate and ToDate fields) which will manifest itself as a QlikView synthetic key. But have no fear. This synthetic key should be there; not only is it correct, but it is also optimal given the data model. You do not need to remove it.
Thanks for the heads-up Gysbert - in an already "busy" data structure the fewer table the better so I was hoping to be able to remove the synthetic tables...
BR
Alexis