Qlik Community

Qlik Design Blog

All about product and Qlik solutions: scripting, data modeling, visual design, extensions, best practices, etc.

Employee
Employee

IntervalMatch

A common problem in business intelligence is when you want to link a number to a range. It could be that you have a date in one table and an interval – a “From” date and a “To” date – in another table, and you want to link the two tables. In SQL, you would probably join them using a BETWEEN clause in the comparison.

Events and Intervals with links.png

But how do you solve this in QlikView, where you should avoid joins?

The answer is to use IntervalMatch.

IntervalMatch is a prefix that can be put in front of either a Load or a SELECT statement. The Load/SELECT statement needs to contain two fields only: the “From” and the “To” fields defining the intervals. The IntervalMatch will generate all the combinations between the loaded intervals and a previously loaded numeric field.

Typically, you would first load the table with the individual numbers (The Events), then the table with the Intervals, and finally an intervalmatch that creates a third table that bridges the two first tables.

Events:
Load * From Events;

Intervals:
Load * From Intervals;

IntervalMatch:
IntervalMatch (Date)
Load distinct FromDate, ToDate resident Intervals;

Intervalmatch.png

The resulting data model contains three tables:

  1. The Events table that contains exactly one record per event.
  2. The Intervals table that contains exactly one record per interval.
  3. The IntervalMatch table that contains exactly one record per combination of event and interval, and that links the two previous tables.

Note that this means that an event may belong to several intervals, if the intervals are overlapping. And an interval can of course have several events belonging to it.

This data model is optimal, in the sense that it is normalized and compact. All QlikView calculations operating on these tables e.g. Count(EventID) will work and will be evaluated correctly. This means that it is not necessary to join the intervalmatch table onto one of the original tables. Joining it onto another table may even cause QlikView to calculate aggregations incorrectly, since the join can change the number of records in a table.

Further, the data model contains a composite key (the FromDate and ToDate fields) which will manifest itself as a QlikView synthetic key. But have no fear. This synthetic key should be there; not only is it correct, but it is also optimal given the data model. You do not need to remove it.

IntervalMatch can also be used with an additional key between the tables – i.e. when you have Slowly Changing Dimensions. But more about that in a later post.

HIC

For more on IntervalMatch and some script examples, see the technical brief IntervalMatch and Slowly Changing Dimensions.

40 Comments
avastani
Contributor II

Can also add additional dimension in cases where Employees change departments and you want to find the original person responsible for creating the record and/or sales etc. so that credit applies accordingly

I tend to LEFT JOIN the intervaltable into the fact table which removes the valid synthetic key just to keep things simple

0 Likes
15 Views
Employee
Employee

Your case is a a Slowly Changing Dimension. A record in the fact table has an EmployeeID and a Date. To find the department to which the employee belongs, you need both keys. An employee can (probably) only belong to one department at a time, and in such a case a Left Join can be used: QlikView will still calculate the aggregations correctly since the number of records in the fact table won't change.

However, a left join will add columns to the fact table - which will increase the memory usage. A solution without a left join could probably use less memory.

HIC

15 Views
avastani
Contributor II

Very true.

0 Likes
15 Views
trey_bayne
Contributor

Saying that a synthetic key is ok here is just like saying that allowing QV to auto-concatenate is ok. Most of the time, QV is doing exactly what it should. The problem is that QV  obscures what it does in synthetic keys. Doing a left join to the fact table guarantees what you are getting.

0 Likes
15 Views
Employee
Employee

I think it is quite the opposite. The Synthetic keys highlight that there is a composite key and that this needs special treatment: Only existing combinations should be used; NULL values should be handled; etc.

QlikView does the only possible from an algorithmic point:

  1. Lists relevant combinations (the $Syn table)
  2. Uses only the relevant combinations to link the two original tables (the $Syn key)

This is clearly visible when you look at the internal table view. You can even do a preview on the $Syn table. So, as I see it, nothing is obscured.

A left join seems like sweeping the dust under the rug. If you want to avoid synthetic keys properly, you should create your own composite keys instead.

Concerning the auto-concatenate: I agree that we probably shouldn't have done it this way when we first implemented it in 1994. But now it is there, and to remove it would cause greater problems.

HIC

15 Views
barryharmsen
Contributor II

Hi Henric,

I understand that you will want to keep the intermediate table in case of overlapping intervals, as that creates a many-to-many relationship. What I don't see why it should be kept when using slowly changing dimensions.

I often use the extended IntervalMatch to do an SCD2 lookup; replace the business key in the fact table (for example, the Employee ID from the source system) with the surrogate key used in the dimension table (a unique key that links to the different versions of the Employee record). This ensures that the fact table links to the correct version of the SCD.

Once that link has been created, all the extra stuff; the original Employee ID, Start date, End date, can all be dropped, they're no longer needed. The end result is that you've replaced one key with another key; so there's no extra data in the fact table.

What would be very useful is a variant of the IntervalMatch statement that returns the key, instead of the start- and enddates of the interval.

Kind regards,

Barry

15 Views
Employee
Employee

Keeping the intermediate table is a risk-free approach. But of course you can instead build a solution where you have the composite key directly in the fact table. The problem is only that when you join the intermediate table onto the fact table, you face two problems:

  • Maintaining the number of records in the fact table (a join can potentially change this)
  • Minimizing the number of columns (every additional column will use a considerable amount of RAM)

But if you know that you have these two problems under control, you can skip the intermediate table and do what you suggest.

HIC

15 Views
richard_chilver
Contributor III

Thanks for a concise and useful explanantion of how this works. Somewhat clearer than the official description. I have already got this working but it was tricky ! It may be worth noting that care has to be taken to ensure exactly one record per interval. I experienced problems with aggregations which seems to be where this wasn't the case.

0 Likes
15 Views
anantmaxx
Contributor III

Thanx Usefull!!

0 Likes
15 Views
Not applicable

Hello Friends,

This is very usefull.

But I have 4 columns instead of two columns. like

FromDate_1, ToDate_1, FromDate_2, ToDate_2

I want to compare Date_1 with FromDate_1, ToDate_1 and Date_2 with FromDate_2, ToDate_2 at same time.

Both date are in same row.

Can you please help me in this?

Thanks

15 Views
Employee
Employee

Yes, but... What do you want to do?

Let's say that you have a Date which has a match in Interval1, but not in Interval2. What should you then do? And the next Date has a match in Interval2 but not in Interval1. And the third date has a match in both. What then? Do you want to link all three dates the same way?

It sounds to me as if you may should consider two tables with intervals instead of one. Or a bridge table listing all intervals.

HIC

15 Views
Not applicable

Helllo Henric,

Thank you for reply.

Let me explain my case.

Example:

I have Two Table. Event and Interval as follow.

Events:
Load FromDate_1, ToDate_1, FromDate_2, ToDate_2, Point
From Events;

Intervals:
Load Date1, Date2
From Intervals;

Now i want to compare

Date1 with FromDate_1 and ToDate_1 and

Date2 with FromDate_2 and ToDate_2

It means

Date1 >= FromDate_1 and Date1 <= ToDate_1 and

Date2 >= FromDate_2 and Date2 <= ToDate_2

If this condition match I get Point.

Can i write statement like follow?

IntervalMatch:
IntervalMatch (Date1, Date2)
Load distinct FromDate_1, ToDate_1, FromDate_2, ToDate_2 resident Intervals;

Can you please help me in this?

Thanks.

0 Likes
15 Views
Employee
Employee

You cannot write the intervalmatch the way you suggest.

The following script should do it.

HIC

// =======================================

Intervals:

Load

          FromDate_1 & '|' & ToDate_1 & '|' & FromDate_2 & '|' & ToDate_2 as IntervalID,

          FromDate_1, ToDate_1, FromDate_2, ToDate_2, Point

From Intervals;

DatePairs:

Load

          Date1 & '|' & Date2 as DatePairID,

          Date1, Date2

From DatePairs;

tmpBridgeTable:

Load Date1, Date2

Resident DatePairs;

Left Join

IntervalMatch (Date1)

Load distinct FromDate_1, ToDate_1 Resident Intervals;

Left Join

IntervalMatch (Date2)

Load distinct FromDate_2, ToDate_2 Resident Intervals;

BridgeTable:

Load

          FromDate_1 & '|' & ToDate_1 & '|' & FromDate_2 & '|' & ToDate_2 as IntervalID,

          Date1 & '|' & Date2 as DatePairID

Resident tmpBridgeTable;

Drop Table tmpBridgeTable;

15 Views
vikasmahajan
Honored Contributor III

Sir,

Very Useful document...

Vikas

0 Likes
15 Views
aveeeeeee7en
Valued Contributor III

Very Useful.

Can you Please help me on this :

Re: Location Wise Different Holidays

0 Likes
15 Views
Not applicable

Henric, thanks for the example. Very helpful. I'm sure it is a "data thing" but when I look at my Table Viewer I see that QlikView has created a synthetic table between my IntervalMatch table and the table with the Interval data. In my case I am using a calendar table for the dates and an employee assignment table with FROM and TO dates. Is there any way to eliminate the synthetic table? Or do I just live with it as the results appear correct?!

Thanks

Kevin

0 Likes
15 Views
Employee
Employee

The synthetic key in a standard intervalmatch is not only harmless, but in fact quite efficient. So there is no reason to remove it really.

But if you still want to remove it, you should create a composite key like

     FROM & '|' & TO as Interval

both in the Intervals table and in the IntervalMatch table. To do that, you would need to run a second pass through the IntervalMatch table (creating a new table) and then drop the original one. See more in http://community.qlik.com/docs/DOC-4310

HIC

15 Views
Not applicable

Henric, I have a problem that is similar to what you were explaining in this post, but I could not solve it yet. I have an Excel file with range date that I need to transform into single rows with single dates and their respective affiliate and cost_per_registration .  The end_date 'if null' means today(2014-01-24), and country_name 'else' means the other countries.This is the table I have:

Table1.jpg

And, this is what I want to get:

Table2.jpg

It would be great if you could help me with this problem.

Thanks

0 Likes
15 Views
Employee
Employee

Although it is possible to use IntervalMatch for this, I would use a different approach: A while loop that generates all the dates in the range. See more on http://community.qlik.com/blogs/qlikviewdesignblog/2013/02/12/reference-dates

HIC

15 Views
Not applicable

Thank you! clear & complete!

0 Likes
15 Views
MCampestrini
Valued Contributor

As usual another great post.

Thanks HIC.

0 Likes
15 Views
Not applicable

great post.

I have created another document with examples and alternatives for interval match

intevalMatch.pdf

0 Likes
15 Views
Not applicable

@Henric..I have same kind of request can you help us?

I have Transaction data and Discount tiers. I want to match both and would like to have desired results as described below..

Thanks for your help.

Transaction Data:

C_YearC_QtrDateProductCustomer NumberQuantityMember Type
2014412/1/2014Apple1000140AAA
2014412/1/2014Orange1000155AAA
2014412/1/2014Banana1000175AAA
2014412/1/2014Apple1000260BBB
2014412/1/2014Orange1000263BBB
2014412/1/2014Banana1000244BBB
2014412/1/2014Apple1000349CCC

Discount Tiers:

C_YearC_QtrProductDiscount_TypeMember_TypeTierStartEndDiscount%
20144AppleGAAAG_Tier_00340
20144AppleGAAAG_Tier_13549.91
20144AppleGAAAG_Tier_25064.992
20144AppleGAAAG_Tier_3659999993
20144AppleGBBBG_Tier_13549.91
20144AppleGBBBG_Tier_25064.992
20144AppleGBBBG_Tier_3659999993
20144AppleGCCCG_Tier_110252
20144AppleGCCCG_Tier_226503

Desired output:

C_YearC_QtrProductCustomer NumberMember TypeQuantity (Sum of Quarter)Discount_TypeTierDiscount%
20144Apple10001AAA80GG_Tier_33
20144Orange10001AAA150GG_Tier_33
20144Banana10001AAA110GG_Tier_33
20144Apple10002BBB120GG_Tier_33
20144Orange10002BBB88GG_Tier_33
20144Banana10002BBB126GG_Tier_33
20144Apple10003CCC98GG_Tier_33

I placed complete file in my question. Please let me know if you need any additional info.

Need help in computing Discounts based on Min and Max limits (by Customer)

0 Likes
15 Views
sujeetsingh
Honored Contributor III

A best way to explain complex things

0 Likes
15 Views
Not applicable

Hi Henric,

I would like to generate a gantt chart or histogram "like" to be able to have visibility on the duration of certain events, but the only date I have is in the form of mm/dd/yyyy hh:mm:ss which correspond to event-start, example:

  • user  project  time                event
  • ----    -------    -----------------  ----------------
  • xyz  12345    01/01/01 10:10:10  project started
  • xyz  12345    01/01/01 11:11:11   project script ran
  • xyz  12345    03/01/01 01:01:01  project hold

the number of different events is around 27, but not all events are part of every project, which can have as many as needed, even if the event repeats a later time. How can I graphically represent the events per project number where I am able to see the time it took each event? Thanks in advance for all the help you can provide, i have been struggling with this representation for a long time now :-(

Carlos

BTW, I am using the Calendar Script that Rob Wunderlich wrote (Cookbook).

0 Likes
15 Views
Employee
Employee

If one event starts right after the previous one ends, then you can use the start time of the second event to define the end time of the first. See Creating a Date Interval from a Single Date

HIC

0 Likes
15 Views
Not applicable

Hi Henric,

Thank you for the prompt response to my questoin! I tried the method you suggested and it seems to work to create the dates interval, but I am more concern about how to get the time, we have many events during the same day that belong to the same project, for which we need to record and report.  Do you know how can I expand the Interval to use both variables (date & Time)? Thanks!!! Carlos

0 Likes
15 Views
Employee
Employee

If you have Date and Time as two separate (and correctly interpreted) fields, you can simply add them to get the correct timestamp. You probably want to wrap in a Timestamp function also, to get the correct formatting.

Hence:

Timestamp(Date + Time) as StartTime

If you want to combine this with creating the end times, I suggest something along the lines of

Load

   Timestamp(Date + Time) as StartTime,

   Timestamp(Peek(StartTime)-0.00000001) as EndTime,

   ...

   Resident Table

   Order By Date Desc, Time Desc;

HIC

0 Likes
15 Views
Not applicable

Hi Henric,

Once again, thanks for your fast response to my question!  I really appreciate it !!!

My StartTime field combines both date & time in the following format: "2/1/15 5:32 PM" - I tried to used the command "Peek(StartTime)-0.00000001) as EndTime," but i believe I'm not using it right because I get an error while trying to load the data.

The following error occurred:

ErrorSource: Microsoft Access Database Engine, ErrorMsg: Undefined function 'Peek' in expression.

Again, thank you for all the help you can provide me !

Carlos

0 Likes
15 Views
Employee
Employee

The error message shows that you have written the Peek() function inside the SELECT statement. This is not allowed. Use it in a Load statement instead. The structure may be

Load ... ;

SQL SELECT ... FROM ... ;

The Load statement is interpreted by QlikView, and the SELECT statement is interpreted by the ODBC driver (that knows nothing of Peek()). See Preceding Load.

Further, it sounds as if you perhaps havn't interpreted your Start time correctly. To do this, you need to use the TimeStamp#() function. See The Date Function and Get the Dates Right for an explanation of interpretation functions.

15 Views