Skip to main content
Announcements
Qlik Connect 2024! Seize endless possibilities! LEARN MORE
cancel
Showing results for 
Search instead for 
Did you mean: 
Not applicable

Sum of date ranges without overlap

Hi all,

I'm totally new to this community, so please bear with me.

As the end user of a report in the making, I work with a skilled consultant on getting everything right.

The only big obstacle so far, is the root of the accuracy of my report. The data set looks something like this:

IdentifierStartTimeEndTime
1.04.00106.11.2013 16:16:5206.11.2013 16:18:43
1.04.00106.11.2013 16:17:1106.11.2013 16:18:43
1.04.00106.11.2013 17:18:3806.11.2013 17:19:03
1.04.00106.11.2013 17:39:0506.11.2013 17:39:48
1.04.00106.11.2013 17:39:2806.11.2013 17:39:48
1.04.00107.11.2013 07:43:5507.11.2013 07:44:04
1.04.00107.11.2013 08:19:3607.11.2013 08:22:27

The desired outcome is something like Sum(EndTime-StartTime), but without any overlap!


If I plainly calculate the duration of each date range and add them up, I'll get 0:07:51. But, since two of these records have ranges that do overlap, this will pollute my statistics. Therefore, any overlapping should not be taken into account, and then the result should be 0:05:59, which is quite a difference.


Does anybody have some suggestions I could bring forward to my consultant?


My apologies if I explained this poorly!


Best regards,

Knut

1 Solution

Accepted Solutions
teempi
Partner - Creator II
Partner - Creator II

Ohh that's right, didn't think of that. You could add a new field for the "EndTime" as well. So let's assume "NewEndTime". You must then create a similar logic for this new field:

If the "EndTime" for the current row is less than the "NewStartTime" for the current row, use "NewStartTime" as the value for "NewEndTime" (this basically means the interval for this row becomes 0). Otherwise use "EndTime" directly. I think you must also change the logic for "NewStartTime" so that it compares the original "StartTime" to the "NewEndTime" instead of "EndTime".

I'm sorry if this is getting a bit confusing, I'm just typing out as I think. But I'm pretty sure that by using new fields for the times you can get the calculations working correctly. Some polishing to my logic might be required though.

-Teemu

View solution in original post

6 Replies
teempi
Partner - Creator II
Partner - Creator II

Hi Knut,

Just wrote you a long reply but for some reason it vanished so here we go again

You could add a new field to the data in your script. Let's call it "NewStartTime". Populate this field so that if the "EndTime" on the PREVIOUS row is greater than the "StartTime" on the current row, use EndTime of the last row as the value for "NewStartTime". Otherwise you can use "StartTime" of the current row. So in your example for row 2 you would add  "06.11.2013 16:18:43" as the "NewStartTime" and for row 5 you would add "06.11.2013 17:39:48" (end times from the previous rows).

This way you'll cut off the time range that has already been covered by the previous row. After this you should be able to just sum the intervals without getting overlapping.

Let me know you this got you on the right track

edit: oh and I'm assuming your data set is ordered by "Identifier" and "StartTime".

-Teemu

Not applicable
Author

Hello Teemu, and thanks for your reply!

This sounds like something that will work for most of the cases. We will give it a try.

However, when I think about it, it will occur that we have a more compound challenge. Take some dummy data again:

IdentifierStartTimeEndTime
1.04.00106.11.2013 10:00:0006.11.2013 10:30:00
1.04.00106.11.2013 10:05:0006.11.2013 10:10:00
1.04.00106.11.2013 10:08:0006.11.2013 10:17:00
1.04.00106.11.2013 10:28:0006.11.2013 10:35:00
1.04.00106.11.2013 10:35:0006.11.2013 10:36:00

In this case, all the rows but the last one have some overlap. This might be an extreme case, but it is possible that such situations could appear in the raw data.

Best regards,

Knut

teempi
Partner - Creator II
Partner - Creator II

Ohh that's right, didn't think of that. You could add a new field for the "EndTime" as well. So let's assume "NewEndTime". You must then create a similar logic for this new field:

If the "EndTime" for the current row is less than the "NewStartTime" for the current row, use "NewStartTime" as the value for "NewEndTime" (this basically means the interval for this row becomes 0). Otherwise use "EndTime" directly. I think you must also change the logic for "NewStartTime" so that it compares the original "StartTime" to the "NewEndTime" instead of "EndTime".

I'm sorry if this is getting a bit confusing, I'm just typing out as I think. But I'm pretty sure that by using new fields for the times you can get the calculations working correctly. Some polishing to my logic might be required though.

-Teemu

Not applicable
Author

Now we are talking!

I've tested this on some real data, and so far it looks promising.

We will test it in production, and see how it works there, and get back to you with the result.

I'm really happy you could help out.

Knut

teempi
Partner - Creator II
Partner - Creator II

Glad I got you on the right track

Not applicable
Author

Hi Teemu,

I just thought I'd tell you that your suggestion worked perfectly!

Thank you for your help!

Best regards,

Knut