Counting Simultaneous Tasks in Charts

JGMDataAnalysis · ‎2023-11-16

Hi! Could you tell me if there is a chart aggregation function that allows me to count simultaneous tasks?
I'm currently generating the "Datetimes" table and then using the Aggr function on the measure.
The problem is that "Datetimes" has more than 40 million records and performance is considerably degraded by using the Aggr function.

Is there a more efficient way to obtain the same result?

@rwunderlich, @marcus_sommer, @sunny_talwar

vincent_ardiet_ · ‎2023-11-22

Ok I see, indeed difficult to get rid of the count in this case...
Have you tried my proposal to avoid the Distinct with the refTasks table? Is it giving better performance or not?

marcus_sommer · ‎2023-11-22

Yes, it makes it more clear and I missed that you don't want to count all tasks else only the ones which runs simultaneously. Further that's intended to show the max. value of them in regard to a certain period-dimension and some selections within them. Especially the last requirement is quite hard to optimize by a pure UI approach because it will always need an aggr() to evaluate the results of the count-aggregation. Any kinds of pre-aggregation seems not to be feasible but I wouldn't discard them too easily as impossible.

Beside this I could imagine that an approach like suggested from @vincent_ardiet_ might be suitable to improve the UI performance by evaluating some essential information already within the data-model by using interrecord-functions. Maybe with something like this:

t: load id, start, end,
if(start < previous(end), 1, 0) as flag,
if(start < previous(end), previous(id)) as id_join
resident source order by start;

m: mapping load id_join, 1 resident t where flag = 1;

x: load id, start, end, if(flag, flag, applymap('m', id, 0)) as flag
resident t;

It's a bit simplified and may need some more if-conditions and/or flag-fields respectively more logic to handle the overlapping of n records and/or challenges in regard to the sorting.

The above approach would be applied within the facts and not in the separate DateTime-table and checked the previous end-date against the current start-date and flagged it as well as providing the previous id which is used within the later mapping to add the flag also this record - because they belong together.

With such a flag as selection or as a set analysis condition the virtual-tables behind the count() and the on-top aggr() could be reduced and the UI performance should become better. Of course it will depend on the rate of how many records could be exclude - by just 2% - 3% it won't be probably noticeable but by > 30% it should be significantly.

Further thinkable would be to calculate the timestamps of the records against each other to get their differences and group them as within 1 or 24 hours or a day/month and various other stuff which might be also usable as restrictions within the calculation or maybe even as base for any pre-aggregation.

JGMDataAnalysis · ‎2023-11-22

Thanks, Marcus.
The flag you propose allows you to identify those "tasks" that do not overlap with any other at any time.
I understand that I should generate a similar flag but in the separate DateTime-table, according to the following:

INNER JOIN (Datetime)
LOAD
    TaskDate,
    TaskTime,
    If(Count(Distinct TaskID) > 1, 1, 0) AS Flag
RESIDENT
    Datetime
GROUPED BY
    TaskDate,
    TaskTime
;

RangeMax(Max({<Flag = {1}>} Aggr(Count({<Flag = {1}>} Distinct TaskID), TaskDate, TaskTime), 1)

For the following example (as I understand it) the flag that you suggested would mark all tasks at 1 and
as a consequence, all their 1 second partitions would remain.

On the other hand, with the flag calculated on the table you could keep filtering only those partitions
that have more than one associated task.

I did a quick test on the real data and was able to notice that the implementation of the flag
in the DateTime-table allows me to exclude approximately 35% of the records, which is reflected
in the UI performance.

I would like to explore the last optimization option you mentioned but I didn't fully understand it.
Can you include a simple example in code?

Further thinkable would be to calculate the timestamps of the records against each other to get their differences and group them as within 1 or 24 hours or a day/month and various other stuff which might be also usable as restrictions within the calculation or maybe even as base for any pre-aggregation.

JGMDataAnalysis · ‎2023-11-22

I did the test but didn't notice any difference, unfortunately.
Thanks, Vincent.

marcus_sommer · ‎2023-11-22

Your method to use a self-joining aggregation-load to add the flag to the DateTime-table is definitely an alternatively. I'm not sure in which table the flag would improve the UI performance more. Personally I would assume it's the shorter fact-table and I would also think that the interrecord-approach may run faster as the aggregation-load. But you could just test it against each other.

By the last I hinted for the possibility not just to create a flag else also to calculate the time-differences between the current start-time and the previous end-time and in a second measure these differences might be accumulated and the next measure might be to group these accumulations into hours/dates/months. In a similar way might the above flag or another one be used to accumulated the number of tasks within a row - means the flag-value mustn't be just 0 or 1 else might be 2, 3, 4, 5, .... and in the set analysis the condition is: flag = {">=1"}

Like above mentioned I'm not sure how to use such offset/grouping and/or further/extended flag-values in a sensible manner to reduce the data-set for the aggr() or maybe further aggregations in the script but there might be pattern which are useful - especially if there are clear rules how to handle hour/date/month overlapping tasks which would it makes easier to reset flags/accumulations in their regard as well as to the risk of counting them twice.

At last if the second-parts are needed to determine the belonging to overlapping tasks and the flag would fulfill this requirement but the lowest granularity to display the counting is the hour you may remove the minutes and seconds from the data-model respectively you don't need to create them (if tries to create the flags within the fact-table and only afterwards creating the DateTime-table).

General Question