valpassos
Creator III

IntervalMatch() taking too long

Hi community,

I believe I have a classic IntervalMatch scenario, but this prefix is taking far too long to compute. I can't afford that time, so maybe you can suggest another way?

I have a table with the periods for the existing Interest Rates.

Intervals:
StartDate,
EndDate,
Rate


In my fact table I have a date, Fact_Date, and I need to find which of these intervals it falls into, so I can extract the corresponding Rate for that month and then do the calculations I need.

I'm doing this:

BridgeTable:
IntervalMatch (Fact_Date)
LOAD DISTINCT
    StartDate,
    EndDate
RESIDENT Intervals;
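
For completeness, my plan after the match is the usual one: join the rate back onto the facts through the bridge table. Roughly like this (a sketch; assume my fact table is simply called Facts and the intervals don't overlap):

LEFT JOIN (BridgeTable)
LOAD StartDate, EndDate, Rate
RESIDENT Intervals;

LEFT JOIN (Facts)
LOAD Fact_Date, Rate
RESIDENT BridgeTable;

DROP TABLE BridgeTable;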


I waited 30 minutes and then aborted, because without this IntervalMatch my model takes up to 1 minute to load.

Is there any other way to do this?

Thanks!
Regards


8 Replies
marcus_sommer

Usually I wouldn't use IntervalMatch; I would rather tend towards an approach like this:

m:
MAPPING LOAD
    Date(StartDate + IterNo() - 1),
    Rate
RESIDENT Intervals
WHILE StartDate + IterNo() - 1 <= EndDate;

Facts:
LOAD *, ApplyMap('m', Fact_Date, 0) AS Rate
FROM Facts;

because it's simpler and in my experience quite fast (I never measured which method is faster, but IMO the IntervalMatch logic takes some detours and should therefore be slower).

- Marcus

valpassos
Creator III
Author

Hi Marcus,


Thanks! I'm trying that approach right now. It is still running, 22 minutes now... Do you have any idea how much faster (or less slow...) this approach will be compared to IntervalMatch()?

Will I really have to wait this long, regardless of the approach?

Lisa

marcus_sommer

Please elaborate in a bit more detail on what you are doing, i.e. how many records are in the Intervals table (before and after the while loop) and in the Facts table, and how they are loaded.

If the difference between StartDate and EndDate is quite large, maybe around 8 years, and there are around 1M products or similar IDs, the final table would have around 3 billion records. That would of course need some time.

If one or both loads would otherwise be optimized loads, this intended (interval) transformation will break the optimization, which necessarily results in a significantly longer run time.

This means we need more information to be able to give better advice or to extend it. If you are handling rather large datasets, you will quite probably need an incremental approach, so that the transformation is not applied against the entire dataset every time but only against the new/changed records.
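
A rough sketch of such an incremental pattern (all names here, vLastReload, Facts.qvd and Facts_Transformed.qvd, are hypothetical placeholders, not your actual model):

Facts:
LOAD *, ApplyMap('m', Fact_Date, 0) AS Rate
FROM Facts.qvd (qvd)
WHERE Fact_Date > '$(vLastReload)';            // transform only the new rows

CONCATENATE (Facts)
LOAD *                                         // already-transformed history
FROM Facts_Transformed.qvd (qvd)
WHERE Fact_Date <= '$(vLastReload)';

STORE Facts INTO Facts_Transformed.qvd (qvd);  // becomes the next run's history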

- Marcus

valpassos
Creator III
Author

Hi Marcus,

Yes, I will give you more information:

The script is still running, 2 hours now. It is stuck in the MAPPING LOAD, so it never reaches the ApplyMap() (it hasn't even gotten to my fact table).

This is my logic:


Intervals:
LOAD
    StartDate,
    EndDate,
    Rate
FROM Intervals (qvd);

MapInterestRateInterval:
MAPPING LOAD
    Date(StartDate + IterNo() - 1) AS StartDate,
    Rate
RESIDENT Intervals
WHILE Date(StartDate + IterNo() - 1) <= Date(EndDate);

DROP TABLE Intervals;


My Intervals file only has 30 records; this Intervals table was itself generated by an earlier IntervalMatch().

In my fact table, I need to compare a date against these intervals to extract the corresponding Rate for the fact's month. But, again, the script is stuck in the MAPPING LOAD.

Thank you very much for your support!

Lisa

marcus_sommer (Accepted Solution)

Even if you stay with these data, it's just 100 years, which means around 36,500 records. Such a loop shouldn't need more than a minute, rather less. Therefore I assume something is wrong with the data itself. Are these really dates, i.e. numeric values and not strings? Also check whether there are really only 30 records in the source; there may be n duplicates.

Beside this, I suggest adjusting the last date of 31/12/2099 to something like Today() or maybe YearEnd(Today()).
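
In the intervals load that could look like this (just a sketch):

Intervals:
LOAD StartDate,
     Date(RangeMin(EndDate, YearEnd(Today()))) AS EndDate,  // cap open-ended intervals
     Rate
FROM Intervals (qvd);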

- Marcus

valpassos
Creator III
Creator III
Author

Hi Marcus,

My dates are really dates (the Date() function ensures that, right?), and the Intervals table is exactly what I posted here: 30 rows. In fact, the Excel table I showed is the table I generated in Qlik for my Intervals table.

I adjusted the date 31/12/2099 to YearEnd(Today()), but the computation is still taking too long...

As you say, this should be a rather simple calculation. I can't understand what is happening.

Do you have any other hypothesis?

Thanks!

Lisa

marcus_sommer

Date() is a format function: it formats a numeric value with a defined format pattern. It doesn't ensure that the value is a real date. Timestamps, for example, remain timestamps; only the time part isn't displayed. If the value were instead a string, it would return NULL. So we can assume your values are at least timestamps, which by itself wouldn't break the used logic, because the iteration step is a whole day; a time part would only shift the generated values, not keep the loop from ending.

The fact that you display the interval table in a table box and export it to Excel with just 30 records doesn't mean there really are only 30 records, because all objects in Qlik show only distinct values. It shows that you have 30 different records, but not how many rows are actually there. To see that, look in the table viewer, or add RowNo() as RowNo as an extra field to the table load and include RowNo in the table box.
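
A quick way to check it in the script (a sketch, using the fields from your post):

Check:
LOAD Count(StartDate) AS TotalRows,    // rows with a non-null StartDate
     Count(DISTINCT StartDate & '|' & EndDate & '|' & Rate) AS DistinctRows
RESIDENT Intervals;

If TotalRows is far above DistinctRows, the QVD contains duplicates.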

I just created a simple dummy from your screenshot and applied the loop: it takes less than a second. To see any result in the runtime variable at all, I added a sleep of one second:

LET vStart = Num(Now() - Today(), '#.########', '.', ',');

Dummy:
LOAD *,
     RowNo() AS RowNo,
     IterNo() AS IterNo,
     Date(StartDate + IterNo() - 1) AS Date
WHILE StartDate + IterNo() - 1 <= EndDate;
LOAD MakeDate(1998 + Floor((RecNo() - 1) / 2),
              Pick(Match(Even(RecNo()), 0, -1), 1, 7)) AS StartDate,
     MakeDate(1998 + Floor((RecNo() - 1) / 2),
              Pick(Match(Even(RecNo()), 0, -1), 6, 12),
              Pick(Match(Even(RecNo()), 0, -1), 30, 31)) AS EndDate,
     Ceil(Rand() * 10) AS Rate,
     RecNo() AS RecNo
AUTOGENERATE 48;

SLEEP 1000;

LET vEnde = Num(Now() - Today(), '#.########', '.', ',');
LET vRuntime = Time($(vEnde) - $(vStart));

This means the approach really is very fast, not only with such a small dataset but also with bigger ones, and AFAIK there is no other method that would be faster.

Therefore something is wrong with your data. Maybe there are many more rows, and/or the values may be strings (I never tested how the while condition would evaluate them), and/or your data quality is bad: blanks, NULLs or similar. Anything like that can make the loop run away, and one bad record would be enough.
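
If you want to guard the loop against exactly these cases, you could filter the bad records away before iterating; a sketch (the Floor() also strips a possible time part):

MapInterestRateInterval:
MAPPING LOAD
    Date(Floor(StartDate) + IterNo() - 1) AS StartDate,
    Rate
RESIDENT Intervals
WHERE IsNum(StartDate) AND IsNum(EndDate) AND EndDate >= StartDate
WHILE Floor(StartDate) + IterNo() - 1 <= Floor(EndDate);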

- Marcus  

valpassos
Creator III
Author

Hi Marcus,

You are right, my Intervals table has duplicates. Actually it has millions and millions of records: it was badly transformed upstream in the model (I only had access to the final QVD).
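
For anyone who lands here later: a LOAD DISTINCT on the QVD should collapse it back to the real intervals before the loop, something like:

Intervals:
LOAD DISTINCT
    StartDate,
    EndDate,
    Rate
FROM Intervals (qvd);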

I forgot to perform some quality checks. My bad... what a big oversight on my part! These basic quality checks should go on a reference sheet to always keep at hand 😝

Thank you for your explanation and tips, they were very helpful 🙂

Regards,

Lisa