Skip to main content
Announcements
Join us at Qlik Connect for 3 magical days of learning, networking,and inspiration! REGISTER TODAY and save!
cancel
Showing results for 
Search instead for 
Did you mean: 
kishorj1982
Creator II
Creator II

Execution is taking long time for where condition

Hi All,

PFA.

 

I have attached the screenshot of the log file.

In the script file we have a condition as per below,

==>
WHERE Num ([Original Planned Delivery Date]) < Num (Today ()) AND (Isnull ([Comparison Date]) OR ([Delivered Quantity Punctuality] - [Original Confirmed Quantity]) < 0);

 

We have a huge volume of data and it seems that because of the above condition takes too much long time to reload.

Could you please advise how do I reduce the calculation time for where condition.

Thanks

13 Replies
Gysbert_Wassenaar

It's not necessarily the where clause that's the problem. Unless you specifically tested with and without and seen that it's indeed the where clause that's causing the long processing time of the complete load statement. 


talk is cheap, supply exceeds demand
MindaugasBacius
Partner - Specialist III
Partner - Specialist III

marcus_sommer

I could imagine that rather too less RAM or a heavy workload on your machine are causing this long load-time. Beside this it might be an option to transfer the condition into a flag-field in the resident-table, maybe with:

...
if(([Original Planned Delivery Date] < Today ()) AND (Isnull ([Comparison Date]) OR ([Delivered Quantity Punctuality] - [Original Confirmed Quantity]) < 0), 1, null()) as Flag
...

and then your where-clause could look like:

where exists(Flag);

- Marcus

kishorj1982
Creator II
Creator II
Author

Hi Markus,

 

There are already a number of resident loads.  From the script below is the sample code for this.

Could you please advise how to write the resident load on the top of this.

 

CONCATENATE ([Order Delivery Days])
LOAD *,
Dual (If ([Delivery Difference Days] > 0,
If ([Delivery Difference Days] <= 5, 'Week or Less',
If ([Delivery Difference Days] <= 10, 'Two Weeks or Less',
If ([Delivery Difference Days] <= 20, 'Month or Less',
If ([Delivery Difference Days] <= 40, 'Two Months or Less', 'Over Two Months'))))),
If ([Delivery Difference Days] > 0,
If ([Delivery Difference Days] <= 5, 1,
If ([Delivery Difference Days] <= 10, 2,
If ([Delivery Difference Days] <= 20, 3,
If ([Delivery Difference Days] <= 40, 4, 5)))))) as [Late Status];
LOAD [Sales Document Item Key],
Sum ([Date Type]) as [Delivery Difference Days]
GROUP BY [Sales Document Item Key];
LOAD [Sales Document Item Key],
ApplyMap ('FactoryCalendar_map', [Original Factory Calendar] & '-' & Num ([Original Planned Delivery Date] + Iterno ()), 0) as [Date Type]
WHILE Num ([Original Planned Delivery Date] + Iterno ()) <= Num (Today ());
LOAD [Sales Document Item Key],
[Original Factory Calendar],
[Original Planned Delivery Date]
RESIDENT [Order Delivery]
WHERE Num ([Original Planned Delivery Date]) < Num (Today ()) AND (Isnull ([Comparison Date]) OR ([Delivered Quantity Punctuality] - [Original Confirmed Quantity]) < 0);

 

Thanks

marcus_sommer

Beside a possible overload on your machine there are multiple things which might increase the performance. One was already mentioned by transferring the check of the where-clause into a flag-field of the source-table. The next is the aggregation with group by within the preceeding-chain. I assume it will be rather a disadvantage from a performance point of view and I suggest to test if a split of the chain into 3 loadings would be faster.

Another point is your if-loop which could be optimized if the result of each check would be a dual value, like:

if(condition, dual(string, number), else ...

A small benefit might come by replacing the if-loop with a pick(match()) approach. More should be possible if you could use class() with continuing bucket.

- Marcus

kishorj1982
Creator II
Creator II
Author

Hi Markus, As suggested by you I have splitted my transformed load to number of separate calculations using resident load. It has been observed that below code is the main culprit behind the large reload time. Order_Delivery_Temp2: LOAD [Sales Document Item Key], ApplyMap ('FactoryCalendar_map', [Original Factory Calendar] & '-' & Num ([Original Planned Delivery Date] + Iterno ()), 0) as [Date Type] resident Order_Delivery_Temp1 WHILE Num ([Original Planned Delivery Date] + Iterno ()) <= Num (Today ()); As I am unable to get how to get rid of this. Could you please share your thoughts on this. Thanks.
marcus_sommer

While-loops are usually quite fast whereby if the number of iterations is rather big it will increase the number of records appropriate. I don't know neither your data nor your requirements but I could imagine that there are also other ways to generate the [Late Status] which might be more efficient.

Beside this the following should be (a bit) faster:

LOAD [Sales Document Item Key], ApplyMap ('FactoryCalendar_map', [Original Factory Calendar] & '-' & N, 0) as [Date Type];
LOAD [Sales Document Item Key], [Original Factory Calendar], [Original Planned Delivery Date] + Iterno() -1 as N
WHILE [Original Planned Delivery Date] + Iterno() -1 <= Today();

Further I'm not sure that you need the mapping and/or it maybe significantely simplyfied so that no loop is needed. Maybe something like this is also enough: Today() - [Original Factory Calendar]

Furthermore you may consider to apply an incremental approach.

- Marcus

kishorj1982
Creator II
Creator II
Author

Hi Markus,

The purpose of the script is to calculate the working days' difference between the planned delivery date and the current date based on the factory calendar. Simple subtraction will not work in this case as it will include all days including holidays. Using networkdays() function does not work this case either because while we can exclude Mon-Friday holidays using it, it will also always exclude Saturdays and Sundays from the calculation, which is not what we want as some Saturdays or Sundays might be working days for particular factory calendar. The mapping is needed for different KC factory calendars, which might have different holidays and scheduled working weekends. Basically, we need to have a solution that will consider different factory calendars for each row and will calculate the date difference based on that factory calendar. I would welcome a better solution as the while + mapping loop is quite heavy, but it should be able to do the same functionality as the existing logic.

 

thanks

marcus_sommer

I think I would try a different approach and do all this work within the calendars and then just picking the wanted value.

This means to create a continuous workday-counter within the calendar. Each workday - defined by the weekdays and which aren't defined holidays - get a 1 and the others 0 (beside applying them within a listbox or as a set analysis condition is such flag-field quite useful to sum the values over periods).

The next step would be to cummulate them with interrecord-functions like peek or previous. With it you could pick there the cummulated value from today and then subtracting from it the cummulated value from your delivery date.

But I think you could go even a step further and adding in the calendar another field which calculates already there the differenz between each date and today - and to this result could be already the [Late Status] bucket mapped.

After all you don't need the whole approach of the while-looping, aggregating and clustering else just picking the right value with a mapping.
If there are multiple different factories and appropriate calendars you may need to repeat it a few times and adding them together but it shouldn't be too difficult and mean not more efforts as you already have with your calendar-creation.

- Marcus