Here is the deal.
This is the statement I need to tune:
TempDates:
LOAD
'D' as AggrLevel,
*,
if(dmDateHour = MonthStart(dmDateHour), ES) as MonthES, // calc ES for Monthly aggregation
if(dmDateHour = WeekStart(dmDateHour), ES) as WeekES, // calc ES for Weekly aggregation
MonthStart(dmDateHour) as MonthStartDay,
WeekStart(dmDateHour) as WeekStartDay
;
LOAD
DPId,
TZId,
StationId,
DayDateHour as dmDateHour,
AggrCountry,
sum(TLH) as TLH,
sum(SS) as SS,
sum(if(DPESDateHour = dmDateHour, ES)) as ES // calc ES for Daily aggregation
Group by
DPId,
TZId,
StationId,
DayDateHour,
AggrCountry
;
LOAD
DPId,
TZId,
StationId,
dmDateHour,
AggrCountry,
DayDateHour,
sum(TLH) as TLH,
sum(SS) as SS,
sum(ES) as ES,
Min(dmDateHour) as DPESDateHour // first DateHour for a day part
Group by
DPId,
TZId,
StationId,
dmDateHour,
AggrCountry,
DayDateHour
;
LOAD
DPId,
TZId,
StationId,
dmDateHour,
AggrCountry,
DayDateHour,
TLH,
SS,
ES
Resident Temp
Where DayDateHour >= $(vDateStart) // daily loop condition
and DayDateHour < $(vDateEnd);
Temp table:
The table has aggregated information about sessions on different stations for predefined day parts.
Field | Description
DPId | 30 possible day parts (e.g. “Whole Week/Whole Day”, “Monday-Friday/10-13”, “Tuesday-Sunday/9-20”, etc.)
TZId | 5 possible time zones (e.g. GMT, EST, etc.)
StationId | About 16,500 station IDs
dmDateHour | A timestamp rounded down to the hour (e.g. 1/1/2011 10:00:00), or in QV syntax = floor(timestamp * 24)/24
AggrCountry | 2 possible countries: Main and Others
DayDateHour | The beginning of the day, derived from dmDateHour (= floor(dmDateHour))
TLH | Total session hours during dmDateHour (> 0)
SS | Number of Started Sessions during dmDateHour
ES | Number of sessions Entered from the previous hour into the current dmDateHour
The table holds 1 month of data. As you can see, it may have up to 30 * 5 * 16,500 * (30 * 24) * 2
= 3,564,000,000 records. Usually it is not that huge, because not all stations have
activity during a month; in practice it is about 30-40 million records.
The main problem is related to ES. I need to derive the number of Active Sessions (AS), which
is Sum(SS) + Sum(ES of the first dmDateHour in a day part), and I need to aggregate it to the day level.
That's why I have these multi-level aggregations.
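For illustration only (made-up numbers): if a day part covers 10:00-13:00 on some day, ES at 10:00 is 5, and SS for 10:00, 11:00 and 12:00 is 3, 2 and 4, then AS for that day part on that day = 5 + (3 + 2 + 4) = 14.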
I've added this condition at the beginning
Where DayDateHour >= $(vDateStart)
and DayDateHour < $(vDateEnd);
to reduce the number of records for the aggregation. I have a loop where I derive vDateStart
and vDateEnd for each day of the month. If I remove this condition, it exhausts the whole memory
on the server (98 GB).
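Roughly, the loop looks like this (a simplified sketch; everything except vDateStart and vDateEnd is just a placeholder):
// Simplified sketch of the daily loop (vMonthStart and vNumDays are placeholders)
LET vMonthStart = Num(MonthStart(Today()));
LET vNumDays = Day(MonthEnd(Today()));

FOR vDay = 0 to $(vNumDays) - 1
    LET vDateStart = $(vMonthStart) + $(vDay);
    LET vDateEnd = $(vDateStart) + 1;
    // ... the TempDates aggregation above runs here for one day ...
NEXT vDay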
So, the daily aggregation takes 45 minutes per day (about 22 hours for the whole month).
If I have just one day of data in the Temp table, the same aggregation for a day takes about
3 minutes. But the most interesting thing is that all the remaining days of the month in this
daily loop take 90 seconds each to retrieve NO DATA from the Temp table. That means a day's
aggregation takes about 90 seconds, and another 90 seconds go to retrieving the data for
the condition.
My assumption is: the more data I have in the table, the longer this data retrieval takes.
(It's a pity that we don't have indexes in QV.)
Is there another way to make this retrieval faster? Or maybe there is a better way
to write my statement.
Please help.
P.S. The Word version is attached.
Nick, it's really hard to give a solution without being able to test things with the real volume and really understand why you're doing each thing.
I think it would be best to work with a consultant on site.
A few ideas that might help:
1) Instead of doing something like this
if(dmDateHour = MonthStart(dmDateHour), ES) as MonthES, // calc ES for Monthly aggregation
if(dmDateHour = WeekStart(dmDateHour), ES) as WeekES, // calc ES for Weekly aggregation
MonthStart(dmDateHour) as MonthStartDay,
WeekStart(dmDateHour) as WeekStartDay
Perhaps you can have a table, built purely in the script using AUTOGENERATE, that only has records for the MonthStart and WeekStart dates (see the sketch after the join below).
So you can do something like
LEFT JOIN(TempDates)
LOAD
dmDateHour,
MonthStart(dmDateHour) as MonthStartDay,
WeekStart(dmDateHour) as WeekStartDay,
1 as FLG_MonthES,
1 as FLG_WeekES
RESIDENT CALENDAR;
And then use the FLGs in the calculations instead of creating new fields, something like SUM(ES*FLG_MonthES).
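For the CALENDAR part, a rough sketch of how such a table could be autogenerated (vPeriodStart and vPeriodEnd are placeholder variables, not from your script):
// Keep only the dates that are month starts or week starts
CALENDAR:
LOAD dmDateHour
Where dmDateHour = MonthStart(dmDateHour)
   or dmDateHour = WeekStart(dmDateHour);
LOAD
    $(vPeriodStart) + IterNo() - 1 as dmDateHour   // one midnight per generated day
AutoGenerate 1
While $(vPeriodStart) + IterNo() - 1 < $(vPeriodEnd);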
2) Instead of doing this
Where DayDateHour >= $(vDateStart) // daily loop condition
and DayDateHour < $(vDateEnd);
You can again create a table with the dates you want to obtain, i.e. all dates in the week or all dates in the month, and do something like this:
LOAD
Date AS DayDateHour
RESIDENT CALENDAR
Where Date >=$(vDateStart)
and Date < $(vDateEnd);
LOAD
DPId,
TZId,
StationId,
dmDateHour,
AggrCountry,
DayDateHour,
TLH,
SS,
ES
FROM File (qvd)
WHERE Exists(DayDateHour);
This will cause the load to be optimized while still filtering the records to the desired dates.
Hi Daniel,
Thank you for your help.
1) That can work if the derived data is based on just one dimension, but in my case I have 6 of them, and the flags can be different for the same dmDateHour.
2) I've tried Exists:
LOAD
DPId,
TZId,
StationId,
dmDateHour,
AggrCountry,
DayDateHour,
TLH,
SS,
ES
Resident Temp
Where Exists(DayDateHour, $(vDateStart));
The result is just as slow.
Exists will be faster than Where when loading from a QVD file and using the single-parameter form, Exists(DayDateHour).
That's good to know, but neither case applies here. It doesn't make sense to store temporary data in a QVD, and the single-parameter Exists is not possible because all possible values of the field already exist.
Thank you
Those were just examples of things you can try to improve performance; you will be better off working with an experienced QlikView consultant on site.
You should comment out the top portion, load just the first load statement, and record the time. Then uncomment each preceding load statement one at a time and perform a reload to capture the time. Use the time differences to figure out where you really start to lose performance. I would also monitor CPU and RAM and write down some notes at each step.
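For the timing, something like this around each step can help (variable names are just placeholders):
// Capture the start time of a step as a plain number
LET vStepStart = Num(Now());

// ... one of the LOAD statements from the script above goes here ...

// Write the elapsed time for this step to the script execution log
LET vStepElapsed = Interval(Now() - $(vStepStart), 'hh:mm:ss');
TRACE This step took $(vStepElapsed);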
Also, as stated above, it's hard to really know the impact of a suggestion without using your hardware and data, but I would try the following:
1) Separate Day, Date and Hour into different fields; this gives each field fewer distinct values and can decrease your memory footprint (see the sketch after this list). Probably only a slight difference, but it may add up for you.
2) All of your date functions return dual values; try putting a Num() around them so only the numeric value is passed to the next load. I'm not sure whether the overhead of the Num() will outweigh the removal of the dual value; only testing will tell.
3) You may want to try using numbers as your IDs rather than codes; 1, 2, ... may perform better than EST, GMT, ... Again, I would test.
4) Try saving the Temp resident table to a QVD, dropping it, and then loading it back into memory with the Where clause (see the sketch after this list). It may sound crazy, but it causes QlikView to compress the data and frees up memory. Loading a large resident table into another resident table is often a bad idea, because until you drop that Temp table (which won't be needed in the end) you're using twice the memory.
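A rough sketch of what 1), 2) and 4) could look like together (the file and field names here are placeholders, not from your actual script):
// 4): store the resident table to a QVD and drop it, so the data is
//     compressed on disk and the memory is freed before the per-day loads
STORE Temp INTO Temp.qvd (qvd);
DROP TABLE Temp;

// 1) and 2): on each reload, keep only the needed day and split the
//     timestamp into purely numeric fields (placeholder field names)
TempDay:
LOAD
    *,
    Num(Floor(dmDateHour)) as DayDateNum,   // date part only, as a number
    Num(Hour(dmDateHour)) as HourNum        // hour 0-23, as a number
FROM Temp.qvd (qvd)
Where DayDateHour >= $(vDateStart)
    and DayDateHour < $(vDateEnd);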
I don't promise any of these suggestions will work, but I would try each one and record the results. Please share the results; I would be interested to know whether each suggestion has a negative or positive impact. There are also other things to consider, like your hardware and software specs. I have seen machines where the QlikView load process slows down if the CPU affinity is set too high (this only started occurring after multi-threaded loads were introduced).
Hi,
is there a document or a blog around with general performance-tuning tips and tricks, i.e. everything one can do to make different parts of a QlikView document run faster?
Thanks a lot!
Best regards,
DataNibbler
P.S.: I've bookmarked one thread where someone put together a collection of such tips. Still, it seems like a good idea to put all of that into a document...