Qlik Community

QlikView Scripting

Discussion Board for collaboration on QlikView Scripting.

Announcements
QlikView Fans! We’d love to hear from you.
Share your QlikView feedback with the product team… Click here to participate in our 5-minute survey.
Rules, plus terms and conditions, can be found here.
Not applicable

Advanced performance tuning

Here is thedeal.

This is the statement which I need to tune:

TempDates:
    
LOAD
           'D'
as AggrLevel,
           *,
          
if(dmDateHour = MonthStart(dmDateHour), ES) as MonthES, \\ calc ES for Monthlyaggregation

          
if(dmDateHour = WeekStart(dmDateHour), ES) as WeekES, \\ calc ES for Weeklyaggregation

          
MonthStart(dmDateHour) as MonthStartDay,
          
WeekStart(dmDateHour) as WeekStartDay
     ;
    
LOAD
          
DPId,
          
TZId,
          
StationId,
          
DayDateHour as dmDateHour,
          
AggrCountry,
          
sum(TLH) as TLH,
          
sum(SS) as SS,
          
sum(if(DPESDateHour = dmDateHour, ES)) as ES \\ calc ESfor Daily aggregation
    
Group by
          
DPId,
          
TZId,
          
StationId,
          
DayDateHour,
          
AggrCountry
     ;
    
LOAD
          
DPId,
          
TZId,
          
StationId,
          
dmDateHour,
          
AggrCountry,
          
DayDateHour,
          
sum(TLH) as TLH,
          
sum(SS) as SS,
          
sum(ES) as ES,
          
Min(dmDateHour) as DPESDateHour \\first DateHour for a day part
    
Group by
          
DPId,
          
TZId,
          
StationId,
          
dmDateHour,
          
AggrCountry,
          
DayDateHour
     ;
    
LOAD
          
DPId,
          
TZId,
          
StationId,
          
dmDateHour,
          
AggrCountry,
          
DayDateHour,
          
TLH,
          
SS,
          
ES
    
Resident Temp
    
Where DayDateHour >= $(vDateStart) \\dailyloop condition
     
and DayDateHour < $(vDateEnd);

Temp table:

The table has aggregated information about sessions on different stations for predefined day parts.

 

Field

 
 

Description

 
 

DPId

 
 

30 possible day  parts (e.g. “Whole Week/Whole Day”,

“Monday-Friday/10-13”, “Tuesday-Sunday / 9  - 20”

and etc.)

 
 

TZId

 
 

5 possible time  zones (e.g. GMT, EST and etc.)

 
 

StationId

 
 

About 16 500  Stations’ Ids

 
 

dmDateHour

 
 

A timestamp rounded  to each hour sharp (e.g.

1/1/2011 10:00:00) or in QV syntax

= floor(timestamp  * 24)/24

 
 

AggrCountry

 
 

2 possible countries  – Main and Others

 
 

DayDateHour

 
 

A beginning of a day  calculated derived

from dmDateHour (= floor(dmDateHour))

 
 

TLH

 
 

Total hours for a  sessions during

dmDateHour (> 0)

 
 

SS

 
 

# of Started Sessions  during dmDateHour

 
 

ES

 
 

# of Entered  Sessions from previous hour

into current dmDateHour

 

The table has 1 month data. As you can see it may have 30 * 5 * 16500 * (30 * 24) * 2

= 3564 000 000 records. Usually it’s not that huge because not all stations have

an activity during a month. It’s about 30-40 millions of records.

Main problem is related with ES. I need to derive number of Active Sessions (AS)which

is Sum(SS) + Sum(ES of first dmDateHour in a day part). And I need to aggregate it to Day.

That’s why I have these multi level aggregations.

I’ve made this condition in a begging

Where DayDateHour>= $(vDateStart)
     
andDayDateHour < $(vDateEnd);

to reduce a number of records for the aggregation. I have a loop where I derive vDateStart

and vDateEnd for each day of a month. If I remove this condition it will kill whole memory

on a server (98GB).

So, daily aggregation takes 45 mins per each day (about 22 hours for whole month).

If I have just one day data in the Temp table it takes about 3 mins to do sameaggregation

for a day. But most interesting thing is all the rest days of a monthin this daily loop

will calculates during 90 seconds to retrieve NO DATA fromthe Temp table. It means that

a day aggregation takes about 90 second and other90 seconds for data retrieval for

the condition.

My assumption is – the more data I have in the table the longer this data retrievalis.

(It’s too pity that we don’t have indexes in QV).

Is there other way to make this retrieval faster? Or maybe there is another way

for my statement.

Please help.

P.S. The Word version is in attachment.

Tags (1)
7 Replies
danielrozental
Honored Contributor II

Re: Advanced performance tuning

Nick, is really hard to give a solution without being able to test things with the real volume and really understand why you're doing each thing.

I think it will be best to work with a consultant on site.

A few ideas that might help

1) Instead of doing something like this

                   

if(dmDateHour = MonthStart(dmDateHour), ES) as MonthES, \\calc ES for Monthly aggregation

if(dmDateHour = WeekStart(dmDateHour), ES) as WeekES, \\calc ES for Weekly aggregation

MonthStart(dmDateHour) as MonthStartDay,

WeekStart(dmDateHour) as WeekStartDay

Perhaps you can have a table, built just from script using autogenerate, that only has records por monthstart and weekstart dates

So you can do something like

LEFT JOIN(TempDates)

LOAD

          dmDateHour,

          MonthStart(dmDateHour) as MonthStartDay,

          WeekStart(dmDateHour) as WeekStartDay,

          1 as FLG_MonthES,

          1 as FLG_WeekES

RESIDENT CALENDAR;

And then use the FLGs in the calculations instead of creating new fields, something like SUM(ES*FLG_MonthES).

2) Instead of doing this

       Where DayDateHour >=$(vDateStart) \\daily loop condition
       
and DayDateHour < $(vDateEnd);

You can again, create a table with the dates you want to obtain, i.e all dates in the week, or all dates in the month, and do something like this

LOAD

          Date AS DayDateHour

RESIDENT CALENDAR

Where Date >=$(vDateStart)

   and Date < $(vDateEnd);

LOAD

    DPId,

    TZId,

    StationId,

    dmDateHour,

    AggrCountry,

    DayDateHour,

    TLH,

    SS,

    ES

FROM File (qvd)

WHERE Exists(DayDateHour);

This will cause the load to be optimized and will also filter records with the desired dates.

Not applicable

Re: Advanced performance tuning

Hi Daniel,

Thank you for your help.

1) It can work if the deriving data is based on just one dimension. But in my case I have 6 of them. And the flags can be different for the same dmDateHour.

2) I've tried Exists:

LOAD

        DPId,

        TZId,

        StationId,

        dmDateHour,

        AggrCountry,

        DayDateHour,

        TLH,

        SS,

        ES

    Resident Temp

    Where Exists(DayDateHour, $(vDateStart));

The result is the same slow.

danielrozental
Honored Contributor II

Re: Advanced performance tuning

Exists will be faster than where when done loading a QVD file and with a single parameter Exists(DayDateHour).

Not applicable

Re: Advanced performance tuning

It's good to know. But I don't have both of these cases. It doesn't make sense to store temporary data in QVD. And single parameter Exists is not possible because all possible values for the field already exist.

Thank you

danielrozental
Honored Contributor II

Re: Advanced performance tuning

They were just examples of things you can try to improve performance, you will be better of working with an experience qlikview consultant on site.

Not applicable

Re: Advanced performance tuning

You should comment out the top portion and load just the first load statement and record the time.  Then uncomment each preceeding load statement one at a time, and perform a reload to capture the time.  Use the time differences to figure out where you really start to loose performance.  I would also monitor PC and RAM and write down some notes each step of the way.

Also, as stated above its hard to really know the impact of a suggestion without using your hardware and data, but I would try the following:

1) Seperate Day, Date and Hour into different fields this will make the possible values less distinct and possibly decrease your memory footprint.  Probably only a slight difference but it may add up for you. 

2) All of your date functions result in Dual value outputs, try putting a Num() around them so only the numeric value is passed to the next load.  Not sure if the overhead of the Num() will outweight the removal of the Daul value.  Only testing will tell.

3) You may want to try using numbers as your IDs not codes.  1, 2, ... may perform better than EST, GMT, ... again I would test.

4) Try saving the Temp resident table to a QVD and dropping it, and then loading it back into memory with the where statement, this may sound crazy but this will cause QlikView to compress the data and free up the memory.  Loading a large resident into another resident is ofter a bad idea because until you drop that Temp table (that won't be needed in the end) your using twice the memory.

I don't promise any of these suggestions will work, but I would try each one and record the results.  Please share the results I would be interested to know whether each suggestion has a negative or positive impact.  There are also other things to consider like what are your hardware and software specs.  I have seen machines where the QlikView load process slows down if the CPU affinity is too high (this only started occuring after the multi-threaded loads were introduced).

datanibbler
Esteemed Contributor

Re: Advanced performance tuning

Hi,

is there a document or a Blog around for general performance tuning tips and tricks, everything one can do to make different parts of a QlikView document run faster?

Thanks a lot!

Best regards,

DataNibbler

P.S.: I've bookmarked one thread where someone put a collection of some things. Still it seems like a good idea to put all that into a document...

Community Browser