Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
Hi All,
I am working with a big table (200M rows), and having a response time issue
I'm using the table in two phases:
phase 1: loading up the table, renaming, and filtering by date (this takes about 5 minutes)
FactClicks_temp:
LOAD JobId as JobId,
date(daystart(DATE)) as Date,
CampaignId as CampaignID,
PageType
FROM [1_raw\FactClicks.qvd] (qvd)
WHERE DATE>='2016-01-01';
STORE FactClicks_temp into [2_dw\FactClicks_temp.qvd] (qvd);
phase 2: group by, and using Sum If to create new columns (this takes about 20 minutes)
FactClicks:
LOAD JobId,
Date,
CampaignID,
Cost,
ClickSourceDesc,
IsMobile,
SUM(IF(PageType...) )
FROM [2_dw\FactClicks_temp.qvd] (qvd)
GROUP BY JobId, Date, CampaignID
Why phase 2 takes so much time? is it becuase i'm working on a non optimized load?
can I do something?
Thanks,
Nir
its because of using GROUP BY statement in the second phase ...as the number of columns increases in the group by statement it will kill the time ....try to push the grouping to SQL and then import the data to Qlikview, this will help you to improve the performance
Neither of these loads will be optimised. The second one takes longer because of the group by.
its because of using GROUP BY statement in the second phase ...as the number of columns increases in the group by statement it will kill the time ....try to push the grouping to SQL and then import the data to Qlikview, this will help you to improve the performance
I have learnt it a hard way, but aggregating the data by using Group By statement slows down the reload speed.
Hi,
You are doing calculations Sum() and Group By(), so it will take time based on the data volume. Instead of this why can't you directly do the Sum() in front end (Charts).
Regards,
Jagan.
thank you,
Isnt' it better to the id on the SQL load?
With 200M rows I would look at implementing an incremental load so you are only processing modified rows and do not need to group and sum on all 200M rows every time you reload the data.
Whether it is better to process data on your SQL server or Qlik Server depends on which server has the available capacity for this processing. Both can be equally valid options.
Hi,
load raw qvd first
then do calculations by taking resident load
and do phase 1 save qvd
next take
do the sum()
group by using resident load of the raw table.
this might help you to fasten the reload time.
Regards
$@m.
This is a great & helpful suggestion!
I still dont understand why the resident load in Qlikview is slow,
why is the MSSQL server is so much better in this kind og aggregation?