Hi,
I have a table in my DB with approximately 220 million records, and it's growing every day.
In my load script I decided to store the records in one QVD per month, but reading the data back from the QVDs is taking more time than I expected (around 15 min), and it's consuming a lot of resources on my server.
Would it be better if I had one QVD per year? Is it faster for Qlik to read a single QVD than several of them?
Thanks,
Juan
Once you have more than a million records or so, it is always important to look at how you can help Qlik get the most out of its built-in compression mechanisms.
"link_timestamp",
"link_date",
"link_time",
"link_time_15min",
"link_minute",
"link_hour",
[link.id],
speed,
"speed_mph",
TravelTimeIndex,
traveltime
Splitting a high-cardinality field like a timestamp into its constituent parts (date, hour, minute, second, tenths of seconds) will help tremendously - but don't keep the original timestamp. This enables much higher compression under the hood and can save you a lot of memory.
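For example, a minimal sketch with your field names (the QVD file names are assumptions); the key point is that the raw timestamp is not stored:

temp_traffic:
LOAD Date(Floor("link_timestamp")) as "link_date",   // ~365 distinct values per year
     Hour("link_timestamp") as "link_hour",          // at most 24 distinct values
     Minute("link_timestamp") as "link_minute",      // at most 60 distinct values
     // "link_timestamp" itself is deliberately left out:
     // millions of distinct timestamps compress very poorly
     [link.id],
     speed,
     "speed_mph",
     traveltime
FROM [traffic_raw.qvd] (qvd);

STORE temp_traffic INTO [traffic_2018-01.qvd] (qvd);
DROP Table temp_traffic;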
Hello Juan,
In general it doesn't matter for performance which way you go (monthly or yearly QVDs).
Both ways need the same RAM and CPU if you load all the data.
I think monthly QVDs are a good way to go. That way you can shrink the data model down if your client only needs the last 6 months.
What I think you need to check here is whether your load is an optimized load.
If it isn't, you can save a lot of time:
Turning Unoptimized Loads into Optimized Loads
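For illustration, a rough sketch of the difference (the QVD name is an assumption):

// Optimized: a plain field list from a QVD - renaming with "as" is fine,
// but no formulas and no where-clause
[traffic]:
LOAD "link_timestamp", "link_date", "link_hour", speed
FROM [traffic_2018-01.qvd] (qvd);

// NOT optimized: the calculation and the where-clause force row-by-row processing
[traffic_slow]:
LOAD "link_timestamp", speed * 1.609344 as speed_kmh
FROM [traffic_2018-01.qvd] (qvd)
WHERE speed > 0;

// The one exception: a single where exists() on one field keeps the load optimized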
regards
tim
For the monthly QVDs I'm using a temp table where I put the records for the month; then I store it into a QVD, drop the temp table, and load (concatenate) the QVD into my main table.
It looks like a lot of steps. Do you think this could be done in a simpler way? I'm using the temp table to avoid storing records from previous months in the monthly QVDs.
1: Do you use incremental load?
2: Are any aggregation functions with GROUP BY used in the script?
Check this out, it may help:
https://community.qlik.com/docs/DOC-19450
It sounds like your temp-table approach isn't necessary and could be replaced by loading the data with a where exists() clause to keep the loads optimized. Take a closer look at the link Tim provided.
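A minimal sketch of that pattern for one month's slice (the DatesWanted table is an assumption); a single exists() on one field is the only where-clause that keeps a QVD load optimized:

// Build the set of dates you want to keep
[DatesWanted]:
LOAD Date(MonthStart(Today()) + RecNo() - 1) as "link_date"
AutoGenerate Day(MonthEnd(Today()));

// This QVD load stays optimized because the only condition is a single Exists()
[traffic]:
LOAD *
FROM $(_vFileName) (qvd)
WHERE Exists("link_date");

DROP Table [DatesWanted];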
- Marcus
Hello all,
Thanks a lot for your replies. I've had a look at the DOC but I don't see how it applies to my case. I don't want to overwhelm you with code, but I'd like to share the piece of it where I'm doing the following:
1. If the QVD for the month exists, I concatenate its data in the main table (traffic)
2. If the QVD does not exist yet, I put the data in a temporary table (temp_traffic)
2.1 STORE temp_traffic into a new QVD
2.2 DROP table temp_traffic
2.3 Concatenate this new QVD to the main table (traffic)
It takes a lot of time, which is why it seems to me that I must be doing something wrong.
Any idea is welcome! Thanks!
--
The code:
// 1. If the monthly QVD exists, concatenate its data to the main table
// (_vFileName is assumed to hold the quoted path to the monthly QVD)
IF not IsNull(QvdCreateTime($(_vFileName))) THEN
[traffic]:
CONCATENATE (traffic) LOAD "link_timestamp",
"link_date",
"link_time",
"link_time_15min",
"link_minute",
"link_hour",
[link.id],
speed,
"speed_mph",
TravelTimeIndex,
traveltime
FROM $(_vFileName) (qvd);
ELSE
// 2. If not, load the month into a fresh temp table
// (plain LOAD, not CONCATENATE - temp_traffic must not be appended to the previous table)
[temp_traffic]:
LOAD "start_date" as "link_timestamp",
Date(Floor("start_date")) AS "link_date",
Time("start_date",'hh:mm') AS "link_time",
Time(Round(Frac(Time("start_date",'hh:mm')),1/96),'hh:mm') as "link_time_15min",
Minute("start_date") as "link_minute",
Hour("start_date") as "link_hour",
"item_id" as [link.id],
speed,
"speed_mph",
Round(freeflow_speed / speed_mph, 0.1) as TravelTimeIndex,
traveltime;
SQL SELECT "start_date",
"item_id",
speed,
"speed_mph",
traveltime,
link.freeflow_speed
FROM pems."public"."measure_link_15min" tc
LEFT JOIN link on tc.item_id = link.link_id
WHERE (start_date > '$(vStartDate)' AND start_date <= '$(vEndDate)') AND item_id=11001069 ;
// 2.1 Store the temp table into the new monthly QVD
STORE temp_traffic into $(_vFileName) (qvd);
// 2.2 Drop the temp table
DROP Table temp_traffic;
// 2.3 Concatenate the new QVD to the main table
[traffic]:
CONCATENATE (traffic) LOAD "link_timestamp",
"link_date",
"link_time",
"link_time_15min",
"link_minute",
"link_hour",
[link.id],
speed,
"speed_mph",
TravelTimeIndex,
traveltime
FROM $(_vFileName) (qvd);
ENDIF
With this approach you will need a temp table to be able to store the monthly data slice. It's not easy to say exactly what could be changed or improved, but I think I would split the task into at least two layers: a first layer that reads from the database and creates the monthly QVD slices, and a second layer that reads those QVDs and performs the further transformations on them.
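A rough sketch of that layering (the QVD names and the vMonth variable are assumptions):

// Layer 1 (extract script): one raw pull per missing month, stored untouched
temp_traffic:
SQL SELECT "start_date", "item_id", speed, "speed_mph", traveltime
FROM pems."public"."measure_link_15min"
WHERE start_date > '$(vStartDate)' AND start_date <= '$(vEndDate)';

STORE temp_traffic INTO [extract_$(vMonth).qvd] (qvd);
DROP Table temp_traffic;

// Layer 2 (transform script): read all raw QVDs and derive the fields there
[traffic]:
LOAD "start_date" as "link_timestamp",
     Date(Floor("start_date")) as "link_date",
     Hour("start_date") as "link_hour",
     "item_id" as [link.id],
     speed,
     "speed_mph",
     traveltime
FROM [extract_*.qvd] (qvd);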
- Marcus
It looks like you load all your data from your data source and then resident-load it per month. A resident load is much faster than a SQL SELECT.
So I would load the data per month from the data source and store it into a QVD.
Also, is it really necessary to load months that lie 3 or 4 periods back again? If they can no longer change (invoices are never changed, only new ones are created), you only have to load the last 2-3 months each night and store them into QVDs.
Has to be tested.
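A sketch of that nightly pattern (the loop bounds, variable names and file naming are assumptions):

// Re-query only the current and the previous month; older monthly QVDs stay untouched
FOR i = 0 to 1
    LET vMonthStart = Date(AddMonths(MonthStart(Today()), -$(i)), 'YYYY-MM-DD');
    LET vMonthEnd   = Date(AddMonths(MonthStart(Today()), 1 - $(i)), 'YYYY-MM-DD');

    temp_traffic:
    SQL SELECT "start_date", "item_id", speed, "speed_mph", traveltime
    FROM pems."public"."measure_link_15min"
    WHERE start_date >= '$(vMonthStart)' AND start_date < '$(vMonthEnd)';

    STORE temp_traffic INTO [traffic_$(vMonthStart).qvd] (qvd);
    DROP Table temp_traffic;
NEXT i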
Regards
Hi Martin,
When you speak of not loading previous months again, do you mean not doing the SQL query?
I already check whether the monthly QVD exists before doing the SQL query (I skip the query if there is a QVD). But still, reading the QVDs takes some time.
Thanks
Yes, I meant that (for example, today) I would only query Jan 2018 and Dec 2017, not the older months (but they are still available for the data model through their QVDs).