paolo_cordini
Partner - Creator

QVD load slower than DB Query

Hello

I am facing the following situation:

CASE 1:
Load
.....
;
SQL SELECT
....
FROM table;

Basically, a query from a database table and a preceding load with many calculations, applymaps...

approx 100 fields, 20M rows

CASE 2:

Load
.....
FROM table.qvd;

Here the QVD table has been previously stored and it is identical to the database query output.

The Load statement is identical to CASE 1; the only difference is the "FROM .... QVD" source.

Same number of fields and rows.

Now, what I see is that CASE 2 always takes more time than CASE 1. In other words, reading from the QVD is slower than reading from the database. CASE 1 takes approx 2 hours, CASE 2 takes about 15 minutes more.

I cannot figure out an explanation for this behaviour. The QVD is stored locally, so there are no communication issues.

I tried both splitting into several smaller QVDs and splitting the calculated fields from the unchanged ones, joining them back together afterwards, with no speed improvement.
Unfortunately, I cannot perform the calculation in an incremental script, because applymaps must use the most recent data for all records.

Any ideas or hints?

Thanks, Regards

Paolo

6 Replies
marcus_sommer

For an optimized qvd-load of this size I would expect a run-time of about one minute, maybe two, when loading from local storage on a modern hd-raid or ssd. Therefore something seems quite wrong.

This might be because your storage isn't local, there isn't enough RAM in your machine, or you didn't load the file with the qvd filetype specified - meaning: load * from file.qvd (qvd);
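
For illustration, the difference looks roughly like this (file, map and field names are only placeholders - the two loads are alternatives, not meant to run together):

// Optimized qvd-load: plain field list, no expressions,
// at most a "where exists(SomeField)" on a single field
OptData:
LOAD *
FROM table.qvd (qvd);

// Non-optimized qvd-load: any expression or applymap() forces Qlik
// to unpack every record, so it runs much slower
SlowData:
LOAD *,
     ApplyMap('SomeMap', SomeKey, null()) as SomeMappedField
FROM table.qvd (qvd);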

Besides this, how did you measure the load-time for this particular load - within the logfile or with some other approach?

- Marcus

paolo_cordini
Partner - Creator
Author

Thank you Marcus

In both cases the qvd-load is NOT optimized. I have many calculations and applymaps. I know that in this case the load time is way higher, but I would have expected it to be somewhat lower than a DBMS query + preceding Load; maybe I am wrong.

The qvd is stored locally and I loaded it with FROM file.qvd (qvd).

I measured the load time in the log file. Anyway, the overall elapsed time of my script is always higher when reading from QVD, and the rest of the script is identical.
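
For completeness, a rough way to cross-check the logfile numbers inside the script itself would be something like this (variable names are placeholders, and the $-expansion assumes a point as decimal separator):

LET vStart = Num(Now());    // fractional day number at the start of the load

Data:
LOAD *                      // same preceding load and transformations as in the real script
FROM table.qvd (qvd);

LET vElapsedSec = Round((Num(Now()) - $(vStart)) * 86400);   // days -> seconds
TRACE QVD load took $(vElapsedSec) seconds;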

Thank You Again

Paolo

marcus_sommer

If you apply the same transformations to the sql-load and the qvd-load and the load-times aren't significantly in favour of the qvd, it means that the biggest bottleneck is not the data-transfer but the available CPU and/or RAM resources. If the CPU processes the data slower than the network/storage can deliver it, the source isn't very important anymore.

I suggest checking your transformations again to see whether they could be optimized in some way, especially with regard to splitting the task into separate loads if it contains aggregations, where-clauses, order by or interrecord-functions. Further, consider slicing the data not only into horizontal chunks but also into vertical ones, as sketched below. Of course these measures will cause some additional effort to merge all the data again - but the overall run-time might still decrease.
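
Just to illustrate the vertical slicing (field names like %Key, Field1 and the flag condition are only placeholders - it assumes the data has, or can be given, a unique record key):

// Slice 1: key + only the fields needed for the heavy calculations
Calc:
LOAD %Key,
     if(SomeField = 0 and OtherField = 'M', 1, 0) as FlgExample
FROM table.qvd (qvd);

// Slice 2: key + the unchanged fields - no expressions, so this load can stay optimized
Facts:
LOAD %Key, Field1, Field2 // , ...
FROM table.qvd (qvd);

// Merge the slices again - this join is the extra effort mentioned above
LEFT JOIN (Facts)
LOAD %Key, FlgExample
RESIDENT Calc;

DROP TABLE Calc;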

This is just an assumption about the main cause of the slow qvd-loading - if you could separate those load-parts which force a single-threaded execution from those which run multi-threaded, you may speed up your task.

- Marcus

JustinDallas
Specialist III

I would try doing a raw data load, i.e. no applymaps, no IF(foo = blah) statements, no aggregates. If that load is acceptable, then I would assume that I've got a menacing transformation. You could use a binary search method of adding your transformations back until you find the one(s) that are breaking the time bank.
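
For example (placeholder map/field names), start from just the raw load and re-enable the transformations in halves:

Data:
LOAD *
     // , ApplyMap('Map1', KeyField) as Mapped1    // re-enable these in halves
     // , if(SomeField = 0, 1, 0)    as Flag1      // until the slow one(s) show up
FROM table.qvd (qvd);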

If that's not acceptable, then I would assume that my disk IO was bad, which is another issue.

paolo_cordini
Partner - Creator
Author

The raw data load (optimized qvd load) is much faster, but with transformations the load time grows more or less linearly with the number of transformations, so I don't have a single bad transformation.

Anyway, this would not explain why the SAME transformations are slower when the source is a QVD rather than a database table.

I don't have Group By or Interrecord functions.

This is an example of some of the worst-case transformations:

if(alt(RIgaValorSw,0)=0
AND Cli<> 0
AND
(
(MATCH(MovTipo,'Ven','TrasfPar','VenDaFat') AND NOT(FatCo=0 AND FatKl=0 AND FatKn=0 AND FatPz=0) AND FatImp = 0 and MovArtTipo='M')
OR
((MATCH(MovTipo,'TrasfArr') OR MovMag='CA') AND FatImp = 0 and MovArtTipo='M')
)
AND (FatCo<>0 OR FatKn<>0 OR FatPz<>0)
,1,0
) as FlgMovNonVal,

if(
(applymap('MapArtTipo',Az&'|'&Mag&'|'&Art&'|'&ForArt,MovArtTipo )='M' AND MovMag<>'TP' AND MovMag<>'TA')
AND ((FatCo <> CoArrAss_1+CoArrAss_2+CoArrAss_3+CoArrAss_4+CoArrAss_5 )
OR (Arr_1+Arr_2+Arr_3+Arr_4+Arr_5=0)
),1,0
) as FlgNonAttribuiti,

marcus_sommer

With this kind of transformation the processing must be quite slow. In addition to my suggestion above I recommend rethinking the whole flag-creation approach.

I could imagine that these flags could be created differently by slicing the task horizontally and vertically and also applying incremental approaches (only those parts which really need to run over all the data should do so). As mentioned, this will cause some overhead, but the overall times may be significantly lower.

Some additional flags may also be useful, for example to avoid the if-constructs with the multiple ANDs and ORs and just use something like:

flag1 * flag2 * flag3 as X,
rangesum(flag4, flag5) as Y,
rangemin/max/avg(flag7, flag8) as Z
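
A rough sketch of the pattern, reusing the field names from your snippet (it does not reproduce your exact conditions, just the idea - the simple checks become cheap flags in the inner load, and the final flags are plain arithmetic in the preceding load):

Final:
LOAD *,
     FlgNoValor * FlgCli * FlgFat as FlgMovNonVal_Sketch,
     rangemax(FlgVen, FlgTrasf)   as FlgAnyMovType
;
LOAD *,
     if(alt(RIgaValorSw,0) = 0, 1, 0)                          as FlgNoValor,
     if(Cli <> 0, 1, 0)                                        as FlgCli,
     if(FatCo <> 0 or FatKn <> 0 or FatPz <> 0, 1, 0)          as FlgFat,
     if(match(MovTipo,'Ven','TrasfPar','VenDaFat') > 0, 1, 0)  as FlgVen,
     if(match(MovTipo,'TrasfArr') > 0 or MovMag = 'CA', 1, 0)  as FlgTrasf
FROM table.qvd (qvd);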

I assume there is much potential to speed up the task.

The number of transformations and their complexity is probably also the reason why the qvd-load is slower than the load from the database. The Qlik logic of storing the distinct fieldvalues within symbol tables and using a bit-stuffed pointer into the data table is in most cases very effective, because it reduces the overall data-size, and the "usual" few and simple transformations (if no optimized load is possible) are still significantly faster than loading from any database or Excel. But when it comes to a lot of transformations, the computing needed to pull multiple fields together at record-level to make all the comparisons/calculations can be quite heavy and may take longer than if all values were already stored in a record, like a database would store them.

- Marcus