Solved: Non Optimized QVD Load vs Resident Load - qlikview - Qlik Community

Report Inappropriate Content · ‎2015-10-22

Hello All

I have a Qlikview project where a Table A QVD contains about 1.5 million rows. i have already loaded this QVD into the memory.

I want to load another table Table B which is a subset of Table A using a where clause.

e.g.

Table B:

Load

AID as BID,

.....

resident load Table A

where Purchase Year = Last Purchase Year

In such a case is an Non Optimized QVD Load (because of where clause) better or a Resident Load?

Any help from your experience appreciated

JonasValleskog · ‎2015-10-22

The Optimized QVD load is faster as long as your I/O bandwidth is good. My machine has a solid state drive, so unlikely a fair test machine to compare with most servers out there today, but never the less, here are my results when running the well written script by maxgro with a few minor improvements.

Notes:

It is possible to achieve an OPTIMISED qvd load even with a filter condition, if the condition can be satisfied by a WHERE EXISTS() clause in where the input field has the same name as the field you are checking against. This can be used to your advantage here and without it, the test is not comparing like-for-like when considering the time lapse for an optimised QVD load. See below.

A tweaked script gives this output:

t << AUTOGENERATE(10000000) 10,000,000 lines fetched

***** read resident 22/10/2015 22:08:39

tr << t 1,665,914 lines fetched

00:00:04

Dim2 << INLF7C 1 lines fetched

***** read qvd optimized 22/10/2015 22:08:44

t << t (qvd optimized) 1,665,914 lines fetched

00:00:01

***** read qvd non optimized 22/10/2015 22:08:45

tq << t 1,665,914 lines fetched

00:00:05

Modified script

t:

Load

TransLineID,

TransID,

mod(TransID,26)+1 as Num,

Pick(Ceil(3*Rand1),'A','B','C') as Dim1,

Pick(Ceil(6*Rand1),'a','b','c','d','e','f') as Dim2,

Pick(Ceil(3*Rand()),'X','Y','Z') as Dim3,

Round(1000*Rand()*Rand()*Rand1) as Expression1,

Round( 10*Rand()*Rand()*Rand1) as Expression2,

Round(Rand()*Rand1,0.00001) as Expression3;

Load

Rand() as Rand1,

IterNo() as TransLineID,

RecNo() as TransID

Autogenerate 10000000

;

store t into t.qvd (qvd);

// resident

let t=now(); trace ***** read resident $(t);

tr: load *, 1 Resident t where Dim2 = 'a';

let t=time(now()-t); trace $(t);

drop table tr,t;

Dim2:

LOAD * INLINE [

Dim2

a

];

// optimized

let t=now(); trace ***** read qvd optimized $(t);

t: load * from t.qvd (qvd)

WHERE EXISTS(Dim2);

let t=time(now()-t); trace $(t);

DROP TABLE t;

// qvd non optimized

let t=now(); trace ***** read qvd non optimized $(t);

tq: load *, 1 from t.qvd (qvd) where Dim2 = 'a';

let t=time(now()-t);; trace $(t);

drop table tq;

View solution in original post

Anonymous · ‎2015-10-22

If you can, I would put a flag on that table saying 1 as _LastYearFlag and use that to denote last year's data. That way you don't have to reload and duplicate data.

If you must, doing a resident load would be better performance wise.

maxgro · ‎2015-10-22

In this case I think I'll use a resident load

but

I would not worry about the diffference between resident and not optimized for a 1,5 million file

Report Inappropriate Content · ‎2015-10-22

the actual QVD is about 1.7 gig in size. 200+ columns

Hence my question.

Anonymous · ‎2015-10-22

It will not matter a much. Rather than loading a qvd again take a resident load. Just to understand why you want to reload the data again. You can play with the loaded data for any year..

Imran K

Anonymous · ‎2015-10-22

Depending on the number of columns in the table... if the table has 100 columns, this could be a different story.

Anonymous · ‎2015-10-22

I'd recommend a flag for current year then. If not, then a resident load would be best.

maxgro · ‎2015-10-22

this is the result of a small test, you find the script at the end

optimized, non optimized, resident, the elapsed is 1 or 2 second

result

t << AUTOGENERATE(1000000) 1.999.146 lines fetched

***** read qvd optimized 22/10/2015 20:51:30

t << T (qvd optimized) 1.999.146 lines fetched

22/10/2015 20:51:31

***** read resident 22/10/2015 20:51:31

tr << t 333.454 lines fetched

22/10/2015 20:51:32

***** read qvd non optimized 22/10/2015 20:51:32

tq << T 333.454 lines fetched

22/10/2015 20:51:33

script

t:

Load

TransLineID,

TransID,

mod(TransID,26)+1 as Num,

Pick(Ceil(3*Rand1),'A','B','C') as Dim1,

Pick(Ceil(6*Rand1),'a','b','c','d','e','f') as Dim2,

Pick(Ceil(3*Rand()),'X','Y','Z') as Dim3,

Round(1000*Rand()*Rand()*Rand1) as Expression1,

Round( 10*Rand()*Rand()*Rand1) as Expression2,

Round(Rand()*Rand1,0.00001) as Expression3;

Load

Rand() as Rand1,

IterNo() as TransLineID,

RecNo() as TransID

Autogenerate 1000000

While Rand()<=0.5 or IterNo()=1;

store t into t.qvd (qvd);

drop table t;

// optimized

let t=now(); trace ***** read qvd optimized $(t);

t: load * from t.qvd (qvd);

let t=now(); trace $(t);

// resident

let t=now(); trace ***** read resident $(t);

tr: load *, 1 Resident t where Dim2 = 'a';

let t=now(); trace $(t);

drop table tr, t;

// qvd non optimized

let t=now(); trace ***** read qvd non optimized $(t);

tq: load *, 1 from t.qvd (qvd) where Dim2 = 'a';

let t=now(); trace $(t);

drop table tq;

Report Inappropriate Content · ‎2015-10-22

thanks guys.

i cannot use a flag for this instance.

my understanding is also the same i.e. use resident load.

the confusion comes from not finding any solid reference materials anywhere.

only way of confirming this is for me is to run this on the pre production server and check; which is a pain

JonasValleskog · ‎2015-10-22

The Optimized QVD load is faster as long as your I/O bandwidth is good. My machine has a solid state drive, so unlikely a fair test machine to compare with most servers out there today, but never the less, here are my results when running the well written script by maxgro with a few minor improvements.

Notes:

It is possible to achieve an OPTIMISED qvd load even with a filter condition, if the condition can be satisfied by a WHERE EXISTS() clause in where the input field has the same name as the field you are checking against. This can be used to your advantage here and without it, the test is not comparing like-for-like when considering the time lapse for an optimised QVD load. See below.

A tweaked script gives this output:

t << AUTOGENERATE(10000000) 10,000,000 lines fetched

***** read resident 22/10/2015 22:08:39

tr << t 1,665,914 lines fetched

00:00:04

Dim2 << INLF7C 1 lines fetched

***** read qvd optimized 22/10/2015 22:08:44

t << t (qvd optimized) 1,665,914 lines fetched

00:00:01

***** read qvd non optimized 22/10/2015 22:08:45

tq << t 1,665,914 lines fetched

00:00:05

Modified script

t:

Load

TransLineID,

TransID,

mod(TransID,26)+1 as Num,

Pick(Ceil(3*Rand1),'A','B','C') as Dim1,

Pick(Ceil(6*Rand1),'a','b','c','d','e','f') as Dim2,

Pick(Ceil(3*Rand()),'X','Y','Z') as Dim3,

Round(1000*Rand()*Rand()*Rand1) as Expression1,

Round( 10*Rand()*Rand()*Rand1) as Expression2,

Round(Rand()*Rand1,0.00001) as Expression3;

Load

Rand() as Rand1,

IterNo() as TransLineID,

RecNo() as TransID

Autogenerate 10000000

;

store t into t.qvd (qvd);

// resident

let t=now(); trace ***** read resident $(t);

tr: load *, 1 Resident t where Dim2 = 'a';

let t=time(now()-t); trace $(t);

drop table tr,t;

Dim2:

LOAD * INLINE [

Dim2

a

];

// optimized

let t=now(); trace ***** read qvd optimized $(t);

t: load * from t.qvd (qvd)

WHERE EXISTS(Dim2);

let t=time(now()-t); trace $(t);

DROP TABLE t;

// qvd non optimized

let t=now(); trace ***** read qvd non optimized $(t);

tq: load *, 1 from t.qvd (qvd) where Dim2 = 'a';

let t=time(now()-t);; trace $(t);

drop table tq;