Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
Hello All
I have a Qlikview project where a Table A QVD contains about 1.5 million rows. i have already loaded this QVD into the memory.
I want to load another table Table B which is a subset of Table A using a where clause.
e.g.
Table B:
Load
AID as BID,
.....
resident load Table A
where Purchase Year = Last Purchase Year
In such a case is an Non Optimized QVD Load (because of where clause) better or a Resident Load?
Any help from your experience appreciated
The Optimized QVD load is faster as long as your I/O bandwidth is good. My machine has a solid state drive, so unlikely a fair test machine to compare with most servers out there today, but never the less, here are my results when running the well written script by maxgro with a few minor improvements.
Notes:
It is possible to achieve an OPTIMISED qvd load even with a filter condition, if the condition can be satisfied by a WHERE EXISTS() clause in where the input field has the same name as the field you are checking against. This can be used to your advantage here and without it, the test is not comparing like-for-like when considering the time lapse for an optimised QVD load. See below.
A tweaked script gives this output:
t << AUTOGENERATE(10000000) 10,000,000 lines fetched
***** read resident 22/10/2015 22:08:39
tr << t 1,665,914 lines fetched
00:00:04
Dim2 << INLF7C 1 lines fetched
***** read qvd optimized 22/10/2015 22:08:44
t << t (qvd optimized) 1,665,914 lines fetched
00:00:01
***** read qvd non optimized 22/10/2015 22:08:45
tq << t 1,665,914 lines fetched
00:00:05
Modified script
t:
Load
TransLineID,
TransID,
mod(TransID,26)+1 as Num,
Pick(Ceil(3*Rand1),'A','B','C') as Dim1,
Pick(Ceil(6*Rand1),'a','b','c','d','e','f') as Dim2,
Pick(Ceil(3*Rand()),'X','Y','Z') as Dim3,
Round(1000*Rand()*Rand()*Rand1) as Expression1,
Round( 10*Rand()*Rand()*Rand1) as Expression2,
Round(Rand()*Rand1,0.00001) as Expression3;
Load
Rand() as Rand1,
IterNo() as TransLineID,
RecNo() as TransID
Autogenerate 10000000
;
store t into t.qvd (qvd);
// resident
let t=now(); trace ***** read resident $(t);
tr: load *, 1 Resident t where Dim2 = 'a';
let t=time(now()-t); trace $(t);
drop table tr,t;
Dim2:
LOAD * INLINE [
Dim2
a
];
// optimized
let t=now(); trace ***** read qvd optimized $(t);
t: load * from t.qvd (qvd)
WHERE EXISTS(Dim2);
let t=time(now()-t); trace $(t);
DROP TABLE t;
// qvd non optimized
let t=now(); trace ***** read qvd non optimized $(t);
tq: load *, 1 from t.qvd (qvd) where Dim2 = 'a';
let t=time(now()-t);; trace $(t);
drop table tq;
If you can, I would put a flag on that table saying 1 as _LastYearFlag and use that to denote last year's data. That way you don't have to reload and duplicate data.
If you must, doing a resident load would be better performance wise.
In this case I think I'll use a resident load
but
I would not worry about the diffference between resident and not optimized for a 1,5 million file
the actual QVD is about 1.7 gig in size. 200+ columns
Hence my question.
It will not matter a much. Rather than loading a qvd again take a resident load. Just to understand why you want to reload the data again. You can play with the loaded data for any year..
Imran K
Depending on the number of columns in the table... if the table has 100 columns, this could be a different story.
I'd recommend a flag for current year then. If not, then a resident load would be best.
this is the result of a small test, you find the script at the end
optimized, non optimized, resident, the elapsed is 1 or 2 second
result
t << AUTOGENERATE(1000000) 1.999.146 lines fetched
***** read qvd optimized 22/10/2015 20:51:30
t << T (qvd optimized) 1.999.146 lines fetched
22/10/2015 20:51:31
***** read resident 22/10/2015 20:51:31
tr << t 333.454 lines fetched
22/10/2015 20:51:32
***** read qvd non optimized 22/10/2015 20:51:32
tq << T 333.454 lines fetched
22/10/2015 20:51:33
script
t:
Load
TransLineID,
TransID,
mod(TransID,26)+1 as Num,
Pick(Ceil(3*Rand1),'A','B','C') as Dim1,
Pick(Ceil(6*Rand1),'a','b','c','d','e','f') as Dim2,
Pick(Ceil(3*Rand()),'X','Y','Z') as Dim3,
Round(1000*Rand()*Rand()*Rand1) as Expression1,
Round( 10*Rand()*Rand()*Rand1) as Expression2,
Round(Rand()*Rand1,0.00001) as Expression3;
Load
Rand() as Rand1,
IterNo() as TransLineID,
RecNo() as TransID
Autogenerate 1000000
While Rand()<=0.5 or IterNo()=1;
store t into t.qvd (qvd);
drop table t;
// optimized
let t=now(); trace ***** read qvd optimized $(t);
t: load * from t.qvd (qvd);
let t=now(); trace $(t);
// resident
let t=now(); trace ***** read resident $(t);
tr: load *, 1 Resident t where Dim2 = 'a';
let t=now(); trace $(t);
drop table tr, t;
// qvd non optimized
let t=now(); trace ***** read qvd non optimized $(t);
tq: load *, 1 from t.qvd (qvd) where Dim2 = 'a';
let t=now(); trace $(t);
drop table tq;
thanks guys.
i cannot use a flag for this instance.
my understanding is also the same i.e. use resident load.
the confusion comes from not finding any solid reference materials anywhere.
only way of confirming this is for me is to run this on the pre production server and check; which is a pain
The Optimized QVD load is faster as long as your I/O bandwidth is good. My machine has a solid state drive, so unlikely a fair test machine to compare with most servers out there today, but never the less, here are my results when running the well written script by maxgro with a few minor improvements.
Notes:
It is possible to achieve an OPTIMISED qvd load even with a filter condition, if the condition can be satisfied by a WHERE EXISTS() clause in where the input field has the same name as the field you are checking against. This can be used to your advantage here and without it, the test is not comparing like-for-like when considering the time lapse for an optimised QVD load. See below.
A tweaked script gives this output:
t << AUTOGENERATE(10000000) 10,000,000 lines fetched
***** read resident 22/10/2015 22:08:39
tr << t 1,665,914 lines fetched
00:00:04
Dim2 << INLF7C 1 lines fetched
***** read qvd optimized 22/10/2015 22:08:44
t << t (qvd optimized) 1,665,914 lines fetched
00:00:01
***** read qvd non optimized 22/10/2015 22:08:45
tq << t 1,665,914 lines fetched
00:00:05
Modified script
t:
Load
TransLineID,
TransID,
mod(TransID,26)+1 as Num,
Pick(Ceil(3*Rand1),'A','B','C') as Dim1,
Pick(Ceil(6*Rand1),'a','b','c','d','e','f') as Dim2,
Pick(Ceil(3*Rand()),'X','Y','Z') as Dim3,
Round(1000*Rand()*Rand()*Rand1) as Expression1,
Round( 10*Rand()*Rand()*Rand1) as Expression2,
Round(Rand()*Rand1,0.00001) as Expression3;
Load
Rand() as Rand1,
IterNo() as TransLineID,
RecNo() as TransID
Autogenerate 10000000
;
store t into t.qvd (qvd);
// resident
let t=now(); trace ***** read resident $(t);
tr: load *, 1 Resident t where Dim2 = 'a';
let t=time(now()-t); trace $(t);
drop table tr,t;
Dim2:
LOAD * INLINE [
Dim2
a
];
// optimized
let t=now(); trace ***** read qvd optimized $(t);
t: load * from t.qvd (qvd)
WHERE EXISTS(Dim2);
let t=time(now()-t); trace $(t);
DROP TABLE t;
// qvd non optimized
let t=now(); trace ***** read qvd non optimized $(t);
tq: load *, 1 from t.qvd (qvd) where Dim2 = 'a';
let t=time(now()-t);; trace $(t);
drop table tq;