I wanted to share a couple of tests I ran while trying to get loads to be as fast as possible, and hopefully get some feedback or some other tips.
Sorry for the long post, but I believe it's worth reading through.
Test 1)
When concatenating two tables that don't have the same number of fields, the second load stays optimized if the second table has all the fields of the first one plus some extra fields. If it's done the other way around (the second table has fewer fields than the first), the second load is not optimized.
e.g. All Loads are optimized
Table:
LOAD // Optimized
A,
B,
C
FROM Table1.qvd (qvd);
CONCATENATE(Table) // Optimized
LOAD
A,
B,
C,
D,
E
FROM Table2.qvd (qvd);
e.g. 2nd Load isn't optimized
Table:
LOAD // Optimized
A,
B,
C,
D,
E
FROM Table2.qvd (qvd);
CONCATENATE(Table) // Not optimized
LOAD
A,
B,
C
FROM Table1.qvd (qvd);
Test 2)
The second table has some fields in common with the first but is missing others, and each table has 50 million rows. The 2nd load will not be optimized. However, loading the second table on its own in optimized mode, adding the missing fields, storing it and loading it again optimized is faster than just concatenating the tables straight away.
e.g. 2nd load isn't optimized (in this example it took about 1 min to load; the PC is a Core i5 x64 with 4 GB running Windows 7).
R00:
LOAD ShipperID, // Optimized
OrderDate,
CustomerID,
UnitPrice,
sales,
COS
FROM
R00_1.QVD
(qvd);
Concatenate(R00) // Not Optimized
LOAD ShipperID,
CustomerID,
Discount,
ProductID,
Quantity,
UnitPrice
FROM
R00_2.QVD
(qvd);
If you load the 2nd table without concatenating it, add the missing fields, store it and then load it again to concatenate it, the final read is optimized and the whole thing is faster (in my example it took 50% of the time).
R00:
LOAD ShipperID, // Optimized
OrderDate,
CustomerID,
UnitPrice,
sales,
COS
FROM
R00_1.QVD
(qvd);
R000_Aux:
LOAD ShipperID, // Optimized
CustomerID,
Discount,
ProductID,
Quantity,
UnitPrice
FROM
R00_2.QVD
(qvd);
concatenate(R000_Aux) // Not Optimized. 0 records are added
LOAD null() as ShipperID,
null() as OrderDate,
null() as CustomerID,
null() as UnitPrice,
null() as sales,
null() as COS
autogenerate(0);
store R000_Aux into R000_Aux.QVD;
drop table R000_Aux;
concatenate(R00) //This load will now be optimized!
LOAD ShipperID,
OrderDate,
CustomerID,
UnitPrice,
sales,
Discount,
ProductID,
Quantity,
COS
FROM
R000_Aux.QVD
(qvd);
Hi there,
Just wanted to add here that I have recently blogged about Optimised QVD Loads, giving details of the scenarios in which QVD loads will be optimised and also why it is critical that your loads are Optimised:
- Steve
Hi Daniel,
That is an interesting concept. You do, however, need to take into account that storing a QVD to disk can be very slow, since many servers are on a SAN or use older disk-based hard drives.
Hi Torben, thanks for answering my 4-year-old post.
Loading and storing a QVD might not be the best way to approach this, but you could add dummy fields to some QVDs at creation time just to improve performance on concatenations done later on. I reckon this might not apply in every case and should probably not be taken as a general solution, but we've been using this sort of hack to improve load performance on large volumes over the last few years very successfully. You could also have servers with small SSD drives.
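A minimal sketch of the dummy-field idea, reusing the zero-row concatenate trick from Test 2 inside the script that creates the QVD. The table name R00_2_Src and the file name R00_2_Padded.QVD are hypothetical, and it assumes the table you will concatenate onto later has exactly the fields of R00 from Test 2:

```
// Hypothetical QVD-creation script: pad the table before it is first stored.
R00_2_Src:
LOAD ShipperID,
     CustomerID,
     Discount,
     ProductID,
     Quantity,
     UnitPrice
FROM R00_2.QVD (qvd);   // stands in for whatever source normally builds this table

// Add the fields that only exist in R00. No rows are added.
Concatenate(R00_2_Src)
LOAD Null() as OrderDate,
     Null() as sales,
     Null() as COS
AutoGenerate(0);

// Assumed file name; later scripts read this padded QVD instead of R00_2.QVD.
STORE R00_2_Src INTO R00_2_Padded.QVD (qvd);
DROP TABLE R00_2_Src;
```

If the results from Test 2 hold, a later script that lists all of R00's fields plus Discount, ProductID and Quantity when concatenating from the padded QVD should be read optimized, without paying for a store and reload at load time.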
I just tested this with the QV12 Beta and it still stands. In Test 2, the 2nd option takes 30% less time to run than the 1st option.
Honestly, I would have expected QlikView to pick up this sort of optimization automatically without resorting to any file saving.