I have a huge table Vente (sales) that contains over 1 billion rows.
I have the following script to load it; it runs forever without ever finishing. How can I optimize it?
Vente:
LOAD PrixUnitaireArticle,
PrixArticleHT,
PrixRevientTTC,
MargeHT,
MargeTTC,
ValeurRevientHT,
ValeurRevientTTC,
ValeurMargeHT,
ValeurMargeTTC,
PrixVenteHT,
N°tickets,
EAN,
[Code Caisse],
[Code Site],
[Prix de vente TTC],
Quantité,
[Date Time],
Promo,
[Code Article],
[Code Structure],
Alimentaire,
[Prix de Revient],
TVA,
DivKey,
ID,
DateOfHire,
nbrClient,
KeyVenteSite,
code4 ,
datecol,
Date(Date#(Date,'YYYY-MM-DD'),'YYYYMMDD') as Date ,
Date#(Date,'YYYY-MM-DD') as DateTest ,
ApplyMap('MapComp', datecol&[Code Site]&code4 , 'O') as Comp,
ApplyMap('MapCasse', [Code Site]&[Code Article]&Date(Date#(Date,'YYYY-MM-DD'),'YYYYMMDD') ,0) as QuantiteCasse,
ApplyMap('MapAjust', [Code Site]&[Code Article]&Date(Date#(Date,'YYYY-MM-DD'),'YYYYMMDD') ,0) as QuantiteAjust
FROM
E:\QVD\QVD\VenteFull.qvd
(qvd)
Where (isnull([Code Article]) = 0) and (isnull([Code Structure]) = 0) and (isnull([Prix de Revient])=0);
LEFT JOIN(Vente)
IntervalMatch (datecol)
LOAD DISTINCT [DATE DEBUT]
,[DATE FIN]
resident OC
;
LEFT JOIN (Vente)
LOAD *
RESIDENT OC;
LEFT JOIN(Vente)
IntervalMatch (datecol, [Code Site], [Code Article])
LOAD DISTINCT DateDebutPermanent, DateFinPermanent, [Code Site], [Code Article]
resident Permanent
;
LEFT JOIN (Vente)
LOAD *
RESIDENT Permanent;
LEFT JOIN(Vente)
IntervalMatch (datecol,[Code Article])
LOAD DISTINCT DateDebutRemisePromo,DateFinRemisePromo,[Code Article]
resident RemisePromo
;
LEFT JOIN (Vente)
LOAD *
RESIDENT RemisePromo;
LEFT JOIN(Vente)
IntervalMatch (datecol,[Code Article])
LOAD DISTINCT DateDebutPermanentNational, DateFinPermanentNational,[Code Article]
resident PermanentNational
;
LEFT JOIN (Vente)
LOAD *
RESIDENT PermanentNational;
LEFT JOIN(Vente)
IntervalMatch (datecol,[Code Article])
LOAD DISTINCT DateDebutRemiseHabituelle,DateFinRemiseHabituelle,[Code Article]
resident RemiseHabituelle
;
LEFT JOIN (Vente)
LOAD *
RESIDENT RemiseHabituelle;
LEFT JOIN(Vente)
IntervalMatch (datecol,[Code Article])
LOAD DISTINCT DateDebutRemiseHabituellePromo, DateFinRemiseHabituellePromo,[Code Article]
resident RemiseHabituellePromo
;
LEFT JOIN (Vente)
LOAD *
RESIDENT RemiseHabituellePromo;
STORE Vente into 'E:\QVD\QVD\Vente.qvd';
Drop Table Vente;
Just check this:
IMHO the only way to manage loading a billion-row table is an incremental load; you have an ID in the table, is it the table's PK?
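If ID really is an ever-increasing key, a rough insert-only sketch could look like this. VenteSource and vLastID are hypothetical names (use your real source table and a variable saved at the end of the previous reload), and it assumes a database connection is already open:
// Pull only the rows added since the last run (vLastID is a hypothetical
// variable holding the highest ID already stored in the QVD)
Vente:
SQL SELECT * FROM VenteSource WHERE ID > $(vLastID);
// Append the history already on disk. No WHERE clause and no transformation,
// so the QVD is read optimized.
Concatenate (Vente)
LOAD * FROM E:\QVD\QVD\VenteFull.qvd (qvd);
// Write the combined table back and remember the new high-water mark
STORE Vente INTO 'E:\QVD\QVD\VenteFull.qvd' (qvd);
MaxKey:
LOAD Max(ID) as MaxID RESIDENT Vente;
LET vLastID = Peek('MaxID', 0, 'MaxKey');
DROP TABLE MaxKey;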
Also, if you have a date, try to segment (split) your load, perhaps by month, and store one QVD per month. Many small QVDs are simpler to manage than one big one.
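A one-off split of the existing QVD could look like the sketch below. The year range and file names are hypothetical, and it assumes datecol is a real date; the WHERE clause makes this read non-optimized, but you only run it once, and afterwards each reload only touches the months that changed:
// One-time split of the big QVD into one file per month
FOR vYear = 2015 to 2024
  FOR vMonth = 1 to 12
    VenteMonth:
    LOAD * FROM E:\QVD\QVD\VenteFull.qvd (qvd)
    WHERE Year(datecol) = $(vYear) and Month(datecol) = $(vMonth);
    // Only store and drop when the month actually has rows
    IF NoOfRows('VenteMonth') > 0 THEN
      STORE VenteMonth INTO 'E:\QVD\QVD\Vente_$(vYear)_$(vMonth).qvd' (qvd);
      DROP TABLE VenteMonth;
    END IF
  NEXT vMonth
NEXT vYear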
First I would remove the transformations you do while reading the table. If you do not change any field, the QVD will be read optimized from disk, which is MUCH faster.
Try reading the table with a plain LOAD * first, without any WHERE clause, and then read it RESIDENT while you do the transformation steps and apply the WHERE clause.
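In outline, the reshaped first step could look like this (only a sketch: the reformatted date is renamed to DateNum, a hypothetical name, because LOAD * plus "as Date" would create a duplicate field; adapt the field list to what you really need):
// Pass 1: optimized QVD read. No WHERE clause and no expressions,
// because either one would break the optimized load.
VenteRaw:
LOAD * FROM E:\QVD\QVD\VenteFull.qvd (qvd);
// Pass 2: transformations and filtering happen in RAM on the resident table
Vente:
NoConcatenate
LOAD *,
     Date(Date#(Date,'YYYY-MM-DD'),'YYYYMMDD') as DateNum,
     ApplyMap('MapComp', datecol & [Code Site] & code4, 'O') as Comp,
     ApplyMap('MapCasse', [Code Site] & [Code Article] & Date(Date#(Date,'YYYY-MM-DD'),'YYYYMMDD'), 0) as QuantiteCasse,
     ApplyMap('MapAjust', [Code Site] & [Code Article] & Date(Date#(Date,'YYYY-MM-DD'),'YYYYMMDD'), 0) as QuantiteAjust
RESIDENT VenteRaw
WHERE not IsNull([Code Article]) and not IsNull([Code Structure]) and not IsNull([Prix de Revient]);
DROP TABLE VenteRaw;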
Secondly:
You are doing a lot of IntervalMatches on a very big table, but your only keys are datecol, [Code Article] and [Code Site]. You can make this faster by reducing the size of the tables involved: read all distinct combinations of datecol, [Code Article] and [Code Site] into a temporary bridge table, run all the IntervalMatches against that, and only join back to Vente at the end.
TempBridgeTable:
NoConcatenate
Load distinct datecol as Bridgedatecol, [Code Article], [Code Site]
resident Vente;
left join(TempBridgeTable)
IntervalMatch (Bridgedatecol)
LOAD DISTINCT [DATE DEBUT]
,[DATE FIN]
resident OC
;
left join(TempBridgeTable)
IntervalMatch (Bridgedatecol, [Code Site], [Code Article])
LOAD DISTINCT DateDebutPermanent, DateFinPermanent, [Code Site], [Code Article]
resident Permanent
;
// ... repeat the same IntervalMatch loads for RemisePromo, PermanentNational, RemiseHabituelle and RemiseHabituellePromo ...
left join(Vente)
load *, Bridgedatecol as datecol resident TempBridgeTable;
drop table TempBridgeTable;
left join(Vente)
load * resident OC;
Left join(Vente)
load * resident Permanent;
Lastly, I would consider whether you really need all the fields in all the tables. When a table reaches a billion rows, every field needs a billion pointers, so each field you can leave out saves memory and load time.
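For instance (purely hypothetical field choice), if DateTest and DateOfHire are never used in the front end, dropping them before the STORE removes a billion pointers each:
// Hypothetical example: drop fields nobody uses downstream before storing the QVD
DROP FIELDS DateTest, DateOfHire FROM Vente;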