Anonymous
Not applicable

optimize load script

I have a huge table, Vente (sales), that contains over 1 billion rows.

I use the following script to load it, but it runs forever and never finishes. How can I optimize it?

Vente:
LOAD PrixUnitaireArticle,
     PrixArticleHT,
     PrixRevientTTC,
     MargeHT,
     MargeTTC,
     ValeurRevientHT,
     ValeurRevientTTC,
     ValeurMargeHT,
     ValeurMargeTTC,
     PrixVenteHT,
     N°tickets,
     EAN,
     [Code Caisse],
     [Code Site],
     [Prix de vente TTC],
     Quantité,
     [Date Time],
     Promo,
     [Code Article],
     [Code Structure],
     Alimentaire,
     [Prix de Revient],
     TVA,
     DivKey,
     ID,
     DateOfHire,
     nbrClient,
     KeyVenteSite,
     code4,
     datecol,
     Date(Date#(Date, 'YYYY-MM-DD'), 'YYYYMMDD') as Date,
     Date#(Date, 'YYYY-MM-DD') as DateTest,
     ApplyMap('MapComp', datecol & [Code Site] & code4, 'O') as Comp,
     ApplyMap('MapCasse', [Code Site] & [Code Article] & Date(Date#(Date, 'YYYY-MM-DD'), 'YYYYMMDD'), 0) as QuantiteCasse,
     ApplyMap('MapAjust', [Code Site] & [Code Article] & Date(Date#(Date, 'YYYY-MM-DD'), 'YYYYMMDD'), 0) as QuantiteAjust
FROM E:\QVD\QVD\VenteFull.qvd (qvd)
Where IsNull([Code Article]) = 0 and IsNull([Code Structure]) = 0 and IsNull([Prix de Revient]) = 0;

LEFT JOIN (Vente)
IntervalMatch (datecol)
LOAD DISTINCT [DATE DEBUT], [DATE FIN]
RESIDENT OC;

LEFT JOIN (Vente)
LOAD * RESIDENT OC;

LEFT JOIN (Vente)
IntervalMatch (datecol, [Code Site], [Code Article])
LOAD DISTINCT DateDebutPermanent, DateFinPermanent, [Code Site], [Code Article]
RESIDENT Permanent;

LEFT JOIN (Vente)
LOAD * RESIDENT Permanent;

LEFT JOIN (Vente)
IntervalMatch (datecol, [Code Article])
LOAD DISTINCT DateDebutRemisePromo, DateFinRemisePromo, [Code Article]
RESIDENT RemisePromo;

LEFT JOIN (Vente)
LOAD * RESIDENT RemisePromo;

LEFT JOIN (Vente)
IntervalMatch (datecol, [Code Article])
LOAD DISTINCT DateDebutPermanentNational, DateFinPermanentNational, [Code Article]
RESIDENT PermanentNational;

LEFT JOIN (Vente)
LOAD * RESIDENT PermanentNational;

LEFT JOIN (Vente)
IntervalMatch (datecol, [Code Article])
LOAD DISTINCT DateDebutRemiseHabituelle, DateFinRemiseHabituelle, [Code Article]
RESIDENT RemiseHabituelle;

LEFT JOIN (Vente)
LOAD * RESIDENT RemiseHabituelle;

LEFT JOIN (Vente)
IntervalMatch (datecol, [Code Article])
LOAD DISTINCT DateDebutRemiseHabituellePromo, DateFinRemiseHabituellePromo, [Code Article]
RESIDENT RemiseHabituellePromo;

LEFT JOIN (Vente)
LOAD * RESIDENT RemiseHabituellePromo;

STORE Vente into 'E:\QVD\QVD\Vente.qvd';

Drop Table Vente;

3 Replies
Chanty4u
MVP

maxgro
MVP

IMHO the only way to manage loading a billion-row table is an incremental load; you have an ID in the table, is it the table's primary key?

Also, if you have a date, try to segment (split) your load, perhaps by month, and store one QVD per month. Many small QVDs are simpler to manage than one big one.
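A minimal sketch of the incremental pattern, assuming ID is the primary key; the delta QVD name below is hypothetical, standing in for whatever extract holds the new rows:

// Hypothetical incremental load; VenteDelta.qvd is a placeholder for the new-rows extract.
Vente:
LOAD * FROM E:\QVD\QVD\VenteDelta.qvd (qvd);

Concatenate (Vente)
LOAD * FROM E:\QVD\QVD\VenteFull.qvd (qvd)
Where not Exists(ID);   // keep only historic rows not in the delta; note: not Exists() gives up the optimized QVD read

STORE Vente into E:\QVD\QVD\VenteFull.qvd (qvd);

And a sketch of the month split, assuming datecol parses as a date (the month list is illustrative):

// Hypothetical month-by-month split into separate QVDs.
For Each vMonth in '2016-01', '2016-02', '2016-03'
    VenteMonth:
    NoConcatenate
    LOAD * Resident Vente
    Where Date(datecol, 'YYYY-MM') = '$(vMonth)';
    STORE VenteMonth into E:\QVD\QVD\Vente_$(vMonth).qvd (qvd);
    DROP Table VenteMonth;
Next vMonth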

simenkg
Specialist

First I would remove the transformations you do while reading the table. If you do not change any field, the QVD is read optimized from disk. This is MUCH faster.

Try reading the table using LOAD * first, without any where clauses, and then read it resident while you do the transformation steps and apply the where clause.
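A sketch of that two-step pattern, with the field list abbreviated; the formatted date is renamed to DateKey here (an assumption) because "as Date" would collide with the Date field already pulled in by LOAD *:

// Step 1: plain LOAD * with no transformations and no where clause, so the QVD read stays optimized.
VenteRaw:
LOAD * FROM E:\QVD\QVD\VenteFull.qvd (qvd);

// Step 2: the transformations and the where clause run on the resident table.
Vente:
NoConcatenate
LOAD *,
     Date(Date#(Date, 'YYYY-MM-DD'), 'YYYYMMDD') as DateKey,   // renamed to avoid a field-name collision
     ApplyMap('MapComp', datecol & [Code Site] & code4, 'O') as Comp
     // ... remaining ApplyMap fields as in the original script ...
Resident VenteRaw
Where IsNull([Code Article]) = 0 and IsNull([Code Structure]) = 0 and IsNull([Prix de Revient]) = 0;

DROP Table VenteRaw;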

Secondly:

You are doing a lot of IntervalMatches on a very big table, but your only keys are datecol, [Code Article] and [Code Site]. You can make this faster by reducing the size of the tables involved: read all distinct combinations of datecol, [Code Article] and [Code Site] into a temporary bridge table, do all your IntervalMatches against that, and only join back to Vente at the end.

TempBridgeTable:
NoConcatenate
LOAD DISTINCT datecol as Bridgedatecol, [Code Article], [Code Site]
resident Vente;

left join (TempBridgeTable)
IntervalMatch (Bridgedatecol)
LOAD DISTINCT [DATE DEBUT], [DATE FIN]
resident OC;

left join (TempBridgeTable)
IntervalMatch (Bridgedatecol, [Code Site], [Code Article])
LOAD DISTINCT DateDebutPermanent, DateFinPermanent, [Code Site], [Code Article]
resident Permanent;

...

left join (Vente)
load *, Bridgedatecol as datecol
resident TempBridgeTable;

drop table TempBridgeTable;

left join (Vente)
load * resident OC;

left join (Vente)
load * resident Permanent;

Lastly, I would consider whether you need all the fields in all tables. When a table reaches a billion rows, each field needs a billion pointers.
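For instance, helper fields that are only needed during the load can be dropped before the final STORE; which fields are expendable here is just a guess:

// Hypothetical cleanup: drop fields the front end never uses.
DROP Fields DateTest, KeyVenteSite From Vente;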