Qlik Community

Ask a Question

Qlik Design Blog

All about product and Qlik solutions: scripting, data modeling, visual design, extensions, best practices, etc.

Henric_Cronström

A QlikView feature that is poorly known and brilliant in its simplicity is the Preceding Load.

 

If you don’t know what it is, then I strongly suggest that you read this blog post and find out. Because it will help you in your QlikView scripting.

 

So what is it?

 

It is a way for you to define successive transformations and filters so that you can load a table in one pass but still have several transformation steps. Basically it is a Load statement that loads from the Load/SELECT statement below.

 

Example: you have a database where your dates are stored as strings and you want to use the QlikView date functions to interpret the strings. But the QlikView date functions are not available in the SELECT statement. The solution is to put a Load statement in front of the SELECT statement: (Note the absence of “From” or “Resident”.)

 

Load Date#(OrderDate,’YYYYMMDD’) as OrderDate;
SQL SELECT OrderDate FROM … ;

 

What happens then is that the SELECT statement is evaluated first, and the result is piped into the Load statement that does the date interpretation. The fact that the SELECT statement is evaluated before the Load, is at first glance confusing, but it is not so strange. If you read a Preceding Load as

 

     Load From ( Select From ( DB_TABLE ) )

 

then it becomes clearer. Compare it with nested functions: How would you evaluate “Round( Exp( x ) )”. You would of course evaluate the Exp() function first and then the Round() function. That is, you evaluate it from right to left.

 

Input - Output.png

 

The reason is that the Exp() function is closest to the source data, and therefore should be evaluated first. It’s the same with the Preceding Load: The SELECT is closest to the source data and should therefore be evaluated first. In both cases, you can look at it as a transformation that has an input and an output and to do it correctly, you need to start with the part of the transformation closest to the input.

 

Any number of Loads can be “nested” this way. QlikView will start from the bottom and pipe record by record to the closest preceding Load, then to the next, etc. And it is almost always faster than running a second pass through the same table.

 

With preceding Load, you don’t need to have the same calculation in several places. For instance, instead of writing

 

Load  ... ,
   Age( FromDate + IterNo() – 1, BirthDate ) as Age,
   Date( FromDate + IterNo() – 1 ) as ReferenceDate
   Resident Policies
      While IterNo() <= ToDate - FromDate + 1 ;

 

where the same calculation is made for both Age and ReferenceDate, I would in real life define my ReferenceDate only once and then use it in the Age function in a Preceding Load:

 

Load  ..., ReferenceDate,
   Age( ReferenceDate, BirthDate ) as Age;
Load  *,
   Date( FromDate + IterNo() – 1 ) as ReferenceDate
   Resident Policies
      While IterNo() <= ToDate - FromDate + 1 ;

 

The Preceding Load has no disadvantages. Use it. You’ll love it.

 

HIC

96 Comments
anderseriksson
Partner
Partner

A preceding load that only repeats the same field names adds nothing.
It is an extra step that takes some more time to process.
I regularly remove these if I have used the wizard and got them added.
Also when it comes to fields I regularly use the wildcard * instead of naming all the fields.
Granted it can be lots of fields but if it does not have significant impact on the load I think
it better having access to all database fields in the following Transform apps

and there limit to those fields used in the Load step.

379 Views
markp201
Creator III
Creator III

HIC - I have noticed every now and then the case-sensitive issue and tracked it down to a field being upper-cased without explicitly having this in the script.  I'll review the ODBC drivers with the admin. 

I have done the same Anders - remove the preceding load and add renames in the SQL.  Much cleaner and easier to maintain.  With regard to renames, we've found users are more comfortable with a field worded in business terms (instead of the cryptic but ever popular custno).  We typically rename just before creating our dimensional model QVDs.

Thanks for the tips - very helpful.

0 Likes
379 Views
lawrenceiow
Creator II
Creator II

Can I just ask for a little clarification here? Reading most of the above I understood that preceding loads work on single records at a time. So, a data source of 1000 records would load the first record, pass this on to the first preceding load for any transformations before loading the second of the 1000 records, etc. This is in contrast to a resident load which loads all 1000 records into RAM before the actual "LOAD Resident" part can do any transformations.

Resident method:

Step 1: Load 1000 records

Step 2: Load 1000 records again from step 1 (now resident in memory) adding new fields.

Preceding method:

Step 1: Load record 1 of 1000

Step 2: First preceding load part adds new field to this first record

Step 3: A second preceding load uses field created at step 2 to create another new field

Step 4: Store record 1, along with new fields, in memory

Step 5: Load record 2 of 1000

Step 6: First preceding load part adds new field to this second record

Step 7: A second preceding load uses field created at step 6 to create another new field

Step 8: Store record 2, along with new fields, in memory

Step 9: Load record 3 of 1000

etc

After final step: It is only now that all 1000 records are resident in RAM.

If I'm right, it would not be possible to perform any aggregations via the preceding load method as not all records are available during the load, yet the recent posts above seem to suggest you can. Have I got it wrong?

0 Likes
379 Views
Henric_Cronström

Good point.

In principle, the Preceding load is a record-by-record evaluation where each record is piped into the next Load and completely evaluated before the next record is read. In principle.

But there are exceptions... A group by clause will break this rule, and force the entire table to be read before anything can be piped to the next Load.

Also, parallelization will cause several records to be evaluated simultaneously by different threads. (If the output of the Load doesn't depend on other records, through e.g. the Peek() function.)

HIC

379 Views
lawrenceiow
Creator II
Creator II

I see, thank you.

Also, I'm glad you pointed out about the inter-record functions, I hadn't thought about those.

So, is parallelization a thing that only occurs during an optimized load or does it happen anyway - are there three load speeds - optimized, parallelized (?) and normal?

0 Likes
426 Views
thanstad
Creator
Creator

Thanks Lawrance and Henric for making this more clear. I am using Preceding load a lot an this has angreat value.

Tormod Hanstad

Sendt fra min iPhone

7. sep. 2017 kl. 14:24 skrev Lawrence Cooper <qcwebmaster@qlikview.com<mailto:qcwebmaster@qlikview.com>>:

0 Likes
426 Views
Henric_Cronström

There are even more alternatives...

Optimized - A qvd load where the records aren't unpacked

Full parallelization - A load where different records and different fields within the same record may be treated by different threads

"Partial" parallelization - A load where different fields within the same record may be treated by different threads

Single threaded - The load is performed in one thread only


Which one the engine uses depends on the circumstances: Are the fields transformed? (Then "optimized" is excluded). Does the load contain Peek() or Exists()? Then the second alternative is excluded.


The last alternative can only be achieved by using

EnableParallelReload=0

in Settings.ini


HIC

426 Views
humansoft
Partner
Partner

hic‌, What about GROUP BY clause? Can it be calc multi-threaded or only single-threaded?

TH

0 Likes
426 Views
Henric_Cronström

I believe that most of it is multi-threaded. In most cases (Sum(), Avg(), Max(), etc.) there should be no problem distributing different records on different threads since there is no dependency in the order of the threads. There could however be limitations when you calculate aggregates that depend on the sort order of the records, e.g. Median().

But to be 100% sure I need to ask the developers.... 🙂

HIC

426 Views
humansoft
Partner
Partner

Henric Cronström napisał(-a):

I believe that most of it is multi-threaded. In most cases (Sum(), Avg(), Max(), etc.) there should be no problem distributing different records on different threads since there is no dependency in the order of the threads. There could however be limitations when you calculate aggregates that depend on the sort order of the records, e.g. Median().

But to be 100% sure I need to ask the developers.... 🙂

HIC

hic‌, I checked it: GROUP BY clause with Sum(). It's single-threaded operation in QV 12SR6 and QV 11.20SR16.

Sample QVW:

https://ufile.io/amnft

Can You ask the developers?

TH

0 Likes
426 Views