Preceding Load - Page 9 - Qlik Community

hic · ‎2013-03-04

A QlikView feature that is poorly known and brilliant in its simplicity is the Preceding Load.

If you don’t know what it is, then I strongly suggest that you read this blog post and find out. Because it will help you in your QlikView scripting.

So what is it?

It is a way for you to define successive transformations and filters so that you can load a table in one pass but still have several transformation steps. Basically it is a Load statement that loads from the Load/SELECT statement below.

Example: you have a database where your dates are stored as strings and you want to use the QlikView date functions to interpret the strings. But the QlikView date functions are not available in the SELECT statement. The solution is to put a Load statement in front of the SELECT statement: (Note the absence of “From” or “Resident”.)

Load Date#(OrderDate,’YYYYMMDD’) as OrderDate;
SQL SELECT OrderDate FROM … ;

What happens then is that the SELECT statement is evaluated first, and the result is piped into the Load statement that does the date interpretation. The fact that the SELECT statement is evaluated before the Load, is at first glance confusing, but it is not so strange. If you read a Preceding Load as

Load From ( Select From ( DB_TABLE ) )

then it becomes clearer. Compare it with nested functions: How would you evaluate “Round( Exp( x ) )”. You would of course evaluate the Exp() function first and then the Round() function. That is, you evaluate it from right to left.

The reason is that the Exp() function is closest to the source data, and therefore should be evaluated first. It’s the same with the Preceding Load: The SELECT is closest to the source data and should therefore be evaluated first. In both cases, you can look at it as a transformation that has an input and an output and to do it correctly, you need to start with the part of the transformation closest to the input.

Any number of Loads can be “nested” this way. QlikView will start from the bottom and pipe record by record to the closest preceding Load, then to the next, etc. And it is almost always faster than running a second pass through the same table.

With preceding Load, you don’t need to have the same calculation in several places. For instance, instead of writing

Load ... ,
   Age( FromDate + IterNo() – 1, BirthDate ) as Age,
   Date( FromDate + IterNo() – 1 ) as ReferenceDate
   Resident Policies
      While IterNo() <= ToDate - FromDate + 1 ;

where the same calculation is made for both Age and ReferenceDate, I would in real life define my ReferenceDate only once and then use it in the Age function in a Preceding Load:

Load ..., ReferenceDate,
   Age( ReferenceDate, BirthDate ) as Age;
Load *,
   Date( FromDate + IterNo() – 1 ) as ReferenceDate
   Resident Policies
      While IterNo() <= ToDate - FromDate + 1 ;

The Preceding Load has no disadvantages. Use it. You’ll love it.

HIC

anderseriksson · ‎2017-08-30

A preceding load that only repeats the same field names adds nothing.
It is an extra step that takes some more time to process.
I regularly remove these if I have used the wizard and got them added.
Also when it comes to fields I regularly use the wildcard * instead of naming all the fields.
Granted it can be lots of fields but if it does not have significant impact on the load I think
it better having access to all database fields in the following Transform apps

and there limit to those fields used in the Load step.

markp201 · ‎2017-08-30

HIC - I have noticed every now and then the case-sensitive issue and tracked it down to a field being upper-cased without explicitly having this in the script. I'll review the ODBC drivers with the admin.

I have done the same Anders - remove the preceding load and add renames in the SQL. Much cleaner and easier to maintain. With regard to renames, we've found users are more comfortable with a field worded in business terms (instead of the cryptic but ever popular custno). We typically rename just before creating our dimensional model QVDs.

Thanks for the tips - very helpful.

lawrenceiow · ‎2017-09-07

Can I just ask for a little clarification here? Reading most of the above I understood that preceding loads work on single records at a time. So, a data source of 1000 records would load the first record, pass this on to the first preceding load for any transformations before loading the second of the 1000 records, etc. This is in contrast to a resident load which loads all 1000 records into RAM before the actual "LOAD Resident" part can do any transformations.

Resident method:

Step 1: Load 1000 records

Step 2: Load 1000 records again from step 1 (now resident in memory) adding new fields.

Preceding method:

Step 1: Load record 1 of 1000

Step 2: First preceding load part adds new field to this first record

Step 3: A second preceding load uses field created at step 2 to create another new field

Step 4: Store record 1, along with new fields, in memory

Step 5: Load record 2 of 1000

Step 6: First preceding load part adds new field to this second record

Step 7: A second preceding load uses field created at step 6 to create another new field

Step 8: Store record 2, along with new fields, in memory

Step 9: Load record 3 of 1000

etc

After final step: It is only now that all 1000 records are resident in RAM.

If I'm right, it would not be possible to perform any aggregations via the preceding load method as not all records are available during the load, yet the recent posts above seem to suggest you can. Have I got it wrong?

hic · ‎2017-09-07

Good point.

In principle, the Preceding load is a record-by-record evaluation where each record is piped into the next Load and completely evaluated before the next record is read. In principle.

But there are exceptions... A group by clause will break this rule, and force the entire table to be read before anything can be piped to the next Load.

Also, parallelization will cause several records to be evaluated simultaneously by different threads. (If the output of the Load doesn't depend on other records, through e.g. the Peek() function.)

HIC

lawrenceiow · ‎2017-09-07

I see, thank you.

Also, I'm glad you pointed out about the inter-record functions, I hadn't thought about those.

So, is parallelization a thing that only occurs during an optimized load or does it happen anyway - are there three load speeds - optimized, parallelized (?) and normal?

thanstad · ‎2017-09-08

Thanks Lawrance and Henric for making this more clear. I am using Preceding load a lot an this has angreat value.

Tormod Hanstad

Sendt fra min iPhone

7. sep. 2017 kl. 14:24 skrev Lawrence Cooper <qcwebmaster@qlikview.com<mailto:qcwebmaster@qlikview.com>>:

hic · ‎2017-09-08

There are even more alternatives...

Optimized - A qvd load where the records aren't unpacked

Full parallelization - A load where different records and different fields within the same record may be treated by different threads

"Partial" parallelization - A load where different fields within the same record may be treated by different threads

Single threaded - The load is performed in one thread only

Which one the engine uses depends on the circumstances: Are the fields transformed? (Then "optimized" is excluded). Does the load contain Peek() or Exists()? Then the second alternative is excluded.

The last alternative can only be achieved by using

EnableParallelReload=0

in Settings.ini

HIC

humansoft · ‎2017-09-08

hic‌, What about GROUP BY clause? Can it be calc multi-threaded or only single-threaded?

TH

hic · ‎2017-09-08

I believe that most of it is multi-threaded. In most cases (Sum(), Avg(), Max(), etc.) there should be no problem distributing different records on different threads since there is no dependency in the order of the threads. There could however be limitations when you calculate aggregates that depend on the sort order of the records, e.g. Median().

But to be 100% sure I need to ask the developers.... 🙂

HIC

humansoft · ‎2017-09-13

Henric Cronström napisał(-a):

I believe that most of it is multi-threaded. In most cases (Sum(), Avg(), Max(), etc.) there should be no problem distributing different records on different threads since there is no dependency in the order of the threads. There could however be limitations when you calculate aggregates that depend on the sort order of the records, e.g. Median().

But to be 100% sure I need to ask the developers.... 🙂

HIC

hic‌, I checked it: GROUP BY clause with Sum(). It's single-threaded operation in QV 12SR6 and QV 11.20SR16.

Sample QVW:

https://ufile.io/amnft

Can You ask the developers?

TH