hic
Former Employee

A QlikView feature that is little known, yet brilliant in its simplicity, is the Preceding Load.

 

If you don’t know what it is, then I strongly suggest that you read this blog post and find out, because it will help you in your QlikView scripting.

 

So what is it?

 

It is a way for you to define successive transformations and filters so that you can load a table in one pass but still have several transformation steps. Basically, it is a Load statement that loads from the Load or SELECT statement below it.

 

Example: you have a database where your dates are stored as strings, and you want to use the QlikView date functions to interpret the strings. But the QlikView date functions are not available in the SELECT statement. The solution is to put a Load statement in front of the SELECT statement (note the absence of “From” or “Resident”):

 

Load Date#(OrderDate,'YYYYMMDD') as OrderDate;
SQL SELECT OrderDate FROM … ;

 

What happens then is that the SELECT statement is evaluated first, and the result is piped into the Load statement that does the date interpretation. The fact that the SELECT statement is evaluated before the Load is at first glance confusing, but it is not so strange. If you read a Preceding Load as

 

     Load From ( Select From ( DB_TABLE ) )

 

then it becomes clearer. Compare it with nested functions: how would you evaluate “Round( Exp( x ) )”? You would of course evaluate the Exp() function first and then the Round() function. That is, you evaluate it from right to left.

 

[Figure: Input - Output]

 

The reason is that the Exp() function is closest to the source data and therefore should be evaluated first. It’s the same with the Preceding Load: the SELECT is closest to the source data and should therefore be evaluated first. In both cases, you can look at it as a transformation that has an input and an output, and to do it correctly you need to start with the part of the transformation closest to the input.

 

Any number of Loads can be “nested” this way. QlikView will start from the bottom and pipe the records one by one to the closest preceding Load, then to the next, and so on. And it is almost always faster than running a second pass through the same table.
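
For example, three stacked Loads (the table and field names here are made up for illustration) would be evaluated bottom-up:

// Step 3 (evaluated last): keep only the orders we want
Load *
Where Year(OrderDate) >= 2010;

// Step 2: interpret the date string
Load *,
   Date#(OrderDateString, 'YYYYMMDD') as OrderDate;

// Step 1 (evaluated first): read the raw data
SQL SELECT OrderID, OrderDateString, Amount
FROM Orders;

The SQL SELECT runs first; each record then passes through Step 2 and Step 3 before it lands in the resulting table.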

 

With a Preceding Load, you don’t need to repeat the same calculation in several places. For instance, instead of writing

 

Load  ... ,
   Age( FromDate + IterNo() - 1, BirthDate ) as Age,
   Date( FromDate + IterNo() - 1 ) as ReferenceDate
   Resident Policies
      While IterNo() <= ToDate - FromDate + 1 ;

 

where the same calculation is made for both Age and ReferenceDate, I would in real life define my ReferenceDate only once and then use it in the Age function in a Preceding Load:

 

Load  ..., ReferenceDate,
   Age( ReferenceDate, BirthDate ) as Age;
Load  *,
   Date( FromDate + IterNo() - 1 ) as ReferenceDate
   Resident Policies
      While IterNo() <= ToDate - FromDate + 1 ;

 

The Preceding Load has no disadvantages. Use it. You’ll love it.

 

HIC

96 Comments
stevedark
Partner Ambassador/MVP

Is that code in your message the actual code you are running?  If so, I'm very surprised it doesn't error, given that in both loads you are pulling uniqkey back twice (once pre-defined and once built).  Also, the QVD you store from the first load (new_updatedata.qvd) doesn't match that of the second (new_masterdata.qvd).

I assume that this is not the problem though, based on your description of the error.

What I think you are seeing is a result of auto-concatenation.  You are loading the first table and calling it updatedata.  When you load MasterData you have exactly the same columns.  Because of this, an auto-concatenation will happen.  As a result, instead of two tables in your data model you will have one table called updatedata.  This is why, when you come to store MasterData, it does not exist.
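
The usual fix is to force the tables apart with a NoConcatenate prefix (or to rename a field so the column sets differ).  A rough sketch, with the field lists left out:

updatedata:
Load ... ;
Store updatedata into new_updatedata.qvd (qvd);

MasterData:
NoConcatenate
Load ... ;
Store MasterData into new_masterdata.qvd (qvd);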

Creating the key on load from the QVD rather misses the point of doing it in a prior pass, as at that point you will be causing a non-optimised load anyway.  Ideally you would create the composite key when loading from the underlying data source, rather than from a QVD.
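
For example (the field and table names here are invented), build the key once when creating the QVD, so that later reads can stay optimised:

// When creating the QVD - the key is calculated here, once:
Facts:
Load *,
   CustomerID & '|' & OrderID & '|' & ProductID as uniqkey;
SQL SELECT * FROM FACT_TABLE;

Store Facts into facts.qvd (qvd);

// Later reads need no transformation, so they stay optimised:
Load * From facts.qvd (qvd);

Note that the key is itself built in a Preceding Load on top of the SELECT.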

Hopefully that makes sense?  If you Google some of these terms you will find out more.

- Steve

userid128223
Creator

Thanks Steve

It was auto-concatenating by default. I added a NoConcatenate statement between the two QVD generates and it worked.

stevedark
Partner Ambassador/MVP

Hi Mikael,

There are lots of good references online.  Henric's posts on the Business Discovery blog are always worth keeping an eye on, and I write various posts on my own blog.  Recently I have been doing a number of QlikView basics posts: http://www.quickintelligence.co.uk/back-to-basics/  - I should be adding to these soon.

Google is always good at turning up helpful articles (largely due to the massive number of posts on QlikCommunity).

If you want a printed (or e-book) reference, I can highly recommend the book QlikView 11 For Developers.

Hope that helps,

Steve

http://www.quickintelligence.co.uk/qlikview-blog/

Not applicable

Hi Henric,

I have experienced a scenario wherein the Resident Load actually performed better than a Preceding Load. To give you an overview of the scenario: I am loading a fact with ~25M records and ~40 fields. There is a need to create a composite key using 8 fields, which I tried in the preceding load. (In the prior load there are some string transformations on fields; hence I did not want to create the composite key in the same load statement.) The preceding load took ~16 mins.

A similar transformation using a Resident load took ~4 mins. (The load of the original fact takes 2 mins.)

Now, what do you think is the reason for the Preceding Load taking longer? Or am I missing something and using the Preceding Load incorrectly?

Thanks,

SK

hic
Former Employee

I am somewhat surprised - but it sounds as if you know what you're doing. String operations are CPU-expensive, and that, in combination with something else you're doing, could be slowing down the preceding load. Could you mail me more information about what you're doing in the script?

I will probably need to remove an "always" from the blog post...

HIC

stevedark
Partner Ambassador/MVP

Hi Henric / SK,

Could it be that adding the Preceding Load to the load from the QVD caused it to be non-optimised?  Loading the data from the QVD as an optimised load and then doing a resident load to manipulate the data afterwards could well be quicker than a preceding load on the QVD.

In these cases I would always advocate doing any manipulation when creating the QVD rather than when loading from it (as per my blog post on Optimised QVD Loads).
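
To make that concrete, here is a sketch of the two patterns (field and file names invented):

// Pattern 1: preceding load on the QVD - the whole read becomes non-optimised
Facts:
Load *,
   Upper(Region) as RegionKey;
Load * From facts.qvd (qvd);

// Pattern 2: optimised QVD read first, then a resident pass
Staging:
Load * From facts.qvd (qvd);    // optimised - no transformations

Facts:
NoConcatenate
Load *,
   Upper(Region) as RegionKey
Resident Staging;

Drop Table Staging;

On a wide table, the saving from the optimised read in Pattern 2 can outweigh the cost of the extra resident pass, which would explain the timings above.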

-Steve

Not applicable

Hey Steve,

Thanks for your input.

Your feedback gives me a way to go ahead and test it out. I will remove the string transformations from the current app (I will incorporate those while creating the QVD, or add an intermediate layer) and then run the whole scenario.

SK

hic
Former Employee

Steve

I think you found the reason for this behavior...

HIC

Not applicable

Hi Steve Dark, Henric Cronström,

Based on the input from Steve, I have tested out the below 3 scenarios. Each scenario uses an optimised load in the initial load. The number of records is the same, i.e. ~24M. The number of fields is ~40. Below is the table that details the 3 scenarios.

[Screenshot: table of Resident vs Preceding Load times for the 3 scenarios]
Now, if you observe, as I moved from Scenario 1 to 3, I reduced the complexity of the operation in the Resident / Preceding load (with the hope that this would improve the Preceding Load performance relative to the Resident).

But, surprisingly, it seems to get better (as observed in the ratio of Resident time to Preceding time) as the complexity decreases. Now, I am a little confused as to:

1) When exactly would the Preceding Load perform better than the Resident Load?

2) Is this because of the large number of records or fields? Would that have any bearing?

3) Or should I be including some value transformations as well? Those would probably give a different result (the above tests being primarily string operations).

Rgds,

SK

Note - The times are in minutes.seconds. The last scenario has only a test text column added in the data transformation ('Test' as Dummy).

datanibbler
Champion

Great post!

I use it a lot. With SQL SELECT I routinely do that; it makes things much more flexible.

With "normal" LOADs, you can also use it - but I'm a bit sceptical about that one:

=> One of the primary objectives of programming work in an enterprise context should be that others are able, with no more than general IT knowledge, to understand your code if you're not there for whatever reason.

=> A RESIDENT LOAD, although slower, is just a lot easier to understand. That way, the other person can read it in traditional top-down order and understand, step by step, what is happening - well, in most cases anyway. With complex commands I always insert one or two lines of comment. It's a bit time-consuming, but well worth it IMHO.

Best regards,

DataNibbler
