Qlik Community

Qlik Design Blog

All about product and Qlik solutions: scripting, data modeling, visual design, extensions, best practices, etc.

A QlikView feature that is poorly known and brilliant in its simplicity is the Preceding Load.

 

If you don’t know what it is, then I strongly suggest that you read this blog post and find out. Because it will help you in your QlikView scripting.

 

So what is it?

 

It is a way for you to define successive transformations and filters so that you can load a table in one pass but still have several transformation steps. Basically it is a Load statement that loads from the Load/SELECT statement below.

 

Example: you have a database where your dates are stored as strings and you want to use the QlikView date functions to interpret the strings. But the QlikView date functions are not available in the SELECT statement. The solution is to put a Load statement in front of the SELECT statement: (Note the absence of “From” or “Resident”.)

 

Load Date#(OrderDate,’YYYYMMDD’) as OrderDate;
SQL SELECT OrderDate FROM … ;

 

What happens then is that the SELECT statement is evaluated first, and the result is piped into the Load statement that does the date interpretation. The fact that the SELECT statement is evaluated before the Load, is at first glance confusing, but it is not so strange. If you read a Preceding Load as

 

     Load From ( Select From ( DB_TABLE ) )

 

then it becomes clearer. Compare it with nested functions: How would you evaluate “Round( Exp( x ) )”. You would of course evaluate the Exp() function first and then the Round() function. That is, you evaluate it from right to left.

 

Input - Output.png

 

The reason is that the Exp() function is closest to the source data, and therefore should be evaluated first. It’s the same with the Preceding Load: The SELECT is closest to the source data and should therefore be evaluated first. In both cases, you can look at it as a transformation that has an input and an output and to do it correctly, you need to start with the part of the transformation closest to the input.

 

Any number of Loads can be “nested” this way. QlikView will start from the bottom and pipe record by record to the closest preceding Load, then to the next, etc. And it is almost always faster than running a second pass through the same table.

 

With preceding Load, you don’t need to have the same calculation in several places. For instance, instead of writing

 

Load  ... ,
   Age( FromDate + IterNo() – 1, BirthDate ) as Age,
   Date( FromDate + IterNo() – 1 ) as ReferenceDate
   Resident Policies
      While IterNo() <= ToDate - FromDate + 1 ;

 

where the same calculation is made for both Age and ReferenceDate, I would in real life define my ReferenceDate only once and then use it in the Age function in a Preceding Load:

 

Load  ..., ReferenceDate,
   Age( ReferenceDate, BirthDate ) as Age;
Load  *,
   Date( FromDate + IterNo() – 1 ) as ReferenceDate
   Resident Policies
      While IterNo() <= ToDate - FromDate + 1 ;

 

The Preceding Load has no disadvantages. Use it. You’ll love it.

 

HIC

96 Comments
Contributor

Thanks. I have the issue resolved.

270 Views
Not applicable

Thank you!

0 Likes
270 Views
Contributor

Dear HIC,

1) So in case of Preceding load how the flow of execution happens ? Is it Top to Bottom or Bottom to Top approach ?

2) In general when we load data from external sources it stored into RAM but in case of preceding load data gets loaded from external sources but not stored into RAM. It further processed and in the top most i.e the final stage it occupies RAM. Am I correct sir ?

0 Likes
270 Views
Contributor

Dear sir,

I tried applying Order By statement on top of a initial load. But it throws an error. I think Order by isn't possible in Preceding Load. Am I right sir ?

0 Likes
270 Views

The "Preceding Load" is effectively executed from bottom to top.

Data is always stored in RAM. But the point is, that if you have one or several preceding loads, only the output of the top Load is accumulated - it is appended to a table. The other Load statements just process a record and then pass it on to the Load above.

Order By can only be used in a "Load * Resident".

HIC

0 Likes
270 Views
Contributor

Thanks for the quick reply sir. Its more clear now.

0 Likes
270 Views
Honored Contributor

Thanks for sharing

0 Likes
270 Views
New Contributor

Thanks Henric. It was very helpful to understand the 'preceding load'

0 Likes
270 Views
Contributor III

Hello Henric

In one of your above replies you have mentioned that preceding load cannot be used above joins or concatenate. In most of the situations that I have encountered I have used joins or concatenated load statements. Hence I have used resident load everywhere to do the transformations. Will it be possible to share an example or a workaround how preceding load can be used with joins or concatenating load statements.

In any of the blog posts I have not seen an example posted by anybody using concatenated or multiple join load statements using preceding load.

0 Likes
270 Views

The problem arises when you want to perform a join in a first step and a preceding load in a second step, i.e.

   Load ... ; ( Load ... From x; Join Load ... From y; )

This is not possible. You cannot use brackets this way, and if you remove the brackets the preceding load is only applied on the first load (on x). So instead, the work-around is to do it in two steps:

   tmp:

   Load ... From x; Join Load ... From y;

   RealTable:

   Load ... Resident tmp;

   Drop Table tmp;


For concatenate, there is an additional work-around: To repeat the preceding load:

   Load ... ;

   Load ... From x;

   Concatenate

   Load ...;

   Load ... From y;

  

HIC

0 Likes
270 Views
Contributor III

Thanks Henric..That explains it well

0 Likes
270 Views
New Contributor III

Thank you

0 Likes
270 Views
Contributor II

Maybe I missed this in the string but how does one aggregate/summarize/group by in a preceding load?  I understand the transformation of the data in the preceding load but can't seem to summarize after the transformation...  See script below.  Working from the bottom up the first preceding load where I perform my transformations works but when I add the second preceding load to summarize it fails and says Error: Invalid expression.

ODBC CONNECT TO BAR (UserId is , Password is );
LOSSES:
Load
CAT_LOSS,
CAT_DESC,
count(distinct(CLM_FILE_ID)) as CLAIM_COUNT,
sum(GROSS_REPT_L_ALAE) as GROSS_REPT_L_ALAE,
text(CAL_DATE) as CAL_DATE,
ACCT_D,
LEGACY_COMPANY,
LOB,
DATATYPE,
ACE_PROD_NUMBER,
PRODUCER_NUM_7_DIGIT,
SUB_PRODUCER_NUM,
STATE,
PERIL
;

LOAD
If(trim(len(CAT_ID))=0,'N','Y') as CAT_LOSS,
If(trim(len(CAT_ID))=0,'Non-Cat Loss',CAT_DSC) as CAT_DESC,
CLM_FILE_ID,
INCU_INDM_US_CURY_AMT+INCU_ALAE_US_CURY_AMT as GROSS_REPT_L_ALAE,
date(ACTG_MO&'/'&'01'&'/'&ACTG_YR) as CAL_DATE,
ACTG_YR_MO as ACCT_D,
MCC_DSC,
if(MCC_DSC like '*Fund*','Fund',if(match(MCC_DSC,'Boats','Yachts','Other Recreational Marine','Captives'),'ACE',if(MCC_DSC like '*ACE*','ACE','Chubb'))) as LEGACY_COMPANY,
if(match(MCC_DSC,'CHB Pr Lns Can Homeowners','CHB Pr Lns Homeowners','ACE PRS Firemans Fund Homeownr','ACE Private Rsk Svcs Homeowner'),'Home',
if(match(MCC_DSC,'CHB Pr Lns Can Personal Auto','CHB Pr Lns Pers Auto Phys Dmg','CHB Pr Lns Pers Auto Liability','ACE PRS Firemans Fund Auto', 'ACE Private Risk Services Auto'),'Auto',
if(match(MCC_DSC,'CHB Pr Lns Can Floater','CHB Pr Lns Floater','ACE PRS Firemans Fund Inland','ACE Private Risk Svcs Inland'),'Valuables',
if(match(MCC_DSC,'CHB Pr Lns Can Personal Excess','CHB Pr Lns Pers Excess','ACE PRS Firemans Fund Umbrella','ACE Private Risk Svcs Umbrella'), 'Excess',
if(match(MCC_DSC,'Other Recreational Marine','Boats','Yachts','CHB Pr Lns Can Yachts','CHB Pr Lns Yachts'),'Watercraft','Other'))))) as LOB,
if(MCC_DSC like 'ACE*','ACE-Fund BAR Losses',if(wildmatch(MCC_DSC,'Boats','Yachts','Other Recreational Marine','Captives'),'ACE-Fund BAR Losses','Chubb BAR Losses')) as DATATYPE,
PROD_ID as ACE_PROD_NUMBER,
LEGACY_MAIN_CODE as PRODUCER_NUM_7_DIGIT,
LEGACY_SUB_CODE as SUB_PRODUCER_NUM,
TAX_PLTDV_DSC as STATE,
PERL_DSC as PERIL

;

SQL SELECT
d.MCC_DSC,
a.ACTG_MO,
a.ACTG_YR,
a.ACTG_YR_MO,
b.PROD_ID,
a.INCU_INDM_US_CURY_AMT,
a.INCU_ALAE_US_CURY_AMT,
a.CLM_FILE_ID,
c.LEGACY_MAIN_CODE,
c.LEGACY_SUB_CODE,
a.CAT_ID,
a.CAT_DSC,
b.TAX_PLTDV_DSC,
b.PERL_DSC
FROM ATLAS_NA_DM_PROD..NON_STRTGC_CLM_SUM_AYM a join ATLAS_NA_DM_PROD..NON_STRTGC_CONTR_SUM_AMT b on
a.non_strtgc_contr_sum_amt_sk = b.non_strtgc_contr_sum_amt_sk
and b.cur_ind = 'Y'
inner join ATLAS_NA_DM_LNDG_PROD.ADMIN.PAS_PROD_XREF_LNDG c on
b.PROD_ID = c.PRODUCER_CODE
and c.MARKETING_OFFICE_CODE='PRS'
left join ATLAS_NA_DM_PROD..DIM_MCC d on
a.SRCE_RPTG_MCC_CD = d.MCC_CD
WHERE
(a.ACTG_YR = '2012' OR a.ACTG_YR = '2013' OR a.ACTG_YR = '2014' OR a.ACTG_YR = '2015' OR a.ACTG_YR = '2016' OR a.ACTG_YR = '2017')
AND
a.SRCE_RPTG_MCC_CD in ('0090800', '0090810', '0090820', '0090830', '0090899',
'0091000', '0091010', '0091020', '0091030', '0091090',
'0091100', '0091101', '0091110', '0091130', '0091140',
'0091190', '0121007', '0210010', '0210018', '0210020',
'3200318', '3200319', '3200320', '3200321', '3200323','0210040','0210050') 
AND (a.INS_CD = 'DIR' OR a.INS_CD = 'ASM');
 
store LOSSES INTO C:\QlikView\Scorecard\v7\LOSSEStest.qvd (qvd);

0 Likes
270 Views
Partner
Partner

You have to add a "group by" clause to the preceeding load statement  where you are using aggregate functions. Be sure to include all the column expressions that are not aggrgates in your group by list.

0 Likes
270 Views
Contributor II

So that part of the preceding load is similar to SQL.  Could I accomplish the transformation and grouping by in 1 preceding load?  Is there a benefit to keeping it separate?

0 Likes
270 Views
Partner
Partner

We have observed that preceding loads do tend to extend the load time as they behave like separate load statements running in sequence so clearly there is an argument for combining them into one. I find that if I have a series of calculated fields like what you have in your sample it is usually easier to follow if I do all the transformation operations and then the aggregation step.

The long and short of it is based on personal preference and performance, if thing are not performing well then you may need to try a different approach.

270 Views
Partner
Partner

It is similar to any other load-statement in the script, only there is no source (FROM or RESIDENT) .
You can use WHERE and GROUP BY, probably also ORDER BY.

A problem with preceding loads is they are not as easy to read.
Have also noticed at times that preceding load seems to take more time to execute
then if using a temporary table and doing a RESIDENT load from that.

I would not attempt doing the transformation and grouping in the same LOAD statement.
Also I would separate loading from database and transformation of data.
Ever heard of ETL?
Your code you can't just run the transformation and grouping,
you always have to fetch the data from database also.
Separating those steps you can do the transformation over again without having to fetch new data from the database.

0 Likes
270 Views
Not applicable

great!

0 Likes
270 Views
Contributor III

More than once I see a preceding load which simply regurgitates the field names from the select. Typically its the result of using the wizard.  The only reason I use preceding is to gain access to functionality only available in Qlik (e.g. previous).  It's double-work when you have to change the select and remember to also change the preceding load.  It would seem to faster and easier to rename the fields in the sql rather then in the preceding load.  Is there a best method?

0 Likes
270 Views

I wish there were a simple answer...

I usually do the same as you - I only use preceding load when needed, e.g. to make successive transformation or to use Qlik specific functionality. Renaming fields in the SELECT statement usually works excellently, so I do that quite often.

However, there are some odd ODBC drivers that are case-insensitive, and this causes problems. For example

   SELECT FIELD_A as NewFieldName FROM ...

will then create a field called NEWFIELDNAME - in upper case. This is extremely annoying, and the work-around is of course to rename the fields in a Preceding Load.

But if you then write

   Load FIELD_A as NewFieldName ;

   SELECT * FROM ...

you will get an inefficient SELECT statement: The star could mean hundreds of fields.

This means that you HAVE TO list all fields in both the Load and the SELECT, to get an efficient load sequence:

   Load FIELD_A as NewFieldName, ... ;

   SELECT FIELD_A, ... FROM ...

which seems like double work.

HIC

270 Views
Partner
Partner

A preceding load that only repeats the same field names adds nothing.
It is an extra step that takes some more time to process.
I regularly remove these if I have used the wizard and got them added.
Also when it comes to fields I regularly use the wildcard * instead of naming all the fields.
Granted it can be lots of fields but if it does not have significant impact on the load I think
it better having access to all database fields in the following Transform apps

and there limit to those fields used in the Load step.

270 Views
Contributor III

HIC - I have noticed every now and then the case-sensitive issue and tracked it down to a field being upper-cased without explicitly having this in the script.  I'll review the ODBC drivers with the admin. 

I have done the same Anders - remove the preceding load and add renames in the SQL.  Much cleaner and easier to maintain.  With regard to renames, we've found users are more comfortable with a field worded in business terms (instead of the cryptic but ever popular custno).  We typically rename just before creating our dimensional model QVDs.

Thanks for the tips - very helpful.

0 Likes
270 Views
Contributor II

Can I just ask for a little clarification here? Reading most of the above I understood that preceding loads work on single records at a time. So, a data source of 1000 records would load the first record, pass this on to the first preceding load for any transformations before loading the second of the 1000 records, etc. This is in contrast to a resident load which loads all 1000 records into RAM before the actual "LOAD Resident" part can do any transformations.

Resident method:

Step 1: Load 1000 records

Step 2: Load 1000 records again from step 1 (now resident in memory) adding new fields.

Preceding method:

Step 1: Load record 1 of 1000

Step 2: First preceding load part adds new field to this first record

Step 3: A second preceding load uses field created at step 2 to create another new field

Step 4: Store record 1, along with new fields, in memory

Step 5: Load record 2 of 1000

Step 6: First preceding load part adds new field to this second record

Step 7: A second preceding load uses field created at step 6 to create another new field

Step 8: Store record 2, along with new fields, in memory

Step 9: Load record 3 of 1000

etc

After final step: It is only now that all 1000 records are resident in RAM.

If I'm right, it would not be possible to perform any aggregations via the preceding load method as not all records are available during the load, yet the recent posts above seem to suggest you can. Have I got it wrong?

0 Likes
270 Views

Good point.

In principle, the Preceding load is a record-by-record evaluation where each record is piped into the next Load and completely evaluated before the next record is read. In principle.

But there are exceptions... A group by clause will break this rule, and force the entire table to be read before anything can be piped to the next Load.

Also, parallelization will cause several records to be evaluated simultaneously by different threads. (If the output of the Load doesn't depend on other records, through e.g. the Peek() function.)

HIC

270 Views
Contributor II

I see, thank you.

Also, I'm glad you pointed out about the inter-record functions, I hadn't thought about those.

So, is parallelization a thing that only occurs during an optimized load or does it happen anyway - are there three load speeds - optimized, parallelized (?) and normal?

0 Likes
270 Views
Contributor

Thanks Lawrance and Henric for making this more clear. I am using Preceding load a lot an this has angreat value.

Tormod Hanstad

Sendt fra min iPhone

7. sep. 2017 kl. 14:24 skrev Lawrence Cooper <qcwebmaster@qlikview.com<mailto:qcwebmaster@qlikview.com>>:

0 Likes
270 Views

There are even more alternatives...

Optimized - A qvd load where the records aren't unpacked

Full parallelization - A load where different records and different fields within the same record may be treated by different threads

"Partial" parallelization - A load where different fields within the same record may be treated by different threads

Single threaded - The load is performed in one thread only


Which one the engine uses depends on the circumstances: Are the fields transformed? (Then "optimized" is excluded). Does the load contain Peek() or Exists()? Then the second alternative is excluded.


The last alternative can only be achieved by using

EnableParallelReload=0

in Settings.ini


HIC

270 Views
Partner
Partner

hic‌, What about GROUP BY clause? Can it be calc multi-threaded or only single-threaded?

TH

0 Likes
270 Views

I believe that most of it is multi-threaded. In most cases (Sum(), Avg(), Max(), etc.) there should be no problem distributing different records on different threads since there is no dependency in the order of the threads. There could however be limitations when you calculate aggregates that depend on the sort order of the records, e.g. Median().

But to be 100% sure I need to ask the developers.... 🙂

HIC

270 Views
Partner
Partner

Henric Cronström napisał(-a):

I believe that most of it is multi-threaded. In most cases (Sum(), Avg(), Max(), etc.) there should be no problem distributing different records on different threads since there is no dependency in the order of the threads. There could however be limitations when you calculate aggregates that depend on the sort order of the records, e.g. Median().

But to be 100% sure I need to ask the developers.... 🙂

HIC

hic‌, I checked it: GROUP BY clause with Sum(). It's single-threaded operation in QV 12SR6 and QV 11.20SR16.

Sample QVW:

https://ufile.io/amnft

Can You ask the developers?

TH

0 Likes
270 Views
Labels