Skip to main content
Announcements
July 15, NEW Customer Portal: Initial launch will improve how you submit Support Cases. IMPORTANT DETAILS
cancel
Showing results for 
Search instead for 
Did you mean: 
harsha
Creator
Creator

How to join two tables without impact on load time and performance

Hi Experts,

With the below scripts, I have 4 tables RAWDATA, Rule_1, Rule_2 & Rule_3 . But I want only RAWDATA with Flags created in different tables in it. 

I have below script in place where RAWDATA table has all the dimensions along with Rule_4_Flag and Rule_5_Flag created from other tables(Rule_34 & Rule_5)...but I want to include Rule_1_Flag, Rule_2_Flag & Rule_3_Flag  into RAW Data as dimensions. 

 

RAWDATA:

LOAD  *

FROM [lib:// Checks/ DATA_2019.QVD] (qvd);

///////////////////////////////////////////////Rule-1_/////////////////////////////////////////////

NoConcatenate

Rule_1:

LOAD *,

if((VALUE)>100,'Yes','No') as Rule-1_Flag

Resident RAWDATA

where match (DataID,'36950','42915');

///////////////////////////////////////////////Rule-2_////////////////////////////////////////////

NoConcatenate

Rule_2:

LOAD *,

if((VALUE)<0,'Yes','No') as Rule-2_Flag

Resident RAWDATA

where match (DataID,'36950');

///////////////////////////////////////////////Rule-3/////////////////////////////////////////////

NoConcatenate

Rule_3:

Load *,

//if(Frac(VALUE)>0,'WithDecimal','WithoutDecimal') as Decimal_Flag

if(SubStringCount(VALUE,'.')>=1,'WithDecimal','WithoutDecimal') as Decimal_Flag

Resident RAWDATA

where match (DataID,'60685');

///////////////////////////////////////////////Rule-4//////////////////////////////////////////////////////////

NoConcatenate

Rule_4:

LOAD CompanyID, [Time Series Year], VALUE As Value1

Resident RAWDATA

Where DataID=60634843285;

Left Join(Rule_4)

LOAD CompanyID, [Time Series Year], VALUE As Value2

Resident RAWDATA

Where DataID=60634843286;

Left Join(RAWDATA)

LOAD CompanyID, [Time Series Year], If(Value1<Value2, 'Yes','No') As Rule_4_Flag

Resident Rule_4;

Drop Table Rule_4;

///////////////////////////////////////////////Rule-5//////////////////////////////////////////////////////////

NoConcatenate

Rule_5:

LOAD CompanyID, [Time Series Year], VALUE As Value1

Resident RAWDATA

Where DataID=60634843285;

Left Join(Rule_5)

LOAD CompanyID, [Time Series Year], VALUE As Value2

Resident RAWDATA

Where DataID=60634843321;

Left Join(RAWDATA)

LOAD CompanyID, [Time Series Year], If(Value1<Value2, 'Yes','No') As Rule_5_Flag

Resident Rule_5;

Drop Table Rule_5;

Labels (3)
1 Solution

Accepted Solutions
stevedark
Partner Ambassador/MVP
Partner Ambassador/MVP

Hi,

The only way you will make it so that data load is performant is to create the flags when you are creating the QVD, rather than after loading from it. The first load here is going to be an Optimised QVD Load, which will be super quick. All subsequent loads, from the RESIDENT tables will not be optimised.

Take a read of my blog post on optimised loads, as this goes into a fair bit of detail of what you can an cannot do with an optimised load:

https://www.quickintelligence.co.uk/qlikview-optimised-qvd-loads/

 

You are also going to have serious issues with how the code is written at present with synthetic keys. Even with the first two loads, if the QVD has columns A, B and C you are loading from these, and then creating a second table with columns A, B, C and Rule-1_Flag. Qlik will need to create a key which combines A, B and C in a synthetic key, this will be disastrous for performance, and if you have a high number of rows or columns it will crash things.

It will not be optimised, but you will get a much better result doing the following:

LOAD
*,
if(DataID = 36950 or DataID = 42915, if(VALUE)>100,'Yes','No'), null()) as Rule-1_Flag,
if(DataID = 36950, if((VALUE)<0,'Yes','No'), null()) as Rule-2_Flag,
if(DataID = 60685, if(SubStringCount(VALUE,'.')>=1,'WithDecimal','WithoutDecimal'), null()) as Decimal_Flag
FROM [lib:// Checks/ DATA_2019.QVD] (qvd);

 

I'm not sure what is going on with the joins in the subsequent checks, but you need to ensure that you don't create any synthetic keys in when you are creating those rules either. If it the checks can be done when you first load from the QVD (assuming you can't do it when you create the QVD) then that will be better.

Hope that helps.

View solution in original post

6 Replies
harsha
Creator
Creator
Author

Hi @stevedark , @Saravanan_Desingh  & @lorenzoconforti .

 

Could you help me in this pls?

stevedark
Partner Ambassador/MVP
Partner Ambassador/MVP

Hi,

The only way you will make it so that data load is performant is to create the flags when you are creating the QVD, rather than after loading from it. The first load here is going to be an Optimised QVD Load, which will be super quick. All subsequent loads, from the RESIDENT tables will not be optimised.

Take a read of my blog post on optimised loads, as this goes into a fair bit of detail of what you can an cannot do with an optimised load:

https://www.quickintelligence.co.uk/qlikview-optimised-qvd-loads/

 

You are also going to have serious issues with how the code is written at present with synthetic keys. Even with the first two loads, if the QVD has columns A, B and C you are loading from these, and then creating a second table with columns A, B, C and Rule-1_Flag. Qlik will need to create a key which combines A, B and C in a synthetic key, this will be disastrous for performance, and if you have a high number of rows or columns it will crash things.

It will not be optimised, but you will get a much better result doing the following:

LOAD
*,
if(DataID = 36950 or DataID = 42915, if(VALUE)>100,'Yes','No'), null()) as Rule-1_Flag,
if(DataID = 36950, if((VALUE)<0,'Yes','No'), null()) as Rule-2_Flag,
if(DataID = 60685, if(SubStringCount(VALUE,'.')>=1,'WithDecimal','WithoutDecimal'), null()) as Decimal_Flag
FROM [lib:// Checks/ DATA_2019.QVD] (qvd);

 

I'm not sure what is going on with the joins in the subsequent checks, but you need to ensure that you don't create any synthetic keys in when you are creating those rules either. If it the checks can be done when you first load from the QVD (assuming you can't do it when you create the QVD) then that will be better.

Hope that helps.

harsha
Creator
Creator
Author

Thanks for the detailed reply Steve... they are awesome tips ... however, how would I add multiple DataIDs ? Like flag 1
stevedark
Partner Ambassador/MVP
Partner Ambassador/MVP

Unless I am misunderstanding something, the code above is dealing with two DataIDs in this line:

if(DataID = 36950 or DataID = 42915, 

 

I've removed the single quotes, as all your IDs seem to be numbers and comparing numbers is inherently quicker than comparing strings.

 

If I have missed something please explain.

Thanks.

harsha
Creator
Creator
Author

that's right... I missed this point to include.. my bad.

for flag 2, i want to apply for certain DataIDs..

(DataID,'36950','42999','60471','60477','60475','60475','60473');

How can I handle if the more number of DataIDs to include ?

 

stevedark
Partner Ambassador/MVP
Partner Ambassador/MVP

Hi,

You can still use the match statement as you had in your original code. I just used  = as there were only two options in the example you gave. For three or more options match makes sense.

I would try it with and without quote marks, it may perform better without.

Steve