Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
Hello everybody!
I have a conceptual question considering the WHERE-clause in load script. How does Qlik process this statement?
Simple example:
Load *
Resident Table1
Where 1 = 1
and Year(Date) = 2016
and Activity = 1
;
How does Qlik process this statement?
I see 2 possible approaches.
1. a) take Table1
b) take all the rows in Table1 with year = 2016 (let us call it subset1)
c) take all the rows in subset1 with Activity = 1
2. a) take Table1
b) check if the row has year = 2016
c) if step b is true, check if the row has activity = 1
d) process next row
I think right understanding of this process can be very useful in performance increasing.
Have a nice day!
Elena
It should take all rows from Table1 where Year(Date) = 2016 and Activity = 1
I think 2 is the right one
Thank you for your opinion, Manish!
Have you somehow checked it?
Let me know how to check it...
Where you use Debug, you will see the number of rows fetched.
You will never see that first 1000 lines fetched and now using 2nd where clause 500 lines fetched..
it will give you one only (500 lines fetched) sentences.
So I think the where clause is not creating any subset ..
You can also try this
Simple example:
Load *
Resident Table1
Where match (Year(Date),' 2016') and match (Activity ,'1');
I think it is not so correct to say that fetching means subsequent processing of rows. There are some inner mechanisms, that are not presented to you. Like if you have Group- and Where-clauses within one table transformation, you will only have the final number of rows.
Susovan,
I think you didn't catch my question. It is more about the performance.
If Qlik checks first condition and make a subset and then checks next condition, then it is better to organize your conditions with necessary resources increasing.
Like first to check Activity (it is usually 1 or 0 | 1 or null()), and then apply functions to get year.
I don't know how it will be processed and I could imagine a slight different processing outgoing from the second approach from above that both field-values (and in general each condition) will be checked on row-level and only after reading all records will be evaluated which record is valid or not - quite similar to the qlik-logic to calculate a nested -if-loop which also calculate at first each branch within them and afterwards it will be evaluated which condition is true.
This would meant that the processing of a where-clause is a two-step process and worked from this point of view similar to a group/order by statement from I believe that they are also executed as a two-step process - unless they are combined then it will be a multi-step process (from a performance point of view it's often better not to combine these statements else to use separate steps within a load-chain). I hope hic could give some insights to this topic.
- Marcus
In my opinion, if Qlik uses first algorithm in some cases we can win a lot.
Assume you have 1 mln rows with activity = 1 and only 5 have year = 2016. I feel make a subset of 5 rows with year = 2016 and check which of them have activity = 1 is easier then check each row on both conditions.