Skip to main content
Announcements
See why Qlik was named a Leader in the 2024 Gartner® Magic Quadrant™ for Data Integration Tools for the ninth year in a row: Get the report
cancel
Showing results for 
Search instead for 
Did you mean: 
renjithpl
Specialist
Specialist

First Group by and Then Join or vice versa - Performance

Hi All, 

I have One transaction table (40 fields) and one dimension table (10 fields), and I want to join them into one table also, I have 20 expressions which I want to bring it in script level. 

Which is the best practice for better performance, ?

T:
Load a1, b1, c1, sum(p1) as p1, sum(q1) as q1 
from  TTable group by a1, b1, c1 ; 
Left join (T)
Load a1, D1, D2, D3 From DTable; 

                                             OR

T_tem:                                 
Load a1, b1, c1, p1, q1 from TTable;
Left Join(T)
Load a1, D1, D2, D3 From DTable; 

T:
Load a1, b1, c1, D1, D2, D3, Sum(p1) as p1, Sum(q1) as q1 Resident T_tem
group by  a1, b1, c1, D1, D2, D3 ;

 

Regards
Renji

1 Solution

Accepted Solutions
MayilVahanan

Hi Renji,

Its depends on your requirement & data. 

For ex: You're joining with field "a1" between tables. If the field "a1" is the lowest granularity, then the "Option 1" is better. 

Suppose, the field "a1" is not lowest granularity, b1 or C1, then Option 1 is not recommended.

For ex: if you're using the Dimension as "a1, b1" & expression as "p1", then "b1" has same values for all rows, because, summation in script level is calculated based on "a1" which is not lowest granularity. In that case, you can go with Option 2. But again, duplication Occurs. 

So, if a1 is not lower level, please keep the table separately without any joins.

Hope it helps.

Regards,

Mayil Vahanan R

Thanks & Regards, Mayil Vahanan R
Please close the thread by marking correct answer & give likes if you like the post.

View solution in original post

4 Replies
MayilVahanan

Hi Renji,

Its depends on your requirement & data. 

For ex: You're joining with field "a1" between tables. If the field "a1" is the lowest granularity, then the "Option 1" is better. 

Suppose, the field "a1" is not lowest granularity, b1 or C1, then Option 1 is not recommended.

For ex: if you're using the Dimension as "a1, b1" & expression as "p1", then "b1" has same values for all rows, because, summation in script level is calculated based on "a1" which is not lowest granularity. In that case, you can go with Option 2. But again, duplication Occurs. 

So, if a1 is not lower level, please keep the table separately without any joins.

Hope it helps.

Regards,

Mayil Vahanan R

Thanks & Regards, Mayil Vahanan R
Please close the thread by marking correct answer & give likes if you like the post.
renjithpl
Specialist
Specialist
Author

Hi Mayil Vahanan,

Field a1 is not at the lowest granular level, but i am also worried about duplication. 

 

marcus_sommer

In regard of worries about the relationship between both tables you should consider to use a mapping instead of joins. Beside this mappings are usually much more performant as joins and you could use their results directly without the need of a following load. That you have multiple fields isn't really a problem because you could string-concat them and cut them again with subfield().

Further I suggest to check if you really need all these aggregations on the script-level and if you need to apply them against all fields - because such aggregations could be very expensive in regard to the performance and you may need an incremental approach for it.

- Marcus

Brett_Bleess
Former Employee
Former Employee

Just going to add a couple of links from Help Doc regarding best practices for data loading and app development:

https://help.qlik.com/en-US/qlikview/April2020/Subsystems/Client/Content/QV_QlikView/LoadData/best-p...

https://help.qlik.com/en-US/qlikview/April2020/Subsystems/Client/Content/QV_QlikView/application-per...

If any of the above posts was helpful in assisting you in resolving the issue, we would greatly appreciate it if you would return to the thread and close things out by using the Accept as Solution button on those posts as this gives the posters credit for the assistance and it lets other members know what actually helped.  If you did something different, please consider posting what you did and then mark that post afterward to close the thread.

Regards,
Brett

To help users find verified answers, please do not forget to use the "Accept as Solution" button on any post(s) that helped you resolve your problem or question.
I now work a compressed schedule, Tuesday, Wednesday and Thursday, so those will be the days I will reply to any follow-up posts.