Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
Hi All,
I have One transaction table (40 fields) and one dimension table (10 fields), and I want to join them into one table also, I have 20 expressions which I want to bring it in script level.
Which is the best practice for better performance, ?
T:
Load a1, b1, c1, sum(p1) as p1, sum(q1) as q1
from TTable group by a1, b1, c1 ;
Left join (T)
Load a1, D1, D2, D3 From DTable;
OR
T_tem:
Load a1, b1, c1, p1, q1 from TTable;
Left Join(T)
Load a1, D1, D2, D3 From DTable;
T:
Load a1, b1, c1, D1, D2, D3, Sum(p1) as p1, Sum(q1) as q1 Resident T_tem
group by a1, b1, c1, D1, D2, D3 ;
Regards
Renji
Hi Renji,
Its depends on your requirement & data.
For ex: You're joining with field "a1" between tables. If the field "a1" is the lowest granularity, then the "Option 1" is better.
Suppose, the field "a1" is not lowest granularity, b1 or C1, then Option 1 is not recommended.
For ex: if you're using the Dimension as "a1, b1" & expression as "p1", then "b1" has same values for all rows, because, summation in script level is calculated based on "a1" which is not lowest granularity. In that case, you can go with Option 2. But again, duplication Occurs.
So, if a1 is not lower level, please keep the table separately without any joins.
Hope it helps.
Regards,
Mayil Vahanan R
Hi Renji,
Its depends on your requirement & data.
For ex: You're joining with field "a1" between tables. If the field "a1" is the lowest granularity, then the "Option 1" is better.
Suppose, the field "a1" is not lowest granularity, b1 or C1, then Option 1 is not recommended.
For ex: if you're using the Dimension as "a1, b1" & expression as "p1", then "b1" has same values for all rows, because, summation in script level is calculated based on "a1" which is not lowest granularity. In that case, you can go with Option 2. But again, duplication Occurs.
So, if a1 is not lower level, please keep the table separately without any joins.
Hope it helps.
Regards,
Mayil Vahanan R
Hi Mayil Vahanan,
Field a1 is not at the lowest granular level, but i am also worried about duplication.
In regard of worries about the relationship between both tables you should consider to use a mapping instead of joins. Beside this mappings are usually much more performant as joins and you could use their results directly without the need of a following load. That you have multiple fields isn't really a problem because you could string-concat them and cut them again with subfield().
Further I suggest to check if you really need all these aggregations on the script-level and if you need to apply them against all fields - because such aggregations could be very expensive in regard to the performance and you may need an incremental approach for it.
- Marcus
Just going to add a couple of links from Help Doc regarding best practices for data loading and app development:
If any of the above posts was helpful in assisting you in resolving the issue, we would greatly appreciate it if you would return to the thread and close things out by using the Accept as Solution button on those posts as this gives the posters credit for the assistance and it lets other members know what actually helped. If you did something different, please consider posting what you did and then mark that post afterward to close the thread.
Regards,
Brett