Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 
Not applicable

Nested If as a dimension

I have the below Nested IF statement as a DIMENSION expression in my straight table but now my QVW is VERY SLOW when on open. I am using this dimension expression to hide rows or values when the value is null because the rules don't apply.   It seems I can not use the below IF statement in the script because all the fields in the conditions belong to different tables unless I combine all the tables into one table.  It doesn't make sense to do this.  I can't use set analysis because it is not a calculated expression, right? I also tried the PICK/MATCH functions but I can't seem to get the syntax working. Any help/suggestions are greatly appreciated. 

=if([VARIANCE ID] = 14 AND [Management Line]='Structured Finance' AND [Business Group 1] <> 'Covered Bonds', Null(),

  if([VARIANCE ID] = 15 AND [Management Line]<>'Structured Finance' AND [Business Group 1] <> 'Covered Bonds',  Null(), 1))

1 Solution

Accepted Solutions
Not applicable
Author

Hello, several dims for sure will produce more than 1 row. However, the one you defined may have only two possible values: NULL and 1. ValueList(1) would produce a synthetic dim and might be necessary in case your dimension is the only one.

since you do have some other dims - ValueList(1) is not needed. Try to move this logic into your expression. As a first step - remove the dimension and put your conditions into expression. Instead of count(dimension) you may use:

=sum(

if([VARIANCE ID] = 14 AND [Management Line]='Structured Finance' AND [Business Group 1] <> 'Covered Bonds', 0,

  if([VARIANCE ID] = 15 AND [Management Line]<>'Structured Finance' AND [Business Group 1] <> 'Covered Bonds',  0, 1))

)

(+ as Marcus said this may be simplified). If this works (you see at least some performance improvement) - next step is replacing all the IFs with set analysis.

As a last resort - redesign your data structure to keep all the columns (that you use in this straight table) in 1 in-memory table (use table viewer to review how it looks).

The steps I provide all based on following simple observation:

1. Simple, field-based dims usually faster than calculated dims (in most cases)

2. Set analysis usually faster than any conditional operators (if, pick/match etc)

3. If any object definition is based on fields from ONE internal in-memory table then it works faster (calc time is lower) comparing to when it is based on TWO or more in-memory tables. This may require redesign data structure, change LOAD scripts (add joins, etc) and increase load/re-load a bit BUT it may save much calc. time.

View solution in original post

13 Replies
marcus_sommer

Your two conditions are quite similar. Therefore they could be reduced to one condition:

if(match([VARIANCE ID], 14, 15) AND [Management Line]<>'Structured Finance' AND [Business Group 1] <> 'Covered Bonds',  Null(), 1)

Further you could speed up such calculations if the conditions didn't compare strings else numeric values, maybe like this:

if(match([VARIANCE ID], 14, 15) AND [Management Line ID]<> 50 AND [Business Group 1 ID] <> 5,  Null(), 1)

- Marcus

Not applicable
Author

The straight table consts of 1 row, right? If I got the question correctly:

1. Use =ValueList(1) as a calculated dimension and move all the IF stuff into expression

2. Use Set Analysis instead of IFs

3. Why it doesn't make sense to create a single table? This is the best option from performance point of view. It worth trying if points 1,2 do not help to reach good performance

Not applicable
Author

Hi Andrei,

Thank you for your input.

I have a single straight table that consists of many rows with several dimensions and 1 expression with a count(dimension).   One of my dimensions I use the expression above as a calculated dimension to hide the rows when it is null.  I guess I don't know how I would use set analysis in either my dimension expression  or calculated expression in this scenario.  Any guidance is greatly appreciated.

Lastly, I am not understanding how I would use =ValueList(1) in my example.  Can you explain in more detail?

Melinda

Not applicable
Author

Hello, several dims for sure will produce more than 1 row. However, the one you defined may have only two possible values: NULL and 1. ValueList(1) would produce a synthetic dim and might be necessary in case your dimension is the only one.

since you do have some other dims - ValueList(1) is not needed. Try to move this logic into your expression. As a first step - remove the dimension and put your conditions into expression. Instead of count(dimension) you may use:

=sum(

if([VARIANCE ID] = 14 AND [Management Line]='Structured Finance' AND [Business Group 1] <> 'Covered Bonds', 0,

  if([VARIANCE ID] = 15 AND [Management Line]<>'Structured Finance' AND [Business Group 1] <> 'Covered Bonds',  0, 1))

)

(+ as Marcus said this may be simplified). If this works (you see at least some performance improvement) - next step is replacing all the IFs with set analysis.

As a last resort - redesign your data structure to keep all the columns (that you use in this straight table) in 1 in-memory table (use table viewer to review how it looks).

The steps I provide all based on following simple observation:

1. Simple, field-based dims usually faster than calculated dims (in most cases)

2. Set analysis usually faster than any conditional operators (if, pick/match etc)

3. If any object definition is based on fields from ONE internal in-memory table then it works faster (calc time is lower) comparing to when it is based on TWO or more in-memory tables. This may require redesign data structure, change LOAD scripts (add joins, etc) and increase load/re-load a bit BUT it may save much calc. time.

Not applicable
Author

Thanks for the responses!  I've tried almost everything however, I am still having performance issues when opening the QVW.  I can't use set analysis since this logic is on the dimensions and there are no expressions or metric calculations in the app.  I am to the point of making ONE internal in-memory table but I am having some syntax issues trying to join all my tables.  Can you confirm that the syntax below in my JOINS are correct?  Basically I am trying to join 2 different tables to another.  Please note, one of my tables is over 43 million rows so I am hoping I don't crash the QV dev server when doing this 1 in memory table. 

Tables:

COMMITTEE – has 43 million rows
RATINGFACT – has 1.3 million rows
SERVICEDIM – has 81,000 rows

Here are what I think my JOINS should be but I am not 100%

COMMITTEE:
LOAD

           RATING_COMMITTEEDIM_KEY,

           etc…

          

ISSUERATINGFACT:
OUTER JOIN (COMMITTEE)
LOAD DISTINCT

           RATING_COMMITTEEDIM_KEY,

           SERVICEDIM_KEY

           etc…

SERVICEDIM:

INNER join (ISSUERATINGFACT)

           LOAD DISTINCT

           SERVICEDIM_KEY,

           etc…

NickHoff
Specialist
Specialist

To do a PICK you use the following.

RenameMe:

LOAD * INLINE [

    %VarID, %VarDesc

  1, VarianceID14Desc

  2, VarianceID15Desc

];

Expression for PICK:

PICK(%VarID,

Expression for Variance14,

Expression for Variance 15))

NickHoff
Specialist
Specialist

In response to your joins, the logic is an issue.  You should be joining to your fact table, not joining your fact to a dimension.  Also, you might not need to have distinct on your load unless those tables are dirty data not normalized.  Last of all, we run a lot more than 43 million rows "in memory" close to 500 million and we don't crash the server but we have a lot invested in our hardware 1TB of RAM each server and model our applications correctly where the final stage is in optimized qvd load format. 

ISSUERATINGFACT:
LOAD

           RATING_COMMITTEEDIM_KEY,

           SERVICEDIM_KEY

           etc…

LEFT JOIN(ISSUERATINGFACT)

LOAD

           RATING_COMMITTEEDIM_KEY,

           etc…

INNER join (ISSUERATINGFACT)

LOAD

           SERVICEDIM_KEY,

           etc…

marcus_sommer

To join all tables together is rather not a recommended approach. Often there are performance advantages in gui-calculations when fact-tables are matched but not by the dimension-tables.

Further important is the kind and amount of matching. In genereal you could use joining, concatenate and/or mapping whereat joining is a lot more difficult if you don't know the data-quality very good. Often fact-tables aren't really linked to eachother and could be simply concatenated.

Whereas mapping with applymap is very useful if cases of bad data-quality and/or if you only need to transfer a few fields or informations from one table to the other. This meant some fields will be transfered to another table and used for some calculations maybe to create flags but the datamodel itself don't need necessarily be changed.

To Join or not to Join

Don't join - use Applymap instead

Concatenate vs Link Table

Then you should check if you need really all fields and if you could optimize some fields and/or keys and that you have a proper datamodel.

The Importance Of Being Distinct

Symbol Tables and Bit-Stuffed Pointers

Synthetic Keys

Circular References.

Also you could use set analysis in a expression to filter out dimension-values then if the expression returned NULL dimension-values could be hidden (options by dimensions and within the tab presentations).

Beside these things you should check your server ressources (I think at least 16GB RAM for these app) and settings (cores disabled, working set - settings, ...) then if the system swap RAM into the pagefile it will be very slow.

- Marcus

Not applicable
Author

Hey,

Probably i am over simplifying here but is there a reason why you do not want to take your expression to the script and build a field out of it?

Regards

RL