Not applicable

Left join vs. keeping tables separate (normalized vs. denormalized)

Is there any way to justify which of these approaches is better, and why? For example, I have a dimension table with 24 columns. One ID links to the fact table, and six other IDs link to six sub-dimensions. I can easily left join the main dimension with all six sub-dimensions, and my main dimension then widens from 24 columns to 53 columns (same number of rows, obviously).

Now, as far as I know, QV stores unique values in memory and uses binary keys for the associations between tables. In that case, having the tables joined or having them separate shouldn't affect performance. Or would it? So why choose normalized over denormalized, or vice versa?
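In script terms, the two approaches look roughly like this (table and field names are made up for illustration):

```qlik
// Normalized: load the sub-dimension as its own table. QlikView
// associates it to MainDim automatically on the shared key field name.
MainDim:
LOAD * FROM MainDim.qvd (qvd);   // 24 columns, incl. SubDim1ID .. SubDim6ID

SubDim1:
LOAD SubDim1ID,
     SubDim1Name
FROM SubDim1.qvd (qvd);

// Denormalized: fold the same sub-dimension into MainDim instead,
// widening it by one column per non-key field joined in.
// LEFT JOIN (MainDim)
// LOAD SubDim1ID, SubDim1Name FROM SubDim1.qvd (qvd);
```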

3 Replies
Not applicable
Author

My understanding is that it's more about memory allocation than performance. I think it requires much less memory to store 6 sub-dimension key references for each main dimension record than 29 references to unique dimension values. And the difference gets much worse if you have thousands of records in your dimensions.

Not applicable
Author

When you say it will require less memory for dimension references, are you referring to the binary keys that keep track of the associations? I have around 10,000 records in my main dimension and a couple of tens to twenties of records in each of the sub-dimensions. Besides memory utilization, is there a performance degradation or improvement with normalized vs. denormalized in the above scenario?

johnw
Champion III

It's hard to say which approach is better for performance. QlikView will certainly compress it, and I suspect the additional memory required for the left join approach will be small. I haven't done any testing, but the anecdotal evidence seems to suggest that the denormalized tables will typically be a little faster to process once they're loaded in (actually testing it on your real data is far superior to following a rule of thumb based on anecdotes, of course). Doing the left join, of course, will take additional time in the script, which must be weighed against any potential benefits.

This isn't very scientific, and frankly ignores the performance question, but this is roughly the decision tree I follow when deciding whether to left join something onto the "main" table:

if(the table is really its own kind of object
,if(I only need a single field from the table
 ,left join it to the main table
 ,leave it separate)
,if(I only need a handful of fields from the table
 ,left join it to the main table
 ,leave it separate))

So following my decision tree, even knowing nothing about your data, I would leave it separate, since you're picking up 24 fields. But it's only roughly what I do. I've certainly joined dozens of fields from helper tables onto the main table where I felt the two objects were similar enough in the context of the application that there was no conceptual reason to keep them separate.
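For what it's worth, the "left join it to the main table" branch of the tree corresponds to script along these lines (the table and field names are hypothetical):

```qlik
Product:
LOAD ProductID,
     ProductName,
     CategoryID
FROM Product.qvd (qvd);

// LEFT JOIN keeps every Product row and appends the matching
// Category fields; rows with no matching CategoryID get nulls.
LEFT JOIN (Product)
LOAD CategoryID,
     CategoryName
FROM Category.qvd (qvd);
```

The join happens at reload time, which is the extra script time mentioned above; the cost is paid once per reload rather than at chart-calculation time.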