Mapping vs loading multiple associated tables - Qlik Community

mmarchese · ‎2020-03-27

Is there a consensus regarding which situations warrant using mapping instead of loading multiple tables that are associated within the data model?

Say my app has a fact table with a PersonID column, and then I have a Person lookup table with a PersonID column (no duplicates) and several other columns. Here are two cases:

I assume I should just load both tables into the data model if I intend to reference several columns from the Person table in the app (as opposed to using ApplyMap several times to create several new columns in the fact table). Is that correct?
Now, what if I only intend to use one column from Person, such as FullName? In that case, is mapping preferred over loading both tables into the data model? Why? Is mapping more performant? (please assume the fact table is large and each person gets referenced by it many times). Is it mainly personal preference? One argument I see against using mapping in this situation is that if you decided later to bring in more columns from Person, it'd be easier to make that change if you had already been loading both tables into the model instead of using mapping.

Are there other situations where mapping is preferred over loading multiple tables? Thoughts on best practices?

Edit:

I found these interesting articles, which don't address my questions directly (association vs mapping) but are sort of related:

Don't join use Applymap instead (states that mapping is typically preferred over joining)
To Join or not to Join (states that association is typically preferred over joining)

Edit2:

After reading Marcus_sommer's comment advocating for star schemas, I went searching and found this article stating that snowflakes are usually also fine as long as intermediate dimension tables aren't very large, in which case they should be joined with the fact table:

A Myth about the Number of Hops

marcus_sommer · ‎2020-03-29

As the best compromise between script- and UI performance as well as the efforts to create and maintain an application is the star-scheme datamodel. This means within the datamodel is one fact-table which has various dimension-tables connected. In regard to your description is it therefore not needed to add the various personal-fields to the fact-table (regardless with which method it is done whereby a mapping-approach is usually the most suitable one).

Just associating fact- and dimension-tables worked usually very well. Without concrete challenges in regard to the performance or any calculations/views there is no need to change this approach. An exception to this suggestion might be if the dimension-table not only contained “classical” dimension-data like your mentioned name to an ID else also measures like the default working-hours (not each one will be a FTE) which are needed for any calculations.

- Marcus

mmarchese · ‎2020-03-30

Marcus_sommer:

I understand the first two-thirds of your answer: it's unnecessary to move data from dimension tables into the fact table for the purpose of collapsing a star schema down to a single table.

Could you elaborate on the second part, which starts with "An exception to this suggestion might be"?

marcus_sommer · ‎2020-03-31

Quite often has a dimensional-table really only "classical" dimensional data like a name to an ID and a relation of n:1 from the fact- to the dimension-table. But it's not mandatory the only way how a dimensional might be constructed - there might be also the measures or flag-fields included, the relationship might be n:n, the dimension-table might be in some kind a fact-table, too … surely there are various scenarios thinkable in which it may suitable to extend the classical and recommended logics for a datamodel.

- Marcus