1 Reply Latest reply: Aug 30, 2012 5:37 AM by Jonathan Brough

    best practice: concatenate or link table?

    Marc Kaiser

      Hi

       

      How would you design your QlikView data model based on the following conditions?

      I went for concatenate, but I'm not sure about possible performance issues.

      Maybe a link table would be a better approach? Or any other approach?

       

      2 fact tables to be combined:

      - fact1 = 15 million records, 40 fields

      - fact2 = 10,000 records, 150 fields (all needed...)

       

      fact1 and fact2 share 4 fields (date, company, salesrep, product).

       

      4 dimension tables will link to the fact table (calendar, company, salesrep and product master data). There are a couple of other dimension tables linked to the company (hierarchy etc.), salesrep (hierarchy) and product (again, hierarchies...).

       

      I went for concatenate but my gut feeling says it may not be the best approach.
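      Simplified, my concatenate script looks something like this (table, field and file names here are illustrative, not my real ones):

```qlikview
// Simplified sketch; names are illustrative.
Facts:
LOAD Date,
     Company,
     SalesRep,
     Product,
     Sales,                  // example of a fact1-only field
     'fact1' AS FactSource   // flag to tell the two sources apart
FROM fact1.qvd (qvd);

Concatenate (Facts)
LOAD Date,
     Company,
     SalesRep,
     Product,
     Budget,                 // example of a fact2-only field
     'fact2' AS FactSource
FROM fact2.qvd (qvd);
```

      QlikView leaves the fields that exist in only one source NULL for the other source's rows, and the FactSource flag makes it easy to restrict an expression to one source via set analysis.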

       

      I also did some basic performance tests: I created some fairly simple bar charts for sales per month etc. There are no IF conditions in the expressions, but set analysis is used. Some charts are based on fields from fact1, others on fields from fact2 (never combined). All charts share fields from the master data.
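      A typical chart expression looks something like this (field names simplified for illustration):

```qlikview
// Illustrative expression: set analysis instead of IF conditions
Sum({<Year = {2012}, Company = {'ACME'}>} Sales)
```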

       

      When I select fields or even just jump from one tab to another, the CPU runs like Usain Bolt at the Olympics... (8 CPUs busy at 100%).

       

      First priority is performance.

       

      Any best practice comments or recommendations?

       

      Thanks, Marc

       

      PS: It is not possible to split the two fact tables into two different apps. They have to be in one app.

        • Re: best practice: concatenate or link table?
          Jonathan Brough

          Hi Marc,

          My understanding is that QV can run entirely in RAM because the data model is stored in an ultra-normalised state, such that each field value is only ever stored once, so each field effectively acts as a key in its own table (though this is hidden from the user in the table view).

          With this in mind, I tend to treat the structure of the data model as a means to represent and redefine levels of granularity, rather than as a performance tool. I typically start with one Fact table and some dimensions linked off it, then concatenate onto this Fact when I find common dimensions in different transaction/Fact tables. Once these tables no longer share many common dimensions, or once I find that new transaction tables have a different level of granularity, I tend to rejig the structure and put a link table in.

          In a sales application, for example, quotes, invoices and orders that are all cast at the row level of granularity would get concatenated into one Fact. If, however, accounts receivables are added on, cast at the level of the invoice header, I would pull the row-level transactions out into the three original tables and then use a link table to link through to the relevant invoice.
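          As a rough sketch of what I mean by the link-table rejig (table and field names are made up for illustration):

```qlikview
// Illustrative only: give each fact one composite key, then hold the
// shared dimension fields once, in a link table.
Fact1:
LOAD Date & '|' & Company & '|' & SalesRep & '|' & Product AS %LinkKey,
     Sales
FROM fact1.qvd (qvd);

Fact2:
LOAD Date & '|' & Company & '|' & SalesRep & '|' & Product AS %LinkKey,
     Budget
FROM fact2.qvd (qvd);

// Collect every distinct key from both facts
LinkTmp:
LOAD DISTINCT %LinkKey RESIDENT Fact1;
Concatenate (LinkTmp)
LOAD DISTINCT %LinkKey RESIDENT Fact2;

// Split the key back into the shared dimension fields
LinkTable:
LOAD DISTINCT %LinkKey,
     SubField(%LinkKey, '|', 1) AS Date,     // may need Date#() to
     SubField(%LinkKey, '|', 2) AS Company,  // restore the date type
     SubField(%LinkKey, '|', 3) AS SalesRep,
     SubField(%LinkKey, '|', 4) AS Product
RESIDENT LinkTmp;
DROP TABLE LinkTmp;
```

          The calendar, company, salesrep and product dimension tables then hook onto LinkTable rather than onto the facts directly.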

          Hopefully this makes sense. I'd love to hear if there are performance concerns, but I've not found any in my biggest model yet (a stock application with 4 million rows). One thing to watch out for in the performance tests would be switching from one side of the data model to the other. It sounds like you're minimising this by not including data from different areas of the model in the same chart, but I suspect even asking QV to recalculate different charts that take in different parts of the model might cause it to stutter a bit.

          Regards,

          Jonathan