3 Replies Latest reply: Apr 29, 2010 9:13 PM by John Witherspoon RSS

    datamodel for datawarehouse

    Thomas Rieck

      I'm working on a datamodel for a Datawarehouse that should meet the following major requirements :

       

      As generic as possible (various datasources, even ones we are currently not thinking of J)

      Easy to extend

      Easy to support

      High Performance (GUI and aggregation on facts) especially when dealing with very high amount of data (this is one of the most crucial points, please keep this always in mind when giving feedback)

       

      Just a few words on the concept …

       

      Objects will be defined by linking "objectclasses" to the objects. This is a bit like the associative data model approach :

       

      Identifier Name

      77 Flight BA1234

      08 Heathrow Airport

      32 12-Aug-1998

      48 10:25am

      12 arrived at

      67 on

      09 at

       

      Links

      Identifier Source Verb Target

      74 77 12 08

      03 74 67 32

      64 03 09 48

       

      As you see in the diagram, I'm trying a compromise between complexity/flexibility/performance. Hence I have choosen a not completely normalized snowflake model with splitted fact tables.

       

      If you're giving feedback, please consider as well the "traditional" QlikView approach (Dimensions/LinkTable/Facts/Starschema) and If you see advantage of one over the other please let me know.

       

       

      Thanks a ton ...

      Thomas

      error loading image

        • datamodel for datawarehouse
          Thomas Rieck

          nobody any oppinion ? is it complete nonsense ?

            • datamodel for datawarehouse
              Fernando Toledo

              How often the document will be reloaded?

              If you join everything in a big table the application performance will be great, but the reload performance will be poor.

              If you keep this schema the application, prior the charts, could take too long to calculate, but the reload process will be faster.

              best regards,

              Fernando D'Agosto Yes

              • datamodel for datawarehouse
                John Witherspoon

                 


                ThomasWRieck wrote:nobody any oppinion ? is it complete nonsense ?


                I wrote up a long opinion days ago before deciding not to post it and deleting it.

                Basically, if I understand what you're thinking, I suspect you're making a mistake. Two business systems that I wrote use generic data like this. It made sense for the business systems. As you said, easy to extend, easy to support, and high performance, at least for those cases (and some would debate me on the "high performance" part). I have so far only read data from one of these systems into QlikView, and when I did, I converted all of the "virtual fields" into real QlikView fields. I do not feel that a generic database is a good data model for QlikView, even if it was a good model for the source system. Our "data warehouse" of QVDs is made up of a whole lot of specific QVDs with specific fields, not generic data, or even abstracted data.

                A substantial number of our source databases are at least partially-abstracted, with very general-sounding fields like "demand" "item" "consumer" "transfer method" and so on. In this case, I might store an order item in this abstract database, with demand = order item ID, item = product ID, consumer = customer ID, and transfer method = method of shipment to the customer. But when I load it into QlikView, I create a QVD specific to order items, and the fields get more concrete names like "order item", "product", "customer" and "shipment method". The abstraction goes away. Different types of objects stored in the same abstract database get split apart into multiple QVDs. I would not try to keep generic and abstract when moving the data into QlikView. I would make everything very specific, which is the opposite of what I think you are thinking of doing.

                The main reason I didn't post it is that I honestly have no idea what your requirements are, or what you're thinking of doing. I don't feel confortable telling you what your data warehouse should look like based on a few small comments about how you want it to be, and my possible misinterpretation of those comments. Your shop may be completely different from my shop. If you take my advice, it might be a complete disaster, and your approach might have been much better. I simply don't have the information, or the time to learn the information, that would allow me to offer what I would consider useful advice.

                Anyway, those are/were my thoughts. I probably should have just posted them the first time. At least it's better than no reply.