8 Replies Latest reply: Feb 16, 2013 3:53 PM by Anosh Nathaniel RSS

    Loading a Data Warehouse


      I have worked in environments that have allowed me to house my data model on an Oracle database and write my ETL code in PL/SQL. I have used tools such as SSIS for fetching information where PL/SQL could not do it, but it is primarily related to simple extractions and transformations. The code for loading is mostly very complicated for the fact tables with a lot of business logic implemented.

      I want to understand whether it is possible/feasible to do the entire implementation directly from the source through scripting in Qlikview. If anyone has any views on this, please share the same. Thanks a lot in advance.


        • Loading a Data Warehouse

          The simple answer is - Yes, it's possible.

          The detailed answer is -RTFM. And It's better to start from Select and Load statements from Help.

          • Loading a Data Warehouse
            John Witherspoon

            I'm not sure if this matches your system environment, but I think of several of our systems as having three layers each:

            • Logic Layer 1 - produce business "facts" like "widget A cost $5000, weighs 10000 pounds, was produced on 9/25/07, was invoiced on 10/5/07"
            • Logic Layer 2 - combine and even massage the "facts" to support user needs. If the users decide that they want to see all their costs in $/ton, and to group on invoiced date instead of produced date, then we create new data like "widget A cost $1000 per ton on 10/5/07"
            • Front End - display the data from logic layer 2

            I've tried to name these layers neutrally to avoid pushing you towards any conclusions about where the layers belong.

            I would ALWAYS put logic layer 1 in the business system, and NEVER in QlikView. QlikView should never be CREATING facts about your business. It is a reporting system. Running your business is not its job.

            The front end would be in QlikView.

            Where it gets questionable is what to do with logic layer 2. This layer CAN be written in QlikView script, or it CAN be written in your business system. You could simply make a policy, for instance, that you will ALWAYS write this layer in your business system. That will simplify your QlikView applications, but of course complicate your business system. You could think about everything on a case-by-case basis, which is what I've done, putting logic layer 2 wherever I felt it made the most sense for that particular application. But I probably wouldn't make a policy to always put logic layer 2 in QlikView, as I suspect there are many cases where it's simply inappropriate, and better handled in the business system.

            One reason to consider doing logic layer 2 in your business system is that you're probably more familiar with your business system, and will simply find it easier and faster to do there than in QlikView. It takes a while to get the hang of complicated scripting in QlikView, and you could be biting off more than you can chew. Also, the mere fact that you call it "business logic" to me implies that it belongs in the business system. If you'd instead called it "reporting logic", then I'd be saying the opposite, of course.

            That doesn't really answer your question, because I think only you can answer your question. I'm just trying to give you the framework of how I approach these kinds of questions.

            This may also be a bit peripheral, but you mention "A Data Warehouse". I don't know if you're saying you already have one, or if you're thinking of creating one. QlikView doesn't really require a data warehouse, though it's perfectly happy to use one. If you don't have one, you may end up slowly creating a QlikView QVD library that serves as a data warehouse, at least if you're careful to create sensible, reusable QVDs. That happened semi-organically and semi-by-design in our shop, and I believe in many other shops as well.

              • Loading a Data Warehouse

                Hi John

                Thanks for posting such an elaborate response.

                Let me also elaborate on the situation that I find myself in. We have a data model for the retail finance industry already housed on an Oracle database. Facts and dimensions have been defined with ETL written in PL-SQL. This had been done as the targeted BI tool was Business Objects (yes we did not use the data integrator tool in BO). We have created dashboards on Qlikview on the same data-model that is built on Oracle database, and that worked out fine.

                Now, going forward, we plan to provide solutions on Qlikview to our clients. The biggest benefit of Qlikview as understood here is that we should be able to create the entire data model within Qlikview (directly from the source database) and not rely on Oracle database and PL/SQL etc. (one tool that does everything for you). Therefore, I am planning to create the same data-model (I talked about in the paragraph above) through Qlikview scripting.

                My fear, as you have already pointed out, is that I could be biting off more than I can chew and find myself in real jam midway through the implementation. I am worried whether I'll be able to implement the entire incremental refresh properly, and how I'll be able to rectify data if it is erroneously populated. If the data in the source system was faulty and rectified later, how would I go about fixing the information in the data-warehouse which is implemented in the form of QVDs (I wouldn't want to do a complete refresh again). I am very comfortable in PL/SQL and as you surely know it is very easy to implement complicated logic within such procedural languages (and you can easily update information in Oracle tables if required).

                I hope I have made myself clear. Seeking your further inputs/suggestions (if possible for you) on how to implement such reusable QVDs where data can also be rectified if required. Again my question is whether this is feasible?

                Thanks a lot!


                  • Loading a Data Warehouse

                    You never know what you can do till you try.

                    As John mentioned you can build your own QVD based data warehouse. There are few technics to do this. You can find it in Help if you type "QVD files and Incremental Load".

                    On top of this you can store table data in one QVD or in many QVDs (for instance per region, per product, per date and etc.) and combine/reuse it in final application according to your requirements.

                    I'm pretty hands-on with PL/SQL as well, but I can say, in some cases it's better to load data into QV first and do all aggregation/cleaning/derivation there. Definitely you can write some SQL to join some tables on Oracle side, but be careful about performance, maybe it's faster to load all the tables into QV and join them there.

                    From data stamp point, don't worry about amount of data, if you have proper table structure and do proper data cleaning you can load a huge amount of data in your application. So you have to have appropriate hardware too. For example in my case, I have server with 96Gb Ram, 4x6 core CPUs, and I load 0.5 billion raw records for main fact table. It takes about 23Gb in a memory and 7Gb in a file on a disk. I have 16-18 millions records in daily incremental load.

                    Everything is feasible ... with some assumptions or restrictions Wink You need just start doing this and you will see how fun it is.

                      • Loading a Data Warehouse
                        John Witherspoon

                        Well, it certainly sounds doable, and not out of line with what QlikView is commonly used for, but I'm obviously not in a position to estimate how long it would take. Yes, you can set up incremental loads of QVDs that can also fix old data if it is changed. However, you WILL need some way of identifying that data. If you have no way to identify which rows have changed, then obviously the only option is to read the entire table.

                        As far as our own environment, even though I have extensive SQL experience, I only VERY rarely do my joins in the SQL, and even then, it's SQL in QlikView script. And most SQL is simply used to build QVDs, while most user applications only read from QVDs. For a while, we would occasionally write a separate program to extract the data in a more usable format, usually dumping it to a file, and then reading the file into QlikView. The more experience I get, the less I do that, and the more manipulation I do directly in QlikView script. I don't think we have very many extract programs like that left, now.

                        • Loading a Data Warehouse

                          Hi Nick/Johm

                          Thanks again for your responses. I think you are right, I should just go ahead with it. And I am sure I'll find a way out in sticky situations. I guess the first thing to do is to unlearn some of the things I know. I am probably trying too hard to implement the ETL here as I would when writing PL/SQL!

                          Thanks a lot again for your insight and the encouragement!

                          Best Regards