First a little background. I have over 10 year s experience as a SQL developer and have also developed in SAP Universe designer and Webi. a couple of months ago I was introduced to Qlikview and initially wow'ed by its presentation and speed. I have received a good introduction by an external consultancy and they also provided the business a dashboard based on several requirements.
My question is 'Why use .qvw to ETL data in .qvw data store files as apposed to just loading direct from a database?'
Ok so now I've been playing with the dashboard for a couple of months and learning Qlikview I has created a Dashboard meeting all the original business requirements and added further reporting. The original consultants Dashboard was build upon a very literal ETL process; 1) Create qvw to extract data and store from database
2) create another qvw to transform and store data from Extract.qvws
3) finally create a load qvw as the final presentation dashboard from the Transform.qvws
As a result there are 4 points of failure and 4 points that have to be developed if there are any data changes (Database, Extract Layer, Transform Layer and Load Layer). The resulting final dashboard is around 700MB Compressed. (Uncompressed 2GB). The total on disk size of all the .qvws and stored data is 9.1GB (Compressed)
With my SQL developer logic my belief is that data should not be extracted and stored away from source, doing so adds further points of failure and could compromise the integrity and security of data. I therefore developed my version of the dashboard that would get the data direct from the Database server.
All extraction and transformation stays within the SQL Database source the Final user dashboard (load Layer) just reloads data direct from the Database tables. As result my version has only 2 points of failure and 2 points of development (Database and Dashboard) The size of my .qvw Dashboard is currently around 680Mb (Compressed) The size on disk is 680MB as there are no .qvw data store files.
Results in testing and comparing the 2 versions;
Original Dashboard 9.1GB v New Dashboard 680MB
Reloads are scheduled for both out of hours (however actual times are around 25 minutes for original ETL vs 11 minutes for New)
Report/Sheet selections and load times are approximately the same for both versions.
New version contains more reports/sheets and use of variables and states
12 tables are used.
Largest Table contains Approx. 30 Million rows
Primary Keys are used for automatic relation joins.
1 fact table and 11 Dim tables
Spoke design (not snowflake)
Now I understand that if the Qlik developer does not have substantial SQL experience or access to develop on the Database then the .qvw ETL process may make sense. Apart from this I am finding no reason why the direct to database solution should not be used and is actually more advantageous.
I would be very interested to here any views on the .Qvw ETL vs direct DB comparison and experiences of others.
I am still new to Qlikview and could be ignorant to some of the reasoning's behind certain logic or choices whilst developing and so would be open to any enlightenment to help me develop in the future.
Many users need to cope with slow extraction from data bases and/or limiting load on the database server to a minimum and / or supporting multiple dashboards based on the same (sub)set of data tables.
In these cases, a QVD layer (the layer with QVWs that create the QVDs) may be very helpful, though it is indeed adding another layer to your architecture. A QVD layer may also support an incremental LOAD process, thus additionally reduce the amount of records read from the source DB in each reload.
Some people prefer to separate the data modeling (script that uses the QVDs to build up the data model) from the visualization, adding another layer of QVWs (the dashboard QVW will use a binary load to load in the model only and add the UI parts). You can have several dashboards build upon the same data model.
Disk space is not the key focus here (in your case, I assume the ETL layer is extracting almost all tables unfiltered and you are applying filters in the data model for your dashboard? Just because the differences in disk size seems quite extrem).
But of course, your focus and use case may be different, so I believe it's not a black or white thing.
You ask the very right question and IMHO you do not need to build QVDs and Extract QVWs just for the sake of building those. Stefan already gave you very good points and I just wanted to elaborate on a few.
I was in the same boat myself, having similar background. My first project was a fairly complicated one and we decided to build separate a QVW for the data model and a separate QVW for the presentation app - If you do not want to use QVD layer, I highly suggest splitting at least data model QVW from presentation. Your volumes are not huge but even with 30 million rows you will appreciate this two layer architecture as it will speed up data reloads of your presentation app when you make changes to it. It is just easier to keep them separate from maintenance perspective. I do not think it adds additional point of failure as you simply do a binary load from data model QVW to presentation app. Back to that project of mine - all ETL was done using SSIS and data were ultimately stored in a nice star schema on SQL Server. The database also was in our local data center so I could reload 100 million rows every morning from SQL Server in 20 minutes. All heavy lifting was done there. I only used QVD for the master calendar and that was the only QVD.
My second project involved loading data from a remote and slow Oracle database - I was lucky to get 1000 rows per second. Even simple and small reloads would take hours and this when I started to appreciate QVD layer basically using three tiers - QVD Generator QVW that would pull data from source system with very minimal transformations, data model QVW where I would do transformations, joins etc and finally presentation app. The value of QVD layer was just making development much easier. Also I was not able to use SSIS/SQL Server for that project for various reasons and I did not feel it was necessary as ETL was very simple and a final data model had just 3 tables.
If you need to do complex ETL with many parallel processes, QlikView is not good at doing this as it is very sequential. I would totally use your favorite ETL tool and database and would just use QlikView as a nice visualization tool (one of the best in my opinion). Keep in mind a lot of people on this forum and a lot of consultants do not really have experience like yours or they do not have access to these tools. A lot of them have only QlikView and this is all they have at their disposal. You are in a better place because you can architect and use the best tool for the job.
You will be surprised how many features QlikView has and what power you get with it but it is not necessarily the best tool for everything. I've met some folks who built entire "data warehouses" in QlikView using QVDs and QVWs - personally I would never do that myself but I am still amazed that it is possible with QlikView and some companies are very successful doing that.