
Making Sense of QVD files
Problem
I see you out there my friend. You got parachuted into an organization that has been using QlikView or Qlik Sense for many years, and you find yourself with directories and sub-directories full of QVD files.
You weren't around when the architecture was put in place.
The boss wasn't around when the architecture was put in place.
And all of you are scratching your heads trying to understand what they are and why they are there. Because after all, QlikView and Qlik Sense are nothing more than #DataVisualization tools. Right?
Base Understanding
While QlikView and Qlik Sense can definitely visualize data and the Associative Engine is ideal for Analytics, the truth is both products provide much more functionality. In this article I will help you understand why those QVDs are there and how they fit into the overall Data-to-Outcome platform. Notice I've highlighted sections in red that I will focus on for this post. Wow, nothing to do with visuals/analyze at all.
For the next few minutes I want you to change hats. If you are in an IT type role, please put on a business user hat. And if you are in business please put on an IT hat.
IT wants to ensure the data surfaced is governed and high quality. The process of doing that can take a long time without the right tools.
Business users want outcomes, and typically want the answers they need yesterday. Frankly the last thing they want to hear is "it is going to take a long time before you can get the answers you need."
Qlik Virtual Data warehouse tables
Those two personas don't have to be mutually exclusive. The purpose of utilizing QVD files for QlikView and Qlik Sense was to provide a "virtual" data warehouse, for one of 3 reasons:
- A centralized data warehouse didn't exist and wasn't going to be built
- A centralized data warehouse was going to be built, but it was going to take time and business needed answers before it was complete
- A centralized data warehouse did/does exist, but other sources of data were also needed that were not part of it
Thus Qlik (Architects/Modelers/Engineers/Developers) utilize QlikView or Qlik Sense to construct a "virtual" warehouse from those sources.
Benefits of QVD files
Single Source of Truth
Compressed
Development Speed
Typically the IT and Business relationship looks like this ... "Give us all of your requirements in triplicate and signed in blood. We will then prioritize your request and deliver a finished application in 13 man months." With Qlik that can change for the good of all involved. Developers/Designers can sit down with business users, do a rapid business intelligence session and see what is needed to get the right answers so that the right actions can be taken to drive outcomes. That is made possible by the fact that they can simply start with QVDs that already contain the single source of truth and work with, instead of against, the business users.
Imagine you are a developer and you need to add 1 variable to the load statement in the application you are working on side by side with a business user. You type the change in the load script in say 30 seconds. Then you tell the business user "Hey let's go to lunch while we wait 2 hours for the reload to hammer our source system/data warehouse/data mart and pull the needed millions/billions of records." That kind of defeats the purpose of working side by side. With the data already sitting in QVDs, the reload doesn't have to go back to the source for it.
Delivery Flexibility
Incremental Loads
One thing we all understand about data in source systems, data warehouses and data marts is that it is ever growing. Applications need to stay current, because nobody is going to achieve good outcomes based on data that is a year old. We may have millions/billions of fact records, but the truth is that the vast majority of them are the same as they were 10 minutes ago, a day ago, a week ago. With QlikView and Qlik Sense the data architects can perform what we refer to as Incremental Loads. We talk to the source systems, data warehouses and data marts and say "Hey give me the data that has changed since the last time we talked." (That's just pseudo code, not a real SQL statement.) The new, changed and deleted data is then merged with the existing millions/billions of rows to keep the data fresh, and the original QVD is overwritten. This concept may well be the biggest benefit, and it impacts so many of the other benefits.
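To make the merge step concrete, here is a minimal Python sketch of the idea, not Qlik load script and not anyone's production method. The function names (`changed_since`, `incremental_merge`), the `id` and `modified` columns, and the sample rows are all assumptions made up for illustration. It also assumes the list of deleted keys is already known, since how deletions get detected depends on the source.

```python
from datetime import datetime


def changed_since(source_rows, last_run):
    """Stand-in for asking the source: 'give me what changed since we last talked'."""
    return [row for row in source_rows if row["modified"] > last_run]


def incremental_merge(existing, changed, deleted_keys, key="id"):
    """Merge a batch of changes into the rows stored by the previous load."""
    # Index the previously stored rows by their key.
    merged = {row[key]: row for row in existing}
    # New rows are added; changed rows replace their older version.
    for row in changed:
        merged[row[key]] = row
    # Rows deleted at the source are dropped.
    for k in deleted_keys:
        merged.pop(k, None)
    return list(merged.values())


# Rows kept from the previous load (think: the existing QVD).
stored = [
    {"id": 1, "amount": 100, "modified": datetime(2025, 1, 1)},
    {"id": 2, "amount": 250, "modified": datetime(2025, 1, 1)},
    {"id": 3, "amount": 75, "modified": datetime(2025, 1, 1)},
]
# Current state of the source: row 2 changed, row 3 was deleted, row 4 is new.
source = [
    {"id": 1, "amount": 100, "modified": datetime(2025, 1, 1)},
    {"id": 2, "amount": 300, "modified": datetime(2025, 1, 2)},
    {"id": 4, "amount": 50, "modified": datetime(2025, 1, 2)},
]

last_run = datetime(2025, 1, 1, 12, 0)
changes = changed_since(source, last_run)  # only rows 2 and 4 travel over the wire
deleted = [3]  # assumed to be known; detection depends on the source system
refreshed = incremental_merge(stored, changes, deleted)

print([(r["id"], r["amount"]) for r in refreshed])  # [(1, 100), (2, 300), (4, 50)]
```

Only two rows were requested from the source, yet the refreshed table reflects the insert, the update and the delete. The refreshed rows would then replace the previously stored file.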
One complaint I hear is that "we really need to get answers from live data." Well the truth is that a query that has to return millions of rows of data, especially if joined to other tables, takes many, many minutes to return. Can you really say you are dealing with "live" data when it takes 20 minutes to return the answer? NO! The answer from the live data is actually 20 minutes old, because tables were locked to return a consistent answer. With incremental loads you could actually ask for the changed data every few minutes and merge it. The visual applications can provide data that is fresh as of 5 minutes ago, not 20.
Security
Reduced impact and $$$
QVD Layers
Uggh! This raises questions from so many people. I see Sales.QVD in a directory that says "Layer 1", and in a directory that says "Layer 2" and in a directory that says "Layer 3." I have documentation and PowerPoint slides that show layers and layers. What in the world are "QVD Layers?"
Like a super secret handshake amongst top secret agents, the term "layer" is something that only had meaning to Qlik data architects. In your organization they may well be gone by now, and here you are holding the bag and trying to explain it to your boss, your boss's boss and their next-door neighbor's cousin's best friend.
Let me relay the concept in more modern terms. In a modern data pipeline like the one Qlik Talend Cloud presents, you might think in terms of a Lakehouse or a Medallion architecture. In those, you have 1 swim lane where you simply "land" the raw data. The next swim lane would be what you might refer to as "bronze/storage," where the CDC changes are batched together so that you have access to a consistent set of fresh data. As you continue to the right you have a "silver/transformation" area where you do transformations to the data: you change column names, add columns, merge columns, join data, etc. In the final swim lane you have a "gold/mart" schema that is now "business ready data."
Wait a second ... if my eyes don't deceive me ... it would appear that "Qlik Layers" actually represent those exact same swim lanes. Well that is pretty cool, right? Qlik has been implementing a "modern architecture" since 1993. Hopefully this image helps you, your boss, your boss's boss and their next-door neighbor's cousin's best friend understand that QVD Layers are a good thing.
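If it helps to see the swim lanes as steps, here is a small Python sketch of data passing through them. It is only an illustration of the concept. The function names (`land`, `store`, `transform`, `mart`), the column names and the sample orders are all made up, and in a Qlik environment each step would typically be a load script that reads one layer's QVDs and stores the next layer's.

```python
def land(source_rows):
    """Landing: copy the raw rows exactly as the source delivers them."""
    return [dict(row) for row in source_rows]


def store(landed_rows, key="order_id"):
    """Bronze/storage: batch the changes so one consistent row per key remains."""
    latest = {}
    for row in landed_rows:
        latest[row[key]] = row  # a later change to the same key wins
    return list(latest.values())


def transform(stored_rows):
    """Silver/transformation: rename columns and add a derived column."""
    return [
        {
            "OrderID": r["order_id"],
            "Region": r["rgn"].upper(),
            "Revenue": r["qty"] * r["unit_price"],
        }
        for r in stored_rows
    ]


def mart(transformed_rows):
    """Gold/mart: business-ready data, here revenue totals per region."""
    totals = {}
    for r in transformed_rows:
        totals[r["Region"]] = totals.get(r["Region"], 0) + r["Revenue"]
    return totals


# Hypothetical raw feed; the third row is a later change to order 1.
raw = [
    {"order_id": 1, "rgn": "east", "qty": 2, "unit_price": 10},
    {"order_id": 2, "rgn": "west", "qty": 1, "unit_price": 40},
    {"order_id": 1, "rgn": "east", "qty": 3, "unit_price": 10},
]

revenue_by_region = mart(transform(store(land(raw))))
print(revenue_by_region)  # {'EAST': 30, 'WEST': 40}
```

Each function only reads the output of the one before it. That is why the same file name, like Sales.QVD, can legitimately show up in several layer directories: it is the same table at a different stage of readiness.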
Don't miss this
QVD files are nothing more than Qlik Virtual Data warehouse tables that are used in environments where centralized data warehouses/marts didn't exist, are in the process of being built, or where additional sources are being used that aren't stored in the centralized repository. There is absolutely no reason that Qlik "Landing/Merged" QVD files have to be read from source systems. If you are using a CDC migration tool like Qlik Replicate or Qlik Talend Cloud ... QlikView and Qlik Sense can just as easily read from the cloud warehouse you have moved the data to. If you have built a data warehouse/data marts manually or with a great automated push down tool like Qlik Compose or Qlik Talend Cloud ... you can read that data into QlikView or Qlik Sense. I have presented to you the benefits I see, even in those environments, of then storing the data into QVD files and adding more to it for a single source of truth.
I also don't want you to miss the fact that while QVD files are themselves proprietary ... they actually represent the concept more than the file extension. There is no reason you can't store to a parquet file format instead of QVD format, to utilize within Qlik or external to Qlik. Your "virtual" data warehouse tables can be stored within your Qlik on-premises file system, or externally in things like S3 buckets that are available as you need them to be.
Finally - Just like data warehouses in general, QVDs (and their concept) are used in architectures where the data tables are going to be needed by several different applications and in several different ways. If you have data that will only be surfaced in 1 application, feel free to read the data directly in that application, make every transformation and visualize it right there.
Video
If you like listening more than reading, or want to share it with others who might not read a lengthy post like this ... I recently did a Dork Cast, showed my split personalities and talked through all of these benefits.
Update - 1/31/2025
I allude to storing into Parquet files in this post. If that sparks your interest be sure to check out my complete Qlik Dork post on how to do that. https://qlikdork.com/2025/01/diving-into-parquet/

Dear Dalton,
thanks a lot. Very good & helpful article.
BR
Martin

Thanks for sharing, Dalton.

Thanks for sharing @Dalton_Ruer. Great way of explaining the roles of QVDs.

Thanks, Dalton, for your insights on QVDs. I've been advocating the use of QVDs for more than eight years now, mostly in projects where there was no dedicated data warehouse.
But recently I have been in a situation where we have a separate data warehouse, and I was not sure if I should listen to IT and get rid of QVDs. QVDs still have their benefits, but I can probably take a step back on my position now.