Making Sense of QVD files

Dalton_Ruer · Jan 31, 2025 7:29:50 AM

Problem

I see you out there, my friend. You got parachuted into an organization that has been using QlikView or Qlik Sense for many years, and you find yourself with directories and sub-directories full of QVD files.

You weren't around when the architecture was put in place.

The boss wasn't around when the architecture was put in place.

And all of you are scratching your heads, trying to understand what they are and why they are there. After all, QlikView and Qlik Sense are nothing more than #DataVisualization tools. Right?

Base Understanding

While QlikView and Qlik Sense can definitely visualize data, and the Associative Engine is ideal for analytics, the truth is that both products provide much more functionality. In this article I will help you understand why those QVDs are there and how they fit into the overall Data-to-Outcome platform. Notice that I've highlighted in red the sections I will focus on in this post. Wow, nothing to do with visualizing or analyzing at all.

For the next few minutes I want you to change hats. If you are in an IT-type role, please put on a business user hat. And if you are in business, please put on an IT hat.

IT wants to ensure the data surfaced is governed and high quality. The process of doing that can take a long time without the right tools.

Business users want outcomes, and typically want the answers they need yesterday. Frankly, the last thing they want to hear is "it is going to take a long time before you can get the answers you need."

Qlik Virtual Data warehouse tables

Those two personas' goals don't have to be mutually exclusive. The purpose of utilizing QVD files in QlikView and Qlik Sense was to provide a "virtual" data warehouse, for 3 reasons:

A centralized data warehouse didn't exist and wasn't going to be built
A centralized data warehouse was going to be built, but it would take time, and the business needed answers before it was complete
A centralized data warehouse did/does exist, but other sources of data that were not part of it were also needed

Thus Qlik architects, modelers, engineers, and developers use QlikView or Qlik Sense to construct a "virtual" warehouse from those sources.

Benefits of QVD files

The logical question you might have is, "What are the benefits of using QVD files instead of just reading the data straight into an application?" That's fair and deserves an answer, so here are the reasons QVD files are used:

Single Source of Truth

Data is only part of the battle in providing outcomes. The other part is all of the expressions needed to actually provide the "right answers." You can rarely just SUM or COUNT a column. Often a whole bunch of nested IFs drive the "right answer": only count this value if the user name doesn't contain the word "test," and only if the transaction code doesn't start with "zzz." So, while the source system, and even a centralized data warehouse or data marts, may have all of the data, that logic had better be applied. QlikView and Qlik Sense developers have put that logic in, and when the QVDs are stored, that logic has already been applied. That reduces the need to repeat all of that intellectual property in the myriad analytics applications that will visualize the data.
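As a rough illustration of "apply the logic once," here is a minimal Python sketch (not Qlik load script) of the example rule above being applied in the step that produces the shared table, so downstream apps can simply SUM. The Transaction type, its columns, and the helper names are hypothetical, and treating the matches as case-insensitive is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transaction:
    # Hypothetical columns, chosen to match the example rule in the text.
    user_name: str
    transaction_code: str
    amount: float


def is_countable(txn: Transaction) -> bool:
    """Example rule: exclude test users and transaction codes starting with 'zzz'.

    Case-insensitive matching is an assumption made for this sketch.
    """
    return ("test" not in txn.user_name.lower()
            and not txn.transaction_code.lower().startswith("zzz"))


def build_curated_table(raw: list[Transaction]) -> list[Transaction]:
    """Apply the rule once, where the shared table is produced."""
    return [t for t in raw if is_countable(t)]


raw = [
    Transaction("alice", "A100", 250.0),
    Transaction("test_user", "A101", 999.0),
    Transaction("bob", "ZZZ42", 10.0),
    Transaction("carol", "B200", 100.0),
]

curated = build_curated_table(raw)
# Downstream apps can now use a plain SUM without repeating the rule.
total = sum(t.amount for t in curated)
print(total)  # 350.0
```

If the rule ever changes, it changes in one place, and every app reading the curated table picks it up on its next reload.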

Compressed

One complaint I hear often is that "Qlik is just duplicating the data all over the place." That's understandable, but misguided. QVD files often compress the "flat" data by 90-95%, because of how the Associative Engine stores it. I won't delve into all of that in this post, but please feel free to reach out if you can't find information that helps you understand why the compression ratio is so high.
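To give a feel for where that compression comes from, here is a simplified Python sketch of dictionary (symbol table) encoding: each distinct value is stored once, and each row keeps only a small integer index, packed into as few bits as needed. It illustrates the general idea under stated assumptions; it is not the actual QVD on-disk format, and the sample column is invented.

```python
def dictionary_encode(column):
    """Store each distinct value once (a symbol table) plus one small index per row."""
    symbols = []    # distinct values in first-seen order
    position = {}   # value -> its index in symbols
    indexes = []    # one entry per row, pointing into symbols
    for value in column:
        if value not in position:
            position[value] = len(symbols)
            symbols.append(value)
        indexes.append(position[value])
    return symbols, indexes


def bits_per_index(symbol_count):
    """Smallest number of bits that can address every symbol (minimum 1)."""
    return max(1, (symbol_count - 1).bit_length())


countries = ["Germany", "United States", "Brazil"]
column = [countries[i % 3] for i in range(300_000)]

symbols, indexes = dictionary_encode(column)
bits = bits_per_index(len(symbols))

flat_bytes = sum(len(v.encode("utf-8")) for v in column)
# Size if the indexes were bit-packed at `bits` each (Python lists are not packed).
encoded_bytes = (sum(len(s.encode("utf-8")) for s in symbols)
                 + (len(indexes) * bits + 7) // 8)

print(flat_bytes, encoded_bytes)                        # 2600000 75026
print(f"{1 - encoded_bytes / flat_bytes:.1%} smaller")  # 97.1% smaller
```

The saving depends heavily on how many distinct values a column has: a low-cardinality column like Country shrinks dramatically, while a column of unique transaction IDs gains little from the symbol table.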

Development Speed

Typically the IT and business relationship looks like this: "Give us all of your requirements in triplicate and signed in blood. We will then prioritize your request and deliver a finished application in 13 man-months." With Qlik that can change for the better for everyone involved. Developers and designers can sit down with business users, hold a rapid business intelligence session, and see what is needed to get the right answers so that the right actions can be taken to drive outcomes. That is possible because they can start with QVDs that already contain the single source of truth, and work with, instead of against, the business users.

Imagine you are a developer and you need to add one field to the load statement in the application you are building side by side with a business user. You type the change into the load script in, say, 30 seconds. Then you tell the business user, "Hey, let's go to lunch while we wait 2 hours for the reload to hammer our source system/data warehouse/data mart and pull the needed millions/billions of records." That kind of defeats the purpose of working side by side.

Delivery Flexibility

Regardless of your persona, you probably realize that you have several tables in your source systems or data warehouses that get reused over and over and over and over and over ... sorry, I lost my train of thought for a second. Tables like Customers, Patients, Employees, etc., that nearly every application/dashboard will end up using. QVDs provide the ability to access those tables, do the IP work to make them a single source of truth, and then reuse them in as many applications as you need.

You might have millions/billions of fact records dating back 20 years. QVD files provide the ability to "partition" out those facts: Sales_2000.qvd, Sales_2001.qvd, Sales_2002.qvd. You get the picture. In some cases they might actually be down to the day. Some applications might only need a very current set of values, others need a rolling 13 months, others a rolling 3 years, while a few might want to access records for all time based on some type of filtering condition.
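A minimal Python sketch of yearly partitioning, assuming a hypothetical fact row with order_date and amount fields; plain dictionaries stand in for the Sales_YYYY.qvd files on disk. An app that needs a rolling 3 years reads only the three partitions it needs instead of the whole history.

```python
from collections import defaultdict
from datetime import date


def partition_by_year(rows):
    """Group fact rows into yearly partitions named like Sales_2001.qvd."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[f"Sales_{row['order_date'].year}.qvd"].append(row)
    return dict(partitions)


def files_for_rolling_years(latest_year, years):
    """Partition names an app needs for a rolling window of `years` years."""
    return [f"Sales_{y}.qvd" for y in range(latest_year - years + 1, latest_year + 1)]


facts = [
    {"order_date": date(2000, 3, 1), "amount": 10},
    {"order_date": date(2001, 7, 15), "amount": 20},
    {"order_date": date(2023, 1, 9), "amount": 30},
    {"order_date": date(2024, 11, 30), "amount": 40},
    {"order_date": date(2025, 1, 2), "amount": 50},
]

partitions = partition_by_year(facts)  # stands in for files on disk
needed = files_for_rolling_years(2025, 3)
# Read only the partitions that exist and fall inside the window.
recent = [row for name in needed for row in partitions.get(name, [])]
print(needed)                            # ['Sales_2023.qvd', 'Sales_2024.qvd', 'Sales_2025.qvd']
print(sum(r["amount"] for r in recent))  # 120
```

The same approach works at a finer grain (monthly or daily files) when apps need windows like a rolling 13 months.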

Incremental Loads

One thing we all understand about data in source systems, data warehouses and data marts is that it is ever growing. Applications need to stay current, because nobody is going to achieve good outcomes based on data that is a year old. We may have millions/billions of fact records, but the vast majority of them are the same as they were 10 minutes ago, a day ago, a week ago. With QlikView and Qlik Sense, data architects can perform what we refer to as incremental loads. We talk to the source systems, data warehouses and data marts and say, "Hey, give me the data that has changed since the last time we talked." (That's just pseudocode, not a real SQL statement.) The new, changed and deleted rows are then merged with the existing millions/billions of rows to keep the data fresh, and the original QVD is overwritten with the result. This concept may well be the biggest benefit, and it affects so many of the other benefits.
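The merge step described above can be sketched in Python as follows. This is a minimal sketch, assuming each row has a hypothetical primary key id and a modified_at timestamp, and that the source can cheaply return the keys that still exist (which is how deletes are detected here). In a Qlik load script the same idea is commonly expressed by loading the changed rows, concatenating the stored QVD rows WHERE NOT Exists() on the key, and inner joining against the current key list before storing back to the QVD.

```python
from datetime import datetime


def changed_since(source_rows, last_reload):
    """Stand-in for 'give me the data that has changed since the last time we talked'."""
    return [r for r in source_rows if r["modified_at"] > last_reload]


def incremental_merge(stored, changes, current_keys):
    """Merge inserts/updates into the stored rows, then drop keys deleted at the source.

    stored:       dict of primary key -> row, as read from the previous file
    changes:      rows inserted or updated since the last reload
    current_keys: every primary key still present in the source
    """
    merged = dict(stored)
    for row in changes:
        merged[row["id"]] = row  # a new key inserts, an existing key overwrites
    return {k: v for k, v in merged.items() if k in current_keys}


t0, t1, t2 = datetime(2025, 1, 1), datetime(2025, 1, 2), datetime(2025, 1, 3)

stored = {
    1: {"id": 1, "amount": 100, "modified_at": t0},
    2: {"id": 2, "amount": 200, "modified_at": t0},
    3: {"id": 3, "amount": 300, "modified_at": t0},
}
source_now = [
    {"id": 1, "amount": 100, "modified_at": t0},  # unchanged
    {"id": 2, "amount": 250, "modified_at": t2},  # updated
    {"id": 4, "amount": 400, "modified_at": t2},  # inserted; id 3 was deleted
]

last_reload = t1
result = incremental_merge(stored,
                           changed_since(source_now, last_reload),
                           {r["id"] for r in source_now})
print(sorted(result))                             # [1, 2, 4]
print(sum(r["amount"] for r in result.values()))  # 750
```

Only two of the rows had to travel from the source; the unchanged row came from the stored file.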

Another complaint I hear is that "we really need to get answers from live data." The truth is that running a query that has to return millions of rows of data, especially when joined to other tables, can take many, many minutes. Can you really say you are dealing with "live" data when it takes 20 minutes to return the answer? NO! The "live" answer is actually 20 minutes old by the time it arrives, because tables were locked to return a consistent answer. With incremental loads you could ask for the changes every few minutes and merge them, so the visual applications can show data that is fresh as of 5 minutes ago, not 20.

Security

QVDs provide a layer of security protection for the source systems, data warehouses or data marts. Qlik developers can use QVDs that an architect has created without needing any access at all to the systems where the data originated. That probably seems like a small benefit, but in my eyes it is actually HUGE. You see, in environments where the source OLTP systems/mainframes themselves are being hit, letting developers hammer them whenever they want is probably not ideal. Below I will share the impact and monetary costs that are reduced, but for now I want you to think in terms of the more important impact.

If the analytics developers can't be trusted not to impact source systems where transactions are being processed, database administrators can, and will, simply remove their access. I know, because as a DBA for over 25 years I absolutely removed people's authority. They then either had to deal with day-old data in some restored system I provided, or had to manually dump data from some report. But I, and other DBAs, simply can't allow the OLTP system to degrade and get in the way of the users doing the work the business depends on.

Combined with incremental loads, this lets Qlik have a very minimal impact on the source systems, which allows for fresher data, which in turn can lead to more rapid insights and outcomes.

Reduced impact and $$$

I think I have already made the case for using QVD files instead of hitting the systems over and over and over for every single application, but I want to conclude by focusing on it one more time. The reduced impact, because of incremental loads and not having to ask for the same data over and over, allows for fresher data, faster. Reducing the load on mainframes saves a lot of money, because MIPS are really expensive. The same is true for cloud data warehousing systems where you pay based on consumption. Qlik (and I) absolutely understand that there are occasions where you do need to hit the source for "live" data. We support that, and I've written about it and provided videos for it. But in most cases incremental loads and QVDs actually provide far better real-world performance, while also potentially saving you an awful lot of money in the process.

Thus, even if your organization does have spectacular data warehouses and data marts, you will most likely still see huge benefits from taking advantage of QVD files.

QVD Layers

Uggh! This topic raises questions for so many people. I see Sales.QVD in a directory that says "Layer 1," in a directory that says "Layer 2," and in a directory that says "Layer 3." I have documentation and PowerPoint slides that show layers and layers. What in the world are "QVD Layers?"

Like a super secret handshake among top secret agents, the term "layer" only had meaning to Qlik data architects. In your organization they may well be gone by now, and here you are holding the bag, trying to explain it to your boss, your boss's boss, and their next-door neighbor's cousin's best friend.

Let me relay the concept in more modern terms. In a modern data pipeline like the one Qlik Talend Cloud presents, you might think in terms of a Lakehouse or Medallion architecture. In those, you have one swim lane where you simply "land" the raw data. The next swim lane is what you might refer to as "bronze/storage," where the CDC changes are batched together so that you have access to a consistent set of fresh data. Continuing to the right, you have a "silver/transformation" area where you transform the data: you change column names, add columns, merge columns, join data, etc. In the final swim lane you have a "gold/mart" schema that is now "business ready data."
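To make the swim lanes concrete, here is a toy Python pipeline in which each function stands in for one lane: the landed CDC batches are kept as they arrived, bronze applies them in order, silver renames and cleans columns, and gold produces a business-ready summary. The column names (CUST_ID, CTRY, REV) and the I/U/D operation codes are hypothetical; in a Qlik deployment each stage's output would be stored to its own layer of QVD (or Parquet) files rather than kept in memory.

```python
def bronze(cdc_batches):
    """Apply landed CDC batches in order so each key ends at its latest state."""
    state = {}
    for batch in cdc_batches:
        for change in batch:
            if change["op"] == "D":
                state.pop(change["CUST_ID"], None)  # delete
            else:
                state[change["CUST_ID"]] = change   # insert or update
    return list(state.values())


def silver(rows):
    """Rename columns, clean a value and add a derived column."""
    return [
        {
            "CustomerID": r["CUST_ID"],
            "Country": r["CTRY"].strip().title(),
            "Revenue": r["REV"],
            "IsLargeCustomer": r["REV"] >= 1000,
        }
        for r in rows
    ]


def gold(rows):
    """Business-ready table: revenue by country."""
    totals = {}
    for r in rows:
        totals[r["Country"]] = totals.get(r["Country"], 0) + r["Revenue"]
    return totals


# Landing zone: raw CDC batches exactly as they arrived.
landed = [
    [
        {"op": "I", "CUST_ID": 1, "CTRY": "germany ", "REV": 500},
        {"op": "I", "CUST_ID": 2, "CTRY": "brazil", "REV": 1200},
        {"op": "I", "CUST_ID": 3, "CTRY": "germany", "REV": 700},
    ],
    [
        {"op": "U", "CUST_ID": 1, "CTRY": "germany", "REV": 900},
        {"op": "D", "CUST_ID": 3},
    ],
]

print(gold(silver(bronze(landed))))  # {'Germany': 900, 'Brazil': 1200}
```

Because each lane only reads the output of the lane before it, an app that needs cleaned customer rows can start at silver, while a dashboard can read gold directly.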

Wait a second ... if my eyes don't deceive me ... it would appear that "Qlik Layers" represent those exact same swim lanes. Well, that is pretty cool, right? Qlik has been implementing a "modern architecture" since 1993. Hopefully this image helps you, your boss, your boss's boss, and their next-door neighbor's cousin's best friend understand that QVD Layers are a good thing.

Don't miss this

QVD files are nothing more than Qlik Virtual Data warehouse tables, used in environments where centralized data warehouses/marts didn't exist, were in the process of being built, or where additional sources are needed that aren't stored in the centralized repository. There is absolutely no reason that Qlik "Landing/Merged" QVD files have to be read from source systems. If you are using a CDC migration tool like Qlik Replicate or Qlik Talend Cloud ... QlikView and Qlik Sense can just as easily read from the cloud warehouse you have moved the data to. If you have built a data warehouse/data marts manually, or with a great automated push-down tool like Qlik Compose or Qlik Talend Cloud ... you can read that data into QlikView or Qlik Sense. I have presented the benefits I see, even in those environments, of then storing the data into QVD files and adding more to it for a single source of truth.

I also don't want you to miss the fact that while QVD files themselves are proprietary ... the concept matters more than the file extension. There is no reason you can't store to the Parquet file format instead of QVD, for use within Qlik or outside of Qlik. Your "virtual" data warehouse tables can be stored within your Qlik on-premise file system, or externally in places like S3 buckets, so they are available wherever you need them.

Finally, just like data warehouses in general, QVDs (and their concept) are used in architectures where the data tables are going to be needed by several different applications in several different ways. If you have data that will only be surfaced in one application, feel free to read the data directly into that application, do every transformation, and visualize it right there.

Video

If you like listening more than reading, or want to share it with others who might not read a lengthy post like this ... I recently did a Dork Cast, showed my split personalities and talked through all of these benefits.

Update - 1/31/2025

I allude to storing into Parquet files in this post. If that sparks your interest, be sure to check out my complete Qlik Dork post on how to do that: https://qlikdork.com/2025/01/diving-into-parquet/

martingries · 2025-01-29

Dear Dalton,

thanks a lot. Very good & helpful article.

BR

Martin

marksouzacosta · 2025-01-29

Thanks for sharing, Dalton.

marcginqo · 2025-01-29

Thanks for sharing @Dalton_Ruer. Great way of explaining the roles of QVDs.

ahmed_abid · 2025-02-24

Thanks, Dalton, for your insights on QVDs. I've been advocating the use of QVDs for more than eight years now, mostly in projects where there was no dedicated data warehouse.

But recently I am in a situation where we have a separate data warehouse, and I was not sure whether I should listen to IT and get rid of QVDs. QVDs still have their benefits, but I can probably take a step back from my position now.
