We are trying to have a set of common dimensional QVDs that any application can use rather then having each app pull product hierarchy from the database. This is causing us to have large "star" schema applications. Some of our larger Qlikview applications (5-10GB QVWs) perform very slowly and I think if I could get rid of the star table schema it would really help.
To solve this the fact table must contain all the dimensions but in order to do that they must be joined by the database. Does anyone know how to join QVDs so we can continue to use our common dimension process? My end problem is the performance issue so if I'm tackling this the wrong way please let me know. We have used Qlikview optimizer to eliminate any large columns we can.
having a set of common QVD files is the right approach - it will shorten your data load time and make the whole process more organized. I don't understand why is it causing you to have large "star schema" applications. Perhaps you always load the whole QVD file into the QVW, which is probably a mistake.
Star Schema by itself should not be too bad, unless it's overly normalized (like most Data Warehouses do).
To avoid loading full tables, you can use function exists() and only limit the Dimensions to the needed ones. For example:
from Transactions.qvd (qvd);
from ItemMaster.qvd (qvd)
where exists (ItemNumber)
from CustomerMaster.qvd (qvd)
where exists (CustomerNumber)
In the example above, only those Items and only those Customers that exist in the Transactions table, will be loaded. Notice that using exists() does not cause any problems with the optimized QVD load - the load remains optimized (as opposed to using other where conditions).
If I'm missing your problem, please extend, and I'll try to help.
Oleg, thanks for this suggestion. I understand the "where exists(key)" will limit the amount of data but that isn't my current performance issue as we are in testing and the dimensions only contain the applicable data (great tip for as we roll this out though).
It was my understanding that having a lot of linked tables in the final application can have negative performance vs having all data in one table. I suppose we are overly normalized since we have only one fact table and many linked dimensions.
Do you believe this could be our performance issue or should I be focusing my time elsewhere? When I look at the server task manager it seems a lot of the selections and graphs are running under one processor.
thank you very much.
Yes, performance slows down when aggregating data in one table based on another table. With 5-10GB QVWs it's certainly a good idea to look at creative denormalization. There are several other optimizations that can have dramatic impacts on memory and performance.
Chart calculations limited to one processor can be a side effect of a "data island" for example.
I'm curious to see your Table Viewer graph and an example of a slow formula, for an uploaded QVW example.
generally speaking, having many links that need to be resolved in the run-time may cause some latency, but I don't think that having a single Fact table and several "satellite" dimension tables would make such a dramatic impact.
If you are seeing that the action is single threaded - could it be that you have some VBScript macros ? Using Macros often is causing QlikVIew to go into a single-threaded mode.
You might have a single object/chart that's demanding a lot of resources. Open "Document Properties" right after loading a heavy screen, look at "Sheets" and see in the bottom what objects are using most memory and time to load.
You might have some fields with enormous amount of distinct values. Open Document Properties, Tables and look for a field with the most unique values. If you find a "monster" field, think what could you do with it. Best case - if you don't need it at all (For example, many times programmers load Transaction ID, just because they are used to it from their experience with transactional systems. For analytics, Transaction ID is useless (unless it's a link key). In any case, you need to find alternative solutions for the "monster" fields - either eliminate them or chop them into pieces or do something else.
You could also collect "Memory Statistics" and analyze them, to get a better idea about "what is eating up resources"...
Hard to say without looking at your application. If you are "stuck", engaging an experienced consultant is not a bad idea...
I may or may not be understanding the question, so I'm going to drop back to basics.
Let's say you have a sales table. The sales table has a sales ID, a product ID, a customer ID, a sales date, and a sales quantity. The product table has a product ID, product description, and other product information. The customer table has the customer ID, customer name, delivery location ID, and other customer information. The delivery location table has a location ID, an address, and so on. (edit: I suppose that's technically a snowflake schema rather than a star schema, but I don't think it affects anything I'll be discussing.)
I would indeed make separate QVDs here.
If I understand your question, that's what you have right now - separate QVDs for all of your dimensional tables (customer, product, etc.). And if I understand correctly, you're reading these QVDs into separate tables in the QVW, and you're suspicious that that's causing performance problems.
I do have my doubts that that's the cause of your performance problems. I haven't done benchmarks, so this is very wishy-washy anecdotal evidence, but I've seen no such problems when using highly normalized tables like this. I also theorize that QlikView internally handles highly normalized and highly denormalized data very much the same way due to its internal compression algorithms.
That said, there's also nothing wrong with joining all of these tables together into one master table in the QVW, and that's an approach I might typically take for a case like this. It is certainly worth trying, and might well improve your performance. If so, I'd be curious to hear more about it.
The approach I would probably use is a series of "left join load" statements to join them all together. Load the sales table, left join load the product table, left join load the customer table, and left join load the location table. The joins do not and I'd suggest should not be done in the database in this case, but only in the QVW script.
I'm worried that I may have misunderstood the question, though.
let me disagree with you on a couple of points...
First, I've seen a number of documented evidences of significant performance improvement achieved by "de-normalization". Generally speaking, it's pretty safe to say that "denormalized structures generally work faster in QlikView than highly normalized ones". Thee might be exceptions, but generally speaking, this is true...
Second, your suggestions to "left join"... From the initial description of the problem, it's quite clear that the data volumes are huge (and hence the concern about performance). Left Join is a very "expensive" operation, and it can extend the load time by a lot...
For the typical Dimension Table with a single Code and a single Description, I'd recommend using Mapping and not Left Join (see my Wiki about it). Left join could be possibly used when needed for those tables with much more than 1-2 fields. However, I would only de-normalize those tables as the "last resort", because the disadvantages of losing clarity in your data model can be worse than the performance gain.
A lot of it, however, is a matter of "personal preference" - except for truly huge data sets, the differences might be very saddle...
Well, let me agree with your disagreements then.
On the performance differences or similarities, I was definitely theorizing. And I'll definitely accept documented evidence over personal theorycraft, and accept that the denormalized tables are faster. I'll keep that in mind for the future.
As for the left joins, I was assuming that the star schema existed to bring in a lot more than mere descriptions for IDs, such as in the example I gave. I agree that using a map is superior and faster for descriptions. I agree that the left joins on large tables such as this are very expensive, but the expense is on the load side. I was assuming that the major performance problem we wanted to solve was on the reporting side, and that incurring some heavy load expense might be acceptable to improve the reporting performance. I definitely should have stated all that up front, though, rather than let Adam be surprised when suddenly his load took five times as long to run.
And actually, if the dimension tables really ARE nothing but IDs and descriptions, I'd probably change my initial recommendation to leave them as separate QVDs. I'd probably instead suggest just joining them during the database load, and making a single QVD. It just doesn't seem worth the headache of maintaining a lot of separate QVDs for data THAT simple. That said, we have quite a few QVDs in our shop that are that simple because the uderlying tables have a lot more fields, and we just haven't needed those additional fields in QlikView yet.
(edit: I suppose one could create multiple maps for the tables, one for each field to be loaded. Then you'd do a series of something like applymap('Customer Name Map',"Customer ID") as "Customer Name", applymap('Customer Delivery Location Map','Customer ID') as "Customer Delivery Location" and so on. That might well be faster, if a bit more complex. There might be a better way. I've never done anything like that.)
It sounds like the big problem is the single-threaded calculation of charts.
There's no special way to join QVDs, just the basic LOAD...JOIN sequence.
Your schema is reasonable. How to improve performance? A pivot table that groups a few amounts by Month, State, and Department hits 4 tables and is going to be slower than just one table. It seems that Lifestyle and Seg could be "mapped" to the LOC_CLASS table without much trouble.
I've been in a circumstance where it made sense to join the date fields to the fact table to reduce the number of tables involved in grouping. It might seem wasteful, but the nature of Calendar tables is that the fields are contiguous integers. QlikView stores these fields in a unique way that will improve performance without any additional memory.