First of all, not joining is usually better than joining. QlikView handles a normalized data model very well.
The only case I see where the performance is slightly better with the tables joined instead of un-joined is if you have two large tables (many records) linked to each other. E.g. you have four tables Customers-Orders-OrderDetails-Products and both Orders and OrderDetails are potentially very large. Then you should try joining these two tables (but not Customers and Products) to see if the performance improves.
Secondly, an SQL JOIN is performed on the DB server, whereas the QlikView Join is performed locally in memory. In other words: If the tables are very large, you should do it on the DB server - i.e. in the SELECT statement.
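As a rough sketch of the two options — all table and field names here are made up for illustration, not from a real schema:

```
// Option 1: join on the DB server - QlikView just loads the result.
OrderFacts:
LOAD *;
SQL SELECT o.OrderID, o.CustomerID, d.ProductID, d.Quantity, d.Price
FROM Orders o
JOIN OrderDetails d ON d.OrderID = o.OrderID;

// Option 2: load both tables and join locally in QlikView memory.
Orders:
SQL SELECT OrderID, CustomerID FROM Orders;

Left Join (Orders)
SQL SELECT OrderID, ProductID, Quantity, Price FROM OrderDetails;
```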
My opinion differs from Henric's (don't get mad, Henric!)
I agree that the QV model is very good. For example, if you have a transaction table, keep the customers table as a separate table and let the "Cust_ID" field do the job without a join.
But if you have a simple field (for example, invoice type and description), in my experience it is better to join it onto the Invoices table. The reason is that the application works better with fewer tables, even if the memory usage is higher.
Every case is different. Sometimes it is better to keep the transaction tables (for example, head and body of the invoices) separate, but sometimes the application works faster if you join those tables into one.
You have to consider the following things:
1- A field with few distinct values and many repeats isn't too costly for the dataset in QV.
2- A key field that is not optimized (for example, in mixed text/number format) takes a lot of resources.
3- A smaller number of tables is a good thing in QV.
4- It is ALWAYS a good choice to test both schemas!! With time you will be able to intuit the results, but don't rule out a surprise!
In QV things are a little different from other BI systems. You could put a whole data warehouse into a single table if you want (with more difficulty writing expressions, sure), or you can create a "beautiful DW"... Either way, testing is a good idea.
For example, at one customer I had two big transactional tables with the Head and Body parts of each invoice. In each sale, a product could have a different price. The best solution (for memory cost and performance) was to join the Head and Body tables and make a separate table with the price versions of each product.
I hope this helps.
I agree with Sebastian. And when testing the model performance, if you have slow performing charts, check that the expressions do not span too many tables. If the data is not joined in the model, QV has to perform the join in memory when evaluating the expression. If the expression includes fields that are 2 or 3 steps removed from each other, this can become very compute intensive.
So the degree to which you denormalise the data depends on the expressions you use.
Just my 2c worth
BTW - these things are only an issue when the data starts getting larger, and the expressions more complex and the charts take more than a second or two to render.
There is more than one way to skin a cat. This is especially true for QlikView data modelling.
Both of you raise valid points and everything you say is in principle true. So, when it comes to choosing a data model, you have to weigh pros and cons. Which isn't always easy. But it helps if you are informed (which both of you seem to be).
Some arguments for normalization:
- A normalized model is easier to manage. Sooner or later, someone else will try to change your application: read your script and try to figure out what you have done. Then a simple script with a normalized model is much easier to understand.
- Denormalization often improves performance only marginally. Why then do it? An exception is when you have two large tables next to each other; e.g. Head and Body of Orders or Invoices. (I also always join these.)
- Denormalization often means that you change the number of records so that Sum() and Count() do not work any more; e.g. if you have a number in the Head table, you cannot sum this after the join - the Sum() will count the same number several times. This can be solved, but it is much easier to make a mistake and write a faulty expression in a denormalized model.
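A minimal made-up example of that Sum() pitfall:

```
// Head has one record per invoice; Freight exists once per invoice.
Head:
LOAD * INLINE [
InvoiceID, Freight
1001, 100
];

// Joining three order lines turns the one Head record into three.
Left Join (Head)
LOAD * INLINE [
InvoiceID, LineAmount
1001, 10
1001, 20
1001, 30
];

// A chart expression Sum(Freight) now returns 300 instead of 100,
// because the freight value has been repeated on every line.
```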
So my view is basically that you shouldn't look at performance only. Also manageability and robustness (how easy is it for a user to enter a faulty expression) matters. Do I use denormalized models? Of course I do. But I never start with a denormalized model as the goal.
Bottom line: Both ways work, and both ways have pros and cons. And I do not think that we are that far apart in our opinions...
If you want to join Invoicetype and Description onto the Invoice table, you should use Applymap and not Join. Only this way can you assure that you don't get extra records produced through the join.
Hi Henric, I didn't understand the part about the "extra records produced through the join"...
If I have a table whose field A contains only the values 1, 2 or 3, and I Left Join it with another table that has 1, 2, 3 and 4, the "Left" clause will load only the records with 1, 2 or 3... why do you say that ApplyMap is the "only way to assure..."?
Let's take your example with InvoiceType above.
We assume that
- Table A contains invoices. One record per invoice.
- Table B contains invoice types and invoice descriptions. (Yes, it's a stupid "normalization", but nonetheless fairly common. I have seen this in real life more than once.)
The goal is to get the InvoiceType from table B into table A and then use the resulting table in QlikView. (Which, by the way, is a "join" that I recommend that you do.)
Now, if table B contains two records for a specific invoice number N (which it shouldn't, but I have also seen this type of error in real life), then both records from table B will be joined to the single record in table A and corrupt table A, so the QlikView calculations will be incorrect. Invoice N will be counted twice. It doesn't help that you use "Left Join". The record with invoice N is there and will be duplicated.
Databases shouldn't contain this type of error, but - believe me - they often do. Check the number of records before and after the join, and you will sometimes be surprised that it is not the same.
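That before/after check can be sketched directly in the script. NoOfRows() and Trace are standard script features; the table and field names here are just illustrative:

```
Let vBefore = NoOfRows('TableA');

// The join that may (incorrectly) duplicate records.
Left Join (TableA)
LOAD InvoiceID, InvoiceType
Resident TableB;

Let vAfter = NoOfRows('TableA');
Trace TableA: $(vBefore) rows before join, $(vAfter) rows after;
```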
To be sure to not change the number of records, I always use applymap() as a lookup function in these situations. It is just a translation from InvoiceID to InvoiceType. Then I know that I have the same number of records before and after.
In my situation I have a main QVD (over 5 GB) and some description tables (7-8). The description tables have null values for some fields. Since I don't want to see these null fields, I decided to connect these tables to the main QVD with a Left Join. After the join, the QVD file size decreased by about 1.8 GB. I think this should improve performance, since the QVD file size is falling. I have also checked Sum() and Count(); fortunately, they don't change. Based on your comments, would you say a Left Join is better in my situation?
Additionally, since I want to remove the null fields, can I use "Keep" for that? As I understand it, the difference between Keep and Join is that Keep doesn't merge the tables; after a Keep operation the tables are still separate.
Thanks to all.
If the description table has the same primary key as the fact table, then you should join.
But if the key to the description table is a foreign key in the fact table and the number of records in the description table is several magnitudes smaller than in the fact table, then you should not join. Use Left Keep instead.
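A sketch of that Left Keep, with illustrative names: the description table is reduced to the keys that occur in the fact table, but the two tables stay separate in the data model:

```
Facts:
LOAD InvoiceID, TypeID, Amount
FROM Facts.qvd (qvd);

// Left Keep filters InvoiceTypes like a Left Join would,
// but without merging it into Facts.
InvoiceTypes:
Left Keep (Facts)
LOAD TypeID, TypeDesc
FROM InvoiceTypes.qvd (qvd);
```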
I have seven description tables and a main QVD table. The description tables are much smaller than the main QVD table, and the primary keys of the description tables are foreign keys in the main table. If I use "Left Keep" instead of "Left Join", the only difference I see is that the tables stay separate instead of being merged. I wonder what the differences between Keep and Join are. Why should I use Keep in this situation?
Henric, thanks for the suggestions.
As you said, FULL knowledge of each table is the first stage of every job I do. Following this example, if I have more than one record for a single invoice type, my problem lies elsewhere, not in the number of records in the fact table.
So, I haven't used ApplyMap very often. I have used it, in general, to add description fields when it is not necessary to keep the key. But in this case, how does ApplyMap give only one result? For example, an invoice in the fact table includes the key ID_Invoice_Type='12'. In Invoice_Types, as in your example, there are two entries:
You said that Join would create two records, one for each Invoice_Type_Desc, and I agree with this. But, as you said, ApplyMap guarantees that this won't occur... how does ApplyMap "know" which Desc to read for ID_Invoice_Type=12??
Both solutions will work fine. But joining additional fields onto an already large fact table will increase the memory demand of the fact table. Also, this case, where you have small dimensional tables, is a typical case where the QlikView multi-table calculations work well. So I just do not see any reason to join.
Applymap only returns one value - the one that is first encountered in the mapping table. The solution should be something like
applymap('MappingTable',InvoiceID) as Invoice_Type_Desc
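Spelled out as a full sketch (the names follow the example above; the third ApplyMap argument is an optional default for unmatched keys):

```
// A mapping table keeps only the first occurrence of each key,
// so the duplicate '12' below can never duplicate fact records.
InvoiceTypeMap:
Mapping LOAD * INLINE [
ID_Invoice_Type, Invoice_Type_Desc
12, Standard
12, Standard duplicate
13, Credit note
];

Invoices:
LOAD InvoiceID, ID_Invoice_Type,
     ApplyMap('InvoiceTypeMap', ID_Invoice_Type, 'Unknown') as Invoice_Type_Desc
FROM Invoices.qvd (qvd);
```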