7 Replies Latest reply: Nov 12, 2009 10:57 AM by jeannepetersen RSS

    Data Modelling

      Hi,

      I have been trying to find any documentation, website or anything that can give me advice on what the best practices or best ways are to go about data modelling in QlikView, but I haven;t found anything.

      Can someone please provide links or info regarding this matter.

      I would like to know what is the best way to develop in QlikView:

      - What do I do when I have more than one Fact table? Do I concatenate them?

      - Do I always concatenate codes and descriptions?

      - What should I be weary of?

      - What will cause performance issues, not just in my script, but also stuff like charts?

      - What is the best way to go about Security, especially multi-tiered?

      - Why do I get 'object exceeded allocated memory'?

      I need answers and tips on how to develop, script and deploy the optimum QlikView solution. Please do not refer me to the Reference Manual, because I have looked in there for Data Modelling tips, but there aren't any.

       

      Please advise.

      Thanx!

        • Data Modelling

          To make it easy.

          Data model advised is Star schema. If you have different fact tables, you can concatenate them, but it is not mandatory... Depend on the structure and complexity.

          To create keys, you can use feature Autonumberhash128(key1,key2) and then keep the description or the data in the normal table.

          Performance issues can be various, bad links, tables with no links, synthetic keys, not optimized expressions in charts, etc..)

          Object exceeded allocated memory can be cause by a memory overflow ( expression really too complicated) or by an error in your chart expression.

          For security, everything is described in the user guide (sorry !) but it consists of creating an access section.

          If you need mode information on some subject, I guess there is a lot already developped nin the forum. But of course don't hesitate

          • Data Modelling
            John Witherspoon

             


            hesten wrote:- What do I do when I have more than one Fact table? Do I concatenate them?

            Not necessarily. In my experience, star or snowflake schema is the generally preferred approach. That said, probably only half of my applications could actually be described as using star or snowflake schema. Different approaches work better for different things. I can't tell you how best to design your data for QlikView in general any better than I can tell you how best to design your data for a relational database in general. I will say that data normalization isn't really a concern due to the read-only nature of the data and the way that the data is compressed.

             


            hesten wrote:- Do I always concatenate codes and descriptions?

            Are you asking if you should concatenate codes and descriptions in a field's value like this, "John Smith (12345)"? If so, my answer is sometimes. In the case of names, for instance, they cannot be counted on to be unique. I then think it's important to concatenate the code to the name to enforce uniqueness and allow the user to distinguish between two different John Smiths. But in many cases, the descriptions are just as unique as the codes, and may be perfectly adequate without concatenating the code.

            Or are you asking a data modeling question? I suppose that's much more likely given the subject. So, say, you have an invoice table with customer IDs but not names, and then a customers table with the customer ID and the name of the customer. Are you asking if you should denormalize the name of the customer onto the invoice table? If so, again, my answer is sometimes. If all I'm pulling from the customer table is the name, it seems wasteful to me to clutter the layout with an additional table for one extra field. So I would probably just put the customer name on the invoice table. As mentioned above, denormalization like this simply isn't an issue in QlikView. On the other hand, if I wanted several fields from the customer table, or if other tables references the customer ID as well, I would likely build a separate customer table.

             


            hesten wrote:- What should I be weary of?

            Wary of? Synthetic keys and loops. QlikView will not let you create any kind of loop in your data structure. And if two tables share two fields instead of just one field, QlikView will create a synthetic key that combines the two. Essentially, it's creating a new table for you. Sometimes this is exactly what you want. Synthetic keys aren't always bad. But if you see them appear when you didn't intend them, chances are that you're doing something wrong in your data model. Have a look at the fields involved, and see where you need to trim some things.

             


            hesten wrote:- What will cause performance issues, not just in my script, but also stuff like charts?

            Performance is a very big subject. Avoid FOR loops if you can. Avoid loading data you don't need. Pretty much never do a LOAD *, just like you should almost never do a SELECT * in SQL. If the same raw data is being used by multiple applications, stage it for the applications in a QVD. When loading from a QVD, the load will go MUCH faster if you restrict yourself to a single WHERE EXISTS(field) as your condition, and only load the fields provided without any manipulation. If you require additional conditions and field manipulation, it can often be faster to apply them with an INNER JOIN after the fact, though that takes a lot more memory than not loading the data in the first place, so there are trade offs. You can often move complicated expressions or parts of complicated expressions from your charts to your script. That way, they only happen once per load instead of every time someone clicks on anything and looks at the chart.

             


            hesten wrote:- What is the best way to go about Security, especially multi-tiered?

            No idea. I'm at best peripherally involved in security, such as defining to which users I want to distribute a report. So I'll leave this to someone else.

             


            hesten wrote:- Why do I get 'object exceeded allocated memory'?

            Too much data? Expressions too complicated? Satisfying a poor expression may require joining two huge tables together on the fly, which just isn't going to work. Probably lots of possible reasons.

             


            hesten wrote:I need answers and tips on how to develop, script and deploy the optimum QlikView solution.

            Tip: Always load your data at the lowest level of possible interest, and accumulate up from there. You might occasionally be driven to create some pre-summarized data for performance reasons, but it's really opening up a can of worms. I'd try almost anything else first.

             


            hesten wrote:Please do not refer me to the Reference Manual, because I have looked in there for Data Modelling tips, but there aren't any.

            I imagine that QlikView's official position is a chipper, "We can handle ANY data model! Do ANYTHING you want and it'll work because our product is so awesome!!!" So yeah, I wouldn't expect a lot of data modeling advice from them, though perhaps I'm wrong, and it's just not in the Reference Manual. So I guess my general advice would be to refresh your star and snowflake schema knowledge, and then to post to the forum for opinions on how best to model specific things and why.

              • Data Modelling

                Thanks for your suggestions and insights!

                I am familiar with Star Schemas, but if you do it as you would in a Data Wrehouse with more than one Fact Table, you create loops! That is a bit frustrating, but I am getting there.

                With the 'Exceeded Allocated Memory' error: It isn't consistent. It happens to different charts at different times... usually they work fine but now and then it just freaks out and gives me that error. The only solution that I have found is to clone the original one and then delete it. For some reason the clone is fine.

                Thanks again guys. I guess I haven't fully made the mind shift from Data Warehousing to QV Embarrassed

                  • Data Modelling

                    Hello hesten

                    Re the Exceeded Allocated Memory error: it happens to one of my applications every other day, After 2-3 days, the chart which worked previously suddenly cannot show anything anymore. I did the same thing you did and deleted the chart and put in a new chart which is exactly the same as the previous one. DUH....

                    I was told that I may be using too many checks or set analysis so I need to use flags and sum them up instead. This means changing the application. I am just confused as to why this could be happening if QlikView is supposed to handle fairly complex and dynamic selections in-memory.

                    Should I jazz up the in-memory RAM in the server OR increase allocated memory handling in the Server Console ?

                    Anyone in the forum has an opinion here?

                      • Data Modelling
                        John Witherspoon

                        I can see how set analysis might cause memory problems. My current understanding is that is uses the same set engine as normal selections (there's a thread where senior tech support states otherwise, but I later got an email from the same person saying that it basically DOES use the same engine). So if you use set analysis, I can see that QlikView would likely be keeping at least two sets active in memory - the set analysis set, and the set corresponding to the selections. Multiple sets might be handled in parallel or sequentially, and this could even depend on available memory. Flags and IF statements I could see using less memory because they process using the current set, and likely operate on one record at a time. So they probably tend to be slower, but I could see them using less memory. Most of this is just guesswork.

                        Assuming you're using a 64-bit server, I think it is definitely worth seeing what happens when you throw RAM at the problem. I would definitely try that before recoding everything, particularly since I suspect the recode will execute more slowly than the original code. Well, flags may improve performance a little if used with set analysis instead of with IFs or multiplication.

                          • Data Modelling

                            On behalf od the different certifications courses delivered by Qliktech, there is a lot of things to improve the speed of expressions ( using flags, avoid count features and try to replace them by sums, try to compare data with integers instead of strings,etc..).

                            Regarding the set analysis, it seems that their advise is... to test between set analysis and if...

                            As for the IF functions, set analysis seems to be more powerful with integers conditions.

                            Now, in my opinion, i think that set analysis is really more powerful than if functions. Especially when you want to summarize different kind of data in one table ( for example instead of doing something like IF (A and not B, Sum(xxxx),IF(B and not A),sum(YYY),sum(zzz)).

                            In this case ou just to have to do an expression like sum({$<A=1>} xxxx) + sum({$<B=1>} YYYY) + sum({$<C=1>} zzzz), etc..

                            So it really depends on the complexity of your data.

                            Now regarding the preferred data model, qliktech advise a star or snowflakes shema in their certification courses.

                            But this is more to avoid synthetic keys. So if your model is quite simple and you don't have the need to create complex links, then, why don't use a classic model ?

                             

                            Finally regarding the error message on the table it can be a lot of things, memory exceeded, maybe, errors in expression, soometimes also, it happens when you expand a node in a pivot table that contains a blank or a null value. More memory you have on your server, easier it wil be for you. In some cases (in some applications i mean) memory grw up and fast, it is important to keep an eye on it specially during development and test of the application.

                            I hope this will be useful for you.

                            Rgds,

                            Sébastien

                             

                              • Data Modelling

                                Hi John and Spastor

                                Thank you for your thoughts. I like set analysis but sometimes it takes so much brain power that I use IF....THEN statements. Re my applications, I try to keep them simple with one or two tables and a calendar. The Allocated Memory Exceeded problem I have tried to solve by bumping up the allowed memory in the Advanced tab in the Server Mgt Console to 2048 and am crossing my fingers from here on.