17 Replies Latest reply: Aug 15, 2011 8:28 AM by Nicolas ALLANO

    Common QVDs

    aduncan

We are trying to have a set of common dimensional QVDs that any application can use, rather than having each app pull the product hierarchy from the database. This is causing us to have large "star" schema applications. Some of our larger QlikView applications (5-10GB QVWs) perform very slowly, and I think that getting rid of the star schema would really help.

       

To solve this, the fact table must contain all the dimensions, but that means they must be joined in the database. Does anyone know how to join QVDs so we can continue to use our common dimension process? My underlying problem is the performance issue, so if I'm tackling this the wrong way, please let me know. We have used the QlikView optimizer to eliminate any large columns we can.

        • Common QVDs
          Oleg Troyansky

          Adam,

Having a set of common QVD files is the right approach - it will shorten your data load time and make the whole process more organized. I don't understand why it is causing you to have large "star schema" applications. Perhaps you always load the whole QVD file into the QVW, which is probably a mistake.

A star schema by itself should not be too bad, unless it's overly normalized (as most Data Warehouses are).

To avoid loading full tables, you can use the exists() function to limit the Dimensions to only the needed values. For example:

           


Transactions:
load
    ItemNumber,
    CustomerNumber,
    ...
from Transactions.qvd (qvd);

Items:
load
    ItemNumber,
    ItemDescription
from ItemMaster.qvd (qvd)
where exists (ItemNumber);

Customers:
load
    CustomerNumber,
    CustomerName
from CustomerMaster.qvd (qvd)
where exists (CustomerNumber);


           

In the example above, only those Items and those Customers that exist in the Transactions table will be loaded. Notice that using exists() on a single field does not cause any problems with the optimized QVD load - the load remains optimized (as opposed to most other where conditions, which break optimization).

          If I'm missing your problem, please extend, and I'll try to help.

          Oleg

            • Common QVDs
              aduncan

Oleg, thanks for this suggestion. I understand that "where exists(key)" will limit the amount of data, but that isn't my current performance issue - we are in testing, and the dimensions only contain the applicable data (a great tip for when we roll this out, though).

               

It was my understanding that having a lot of linked tables in the final application can hurt performance compared with having all the data in one table. I suppose we are overly normalized, since we have only one fact table and many linked dimensions.

               

Do you believe this could be our performance issue, or should I be focusing my time elsewhere? When I look at the server's Task Manager, it seems a lot of the selections and graphs are running on one processor.

               

              thank you very much.

                • Common QVDs
                  Jay Jakosky

                  Yes, performance slows down when aggregating data in one table based on another table. With 5-10GB QVWs it's certainly a good idea to look at creative denormalization. There are several other optimizations that can have dramatic impacts on memory and performance.

                  Chart calculations limited to one processor can be a side effect of a "data island" for example.

I'm curious to see your Table Viewer graph and an example of a slow formula - or better, an uploaded example QVW.

                  • Common QVDs
                    Oleg Troyansky

                    Adam,

Generally speaking, having many links that need to be resolved at run time may cause some latency, but I don't think that having a single Fact table and several "satellite" dimension tables would make such a dramatic impact.

If you are seeing that the action is single-threaded - could it be that you have some VBScript macros? Using macros often causes QlikView to go into a single-threaded mode.

You might have a single object/chart that's demanding a lot of resources. Open "Document Properties" right after loading a heavy screen, look at "Sheets", and see at the bottom which objects are using the most memory and time to load.

You might have some fields with an enormous number of distinct values. Open Document Properties, Tables, and look for the fields with the most unique values. If you find a "monster" field, think about what you could do with it. Best case - you don't need it at all. For example, programmers often load Transaction ID just because they are used to it from their experience with transactional systems; for analytics, Transaction ID is useless (unless it's a link key). In any case, you need to find alternative solutions for the "monster" fields - either eliminate them, chop them into pieces, or do something else.
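A classic example of "chopping" a monster field (a sketch on my part - the field names here are assumptions, not from the thread) is splitting a full timestamp into separate date and time fields, each of which has far fewer distinct values than the combined timestamp:

```
Transactions:
LOAD
    OrderID,
    Date(Floor(OrderTimestamp)) AS OrderDate,  // ~365 distinct values per year
    Time(Frac(OrderTimestamp))  AS OrderTime,  // at most 86,400 distinct values
    Amount
FROM Transactions.qvd (qvd);
```

Two low-cardinality fields compress far better in QlikView's symbol tables than one field with millions of unique timestamps.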

                    You could also collect "Memory Statistics" and analyze them, to get a better idea about "what is eating up resources"...

                    Hard to say without looking at your application. If you are "stuck", engaging an experienced consultant is not a bad idea...

                    good luck,

                    Oleg

                     

                      • Common QVDs
                        John Witherspoon

                        I may or may not be understanding the question, so I'm going to drop back to basics.

                        Let's say you have a sales table. The sales table has a sales ID, a product ID, a customer ID, a sales date, and a sales quantity. The product table has a product ID, product description, and other product information. The customer table has the customer ID, customer name, delivery location ID, and other customer information. The delivery location table has a location ID, an address, and so on. (edit: I suppose that's technically a snowflake schema rather than a star schema, but I don't think it affects anything I'll be discussing.)

                        I would indeed make separate QVDs here.

                        If I understand your question, that's what you have right now - separate QVDs for all of your dimensional tables (customer, product, etc.). And if I understand correctly, you're reading these QVDs into separate tables in the QVW, and you're suspicious that that's causing performance problems.

                        I do have my doubts that that's the cause of your performance problems. I haven't done benchmarks, so this is very wishy-washy anecdotal evidence, but I've seen no such problems when using highly normalized tables like this. I also theorize that QlikView internally handles highly normalized and highly denormalized data very much the same way due to its internal compression algorithms.

                        That said, there's also nothing wrong with joining all of these tables together into one master table in the QVW, and that's an approach I might typically take for a case like this. It is certainly worth trying, and might well improve your performance. If so, I'd be curious to hear more about it.

                        The approach I would probably use is a series of "left join load" statements to join them all together. Load the sales table, left join load the product table, left join load the customer table, and left join load the location table. The joins do not and I'd suggest should not be done in the database in this case, but only in the QVW script.
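Using the hypothetical sales example above, that series of "left join load" statements might look something like this (a sketch - the file and field names are assumptions):

```
Sales:
LOAD [Sales ID], [Product ID], [Customer ID], [Sales Date], [Sales Quantity]
FROM Sales.qvd (qvd);

LEFT JOIN (Sales)
LOAD [Product ID], [Product Description]
FROM Product.qvd (qvd);

LEFT JOIN (Sales)
LOAD [Customer ID], [Customer Name], [Delivery Location ID]
FROM Customer.qvd (qvd);

LEFT JOIN (Sales)
LOAD [Delivery Location ID], [Address]
FROM Location.qvd (qvd);
```

Each left join matches on the field names the two tables have in common, so the end result is one wide Sales table.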

                        I'm worried that I may have misunderstood the question, though.

                          • Common QVDs
                            Oleg Troyansky

                            John,

                             

                            let me disagree with you on a couple of points...

First, I've seen a number of documented pieces of evidence of significant performance improvement achieved by "de-normalization". Generally speaking, it's pretty safe to say that denormalized structures work faster in QlikView than highly normalized ones. There might be exceptions, but generally speaking, this is true...

Second, your suggestion to "left join"... From the initial description of the problem, it's quite clear that the data volumes are huge (hence the concern about performance). Left Join is a very "expensive" operation, and it can extend the load time by a lot...

For the typical Dimension table with a single Code and a single Description, I'd recommend using Mapping rather than Left Join (see my Wiki about it). Left join could possibly be used when needed, for tables with substantially more than 1-2 fields. However, I would only de-normalize those tables as a "last resort", because the disadvantage of losing clarity in your data model can outweigh the performance gain.
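As a sketch of the Mapping approach for such a code-and-description table (all names here are assumptions, not from the thread):

```
ItemMap:
MAPPING LOAD ItemNumber, ItemDescription
FROM ItemMaster.qvd (qvd)
WHERE 1=1;  // force an unoptimized load; an optimized QVD load silently fails as a mapping table

Transactions:
LOAD
    ItemNumber,
    ApplyMap('ItemMap', ItemNumber, 'Unknown') AS ItemDescription,  // 'Unknown' is the default for unmatched keys
    SalesAmount
FROM Transactions.qvd (qvd);
```

Unlike a join, ApplyMap() can never multiply rows, and the mapping table is dropped automatically at the end of the script.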

A lot of it, however, is a matter of "personal preference" - except for truly huge data sets, the differences might be very subtle...

                            cheers,

                            Oleg

                              • Common QVDs
                                John Witherspoon

                                 

                                Well, let me agree with your disagreements then.

                                On the performance differences or similarities, I was definitely theorizing. And I'll definitely accept documented evidence over personal theorycraft, and accept that the denormalized tables are faster. I'll keep that in mind for the future.

                                 

                                As for the left joins, I was assuming that the star schema existed to bring in a lot more than mere descriptions for IDs, such as in the example I gave. I agree that using a map is superior and faster for descriptions. I agree that the left joins on large tables such as this are very expensive, but the expense is on the load side. I was assuming that the major performance problem we wanted to solve was on the reporting side, and that incurring some heavy load expense might be acceptable to improve the reporting performance. I definitely should have stated all that up front, though, rather than let Adam be surprised when suddenly his load took five times as long to run.

                                 

                                 

And actually, if the dimension tables really ARE nothing but IDs and descriptions, I'd probably change my initial recommendation to leave them as separate QVDs. I'd probably instead suggest just joining them during the database load and making a single QVD. It just doesn't seem worth the headache of maintaining a lot of separate QVDs for data THAT simple. That said, we have quite a few QVDs in our shop that are that simple, because the underlying tables have a lot more fields and we just haven't needed those additional fields in QlikView yet.

(edit: I suppose one could create multiple maps from the tables, one for each field to be loaded. Then you'd do a series of something like applymap('Customer Name Map', [Customer ID]) as [Customer Name], applymap('Customer Delivery Location Map', [Customer ID]) as [Customer Delivery Location], and so on. That might well be faster, if a bit more complex. There might be a better way. I've never done anything like that.)
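Spelled out as a script, that multiple-maps-per-table idea might look like this (a sketch - every table, file, and field name here is hypothetical):

```
[Customer Name Map]:
MAPPING LOAD [Customer ID], [Customer Name]
FROM Customer.qvd (qvd)
WHERE 1=1;  // unoptimized load, so the mapping table is populated correctly

[Customer Delivery Location Map]:
MAPPING LOAD [Customer ID], [Delivery Location ID]
FROM Customer.qvd (qvd)
WHERE 1=1;

Sales:
LOAD
    [Sales ID],
    [Customer ID],
    ApplyMap('Customer Name Map', [Customer ID]) AS [Customer Name],
    ApplyMap('Customer Delivery Location Map', [Customer ID]) AS [Customer Delivery Location],
    [Sales Quantity]
FROM Sales.qvd (qvd);
```

The same source QVD is read once per map, which is cheap as long as the dimension table is small.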



                      • Common QVDs
                        aduncan

                        Is it possible to join QVDs so the QVW sees one table? I'm hearing that the schema may not be my issue but I'm just curious.

                        I'm attaching the Schema for reference.

                         

                         

                          • Common QVDs
                            Jay Jakosky

                            It sounds like the big problem is the single-threaded calculation of charts.

                            There's no special way to join QVDs, just the basic LOAD...JOIN sequence.

                            Your schema is reasonable. How to improve performance? A pivot table that groups a few amounts by Month, State, and Department hits 4 tables and is going to be slower than just one table. It seems that Lifestyle and Seg could be "mapped" to the LOC_CLASS table without much trouble.

I've been in a circumstance where it made sense to join the date fields onto the fact table to reduce the number of tables involved in grouping. It might seem wasteful, but the nature of Calendar tables is that the fields are contiguous integers, and QlikView stores such fields in a compact way, so this can improve performance with little additional memory.

                              • Common QVDs
                                John Witherspoon

Yes, you can join the QVDs together into a single table in the QVW. The "left join load" and mapping loads that we were discussing are a couple of ways to handle it. Here's an example combining the two approaches as I might do it in practice, though it depends on the sizes of the various tables. Apologies if I have any bugs.

                                 

                                 

                                 

[Segment Map]:
MAPPING LOAD
 [Customer Segment ID]
,[Customer Segment Description]
FROM Seg (QVD);

[Segment Lifestyle Map]:
MAPPING LOAD
 [Customer Segment ID]
,[Customer Lifestyle ID]
FROM Seg (QVD);

[Lifestyle Map]:
MAPPING LOAD *
FROM Lifestyle (QVD);

// I probably got this part wrong:
MAP [Customer Segment ID] USING [Segment Map];
MAP [Customer Segment ID2] USING [Segment Lifestyle Map];
MAP [Customer Lifestyle ID] USING [Lifestyle Map];

[Main]:
LOAD *
,[Customer Segment ID] AS [Customer Segment ID2]
FROM LOC_CLASS (QVD);

LEFT JOIN LOAD *
FROM Class (QVD);

LEFT JOIN LOAD *
FROM Date (QVD);

LEFT JOIN LOAD *
FROM Loc (QVD);

                                 

The left join tells it to join with the main LOC_CLASS table where there's something to join with, so the result is a single table with all of the fields in it. On the down side, this will take significantly more time than loading the tables without joining them. On the up side, reporting from the resulting table should be quicker. If Seg and Lifestyle are small tables, you could probably handle them both with mapping loads to avoid one of the left joins.

                                 

                                As for the part where I said I probably got it wrong, you just need to apply various maps in sequence. I'm not sure if I have the right syntax, or if QlikView is smart enough to go in sequence (mapping a segment ID to a lifestyle ID, then the lifestyle ID to a lifestyle description, all in one load). If that doesn't work, I'm sure you can get more explicit with the applymap() function. It's one of those things that I'm confident can be done, but would have to experiment to get it exactly right. All of that is just to save time on the load. If the load time isn't important, you can just left join every table after the first. And as I said earlier, it all depends on the size of your tables. I'm assuming your Seg table is smallish, for instance, as I load it twice to produce two of the maps.

                                 



                                  • Common QVDs
                                    Rob Wunderlich

                                     


John Witherspoon wrote: "Apologies if I have any bugs"


Beware the very subtle limitation on using QVDs as input to MAPPING loads. An optimized QVD load, which is usually desirable, will not work as a mapping table. You will receive no error or warning. For example, this will not map correctly:

                                     

                                    [Lifestyle Map]:
                                    MAPPING LOAD *
                                    FROM Lifestyle.qvd (QVD);


                                    The solution is to add something (like WHERE 1=1) that forces an unoptimized load.

                                     

                                     

                                    [Lifestyle Map]:
                                    MAPPING LOAD *
                                    FROM Lifestyle.qvd (QVD)
                                    WHERE 1=1;


                                    This one can drive you crazy if you don't know about it. Attached is a sample that demonstrates the problem and the solution.

                                    -Rob



                                      • Common QVDs
                                        Oleg Troyansky

                                        I would be curious to know the counts of the rows for each of those tables. Also, I'm almost sure that some of the attribute fields out of "Locations" might not be needed.

From what I can see, I wouldn't say the schema is the reason for the slow performance - it's more likely the design of the objects on the screen. The three biggest things to watch for:

                                        1. Overly detailed pivot tables.

                                        2. Numerous small objects - text objects, little gauges and charts. Many small objects on the screen usually require more time and memory than a single chart.

                                        3. Heavy calculations with IF functions or other "heavy" functions inside.

                                        Oleg

                                        • Common QVDs
                                          Jay Jakosky

                                          Nice tip Rob.

                                          • Re: Common QVDs

                                            Hi Rob,

                                             

I took note of your recommendation regarding optimized loads for mappings. I think the problem has been fixed in recent versions (I am using QV 10 SR2). If you re-run the script that you posted, it now works.

                                             

However, there is still a problem (even worse for me; I don't know if it is a new one): you can't apply a map (with MAP ... USING) to a table that you have loaded from a QVD when the load is optimized.

                                             

Rob, I have played with your sample QVW file and modified it to demonstrate the problem.

                                            Anyone would have an idea?

                                             

                                            Thanks in advance,

                                              • Common QVDs
                                                Rob Wunderlich

Yes, I recently discovered (the hard way) that MAP USING will not work with an optimized load. The workaround is to force an unoptimized load, or to use ApplyMap() in the LOAD - which forces an unoptimized load anyway. I'm going to submit it as a bug and see if it gets any love.

                                                 

                                                -Rob

                                                  • Common QVDs

                                                    Hi Rob,

                                                     

                                                    I have created a new case through the Customer Portal. Will update this thread asap,

                                                      • Re: Common QVDs

                                                        Hi,

                                                         

                                                        Here is the answer from QV Support:

                                                         

This is by design. Mapping features require an unoptimized load to be applied, since an optimized load is put straight into memory without any processing.

                                                         

In this case, the easy workaround is to either add a dummy WHERE 1=1 to trigger an unoptimized load, or to add a preceding load, as the map will then be applied to the table while it is being reloaded. Find a sample attached that demonstrates the two options to solve the issue.
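The preceding-load option can be sketched like this (the table, field, and file names are hypothetical - the thread's attachment is not reproduced here):

```
MAP [Customer Lifestyle ID] USING [Lifestyle Map];

Lifestyles:
LOAD *;  // preceding load: rows pass through this extra step, so MAP ... USING is applied
LOAD [Customer Lifestyle ID], [Customer Segment ID]
FROM Seg.qvd (qvd);  // the QVD read itself can stay optimized
```

The inner load still benefits from the fast optimized QVD read; only the pass-through step does the per-row processing that the mapping needs.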

                                                         

A bug (40908) has been reported for the need to clarify this in the manual.

                                                         

Unfortunately, it does not sound like this limitation is going to be fixed soon.