11 Replies Latest reply: Aug 20, 2015 2:48 AM by Marcus Sommer RSS

    Problem with load script

      I have such code in script as is below (I've changed names and remove some dimensions). When I'm loading data as it is in the script - everything is fine. But this script it is not efficient on the large data (there are many syncs key so loading is very slow), so I wanted to add some keys or use "Concatenate". If I add keys then some values doesn't show on some dimensions:

      my keys according to used dimensions in table:

      CAST(Date AS varchar)+'|'+Product_id+'|'+Client_id AS [%KP_key]

      or

      CAST(Date AS varchar)+'||'+Client_id AS [%KP_key]

      or

      CAST(Date AS varchar)+'|'+Product_id+'|' AS [%KP_key]

      and then I'm creating link table and I'm removing Client_id and Product_id fields.

      When I add this and I want see what is Sum_c on Products, there are just zeros and after products there is a row with "-" id and whole sum. How it should look like -> Sum_c value should be repeated on every product.

       

      Any ideas how can I do that or how to optimize my data model?

       

      SumsC:
      SELECT
        TIME Date,
        ID Client_id,
        Value Sum_c
      FROM Sums
      where ID in (select ID from Clients);
      
      SumsP:
      SELECT
        TIME Date,
        ID Product_id,
        Value Sum_p
      FROM Sums
      where ID in (select ID from Products);
      
      Clients_data:
      SELECT
        TIME AS Date,
        sth1 Sth1_id, //some dimensions
        sth2 Sth2_id,
        sth3 Sth3_id, //there are a few more - 8
        Value Client_cost,
        ID Client_id
      FROM
      Clients_data;
      
      
      Products_data:
      SELECT
        TIME AS Date,
        sth1 Sth1_id, //there are also 8 more dimensions as in Clients_data
        sth2 Sth2_id,
        sth3 Sth3_id,
        Value Product_cost,
        ID Product_id
      FROM
      Products_data;
      
      
      Incomes:
      SELECT
        TIME AS Date,
        ID Client_id,
        ID Product_id,
        Value Income,
        Amount Amount,
        Client_counter,
        Product_counter
      FROM
      Incomes;
      
      
      

       

       

      ======

      Updates:
      I have some progress. I found similar problem here: database - Handling multiple fact tables in Qlikview - Stack Overflow. So:
      a) I did temp tables with whole data from database.

      Temp_SumsC:
      SELECT
        TIME Date,
        ID Client_id,
        Value Sum_c
      FROM Sums
      where ID in (select ID from Clients);
      
      ...
      
      


      b) I loaded temp tables to right tables using key

      SumsC:
      Load Value,
           Date & '|' & Client_id AS %C_key
      Resident Temp_SumsC
      
      ...
      
      

       

      c) I added data from all tables to TempLinkTable

      TempLinkTable:
      LOAD Distinct
           Date, Client_id 
      Resident Temp_SumsC
      
      Concatenate(TempLinkTable)
      ...
      
      

       

      d) I created LinkTable with all keys and dimension fields

      LinkTable:
      LOAD Distinct
           Date, Client_id , Product_id,
           Sth1_id, Sth2_id, Sth3_id,
           Date & '|' & Client_id %C_key, 
           Date & '|' & Product_id %P_key,      
           Date & '|' & Client_id & '|' & Sth1_id & '|' & Sth2_id & '|' & Sth3_id %CA_key, 
           Date & '|' & Product_id & '|' & Sth1_id & '|' & Sth2_id & '|' & Sth3_id %PA_key, 
           Date & '|' & Client_id & '|' & Product_id %CP_key
      Resident TempLinkTable
      
      

       

      e) I removed all unnecessary tables

      DROP Table Temp_SumsC;
      ...
      
      

       

      After script reloaded, the Sum_c and Sum_p started work, but Client_cost and Product_cost still doesn't work.

        • Re: Problem with load script
          Marcus Sommer

          You need to avoid any synthetic keys (unless you knows excactly what you are doing) and for this you might need to build composite keys and/or create a link-table and/or put tables per concatenating/joining/mapping together. Here you will find a collection of links how do you could change your script: Get started with developing qlik datamodels.

           

          - Marcus

          • Re: Problem with load script
            Marcus Sommer

            It' difficult to say what could be wrong - maybe there is inspite of the linktable no or not the right association between the tables to show or calculate the correct results. You could build a small tablebox with only a few fields around Client_cost and Product_cost with a date-, categorie-field or something similar and if this didn't returned "normal" data-rows something with the association don't worked properly.

             

            Maybe your approach to put everything in the linktable is not completely suitable (and often not the easiest way) and a mix from different approaches from join/map tableparts together, adds other tables per concatenate (unlike sql asymetric tables are not a problem - it worked great) and use linktables (by larger datasets it's useful to use an autonumber(ForTheCompositeKey) - the performance will be better) for the rest which isn't covered by the other approaches.

             

            - Marcus

            • Re: Problem with load script
              Marcus Sommer

              For me it looked if there aren't any changes within this tread - updates from postings are (mostly) not helpful, please use the answer-button.

               

              - Marcus

                • Re: Problem with load script

                  Thanks for answer!
                  I'll look at your link - I see that there are a lots of interesting things, but I think I tried everything from there and it didn't solve my problem to the end.

                  What I found out so far - when I remove all other sthX_id fields it works... Maybe because Client_cost and Product_cost are connected with these dimensions. So now I am trying to separate these fileds between Clients_data and Product_data - they should be connected with dimension definition, but not between fact tables. I know that for this I could use QUALIFY, but I'm not quite sure how to use it. Of course when dimension definition will have only id and name, then I'll use mapping to get dimension name.

                  Another overcome, which I found is to use QUALIFY and when I have two the same dimensions I use trigger to change this second when first is changed.


                  PS. I've edited my main post, because I couldn't add any reply for your answers.

                    • Re: Problem with load script
                      Marcus Sommer

                      I think that your datamodel isn't suitable yet - that there are still missing associations and/or a too complex structure and qualifying will be probably not helpful rather the opposite. Qualifying is only useful to separate data-areas from each other which is most not the aim within qlikview else the opposite again the association from (all) data is the aim. People with a sql-background where qualifying and normalization is quite common think often too complicated by building a datamodel in qlikview, many things (not all) a quite easy .

                       

                      The easiest data-model is the star-sheme with one fact-table (if you have several fact-tables they will be joined/mapped/concatenated) and various dimension-tables which are direct linked to the fact-table (chained dimension-tables will be de-normalized by joining/mapping/concatenating the tables together). Thes description is a bit black and white colored and the reality is more grey but it should give you a direction and most often other datamodels needs more efforts to build them.

                       

                      I suggest that you start with a simplified approach by not loading all tables and all fields and checked then within tableviewer and with one or two simple table-charts if the loaded data are like you expect them and then you could increase the complexity again step by step.

                       

                      - Marcus

                        • Re: Problem with load script

                          The point is that once there was one big table with all measures and dimensions, but it was splitted to increase performance of reports using SSAS (SQL Server Analysis Services), because many informations were duplicated in this one table.

                          Now I think that I should use this splitted table in QV - but maybe here I'm wrong? Does QlikView really like big tables with milions of records? I'm not quite sure what I should thinking about it.

                           

                          You said about data-model which is the star-scheme. I have now such scheme, but the main table is not fact-table but link-table with connections products-clients-data. And something like this is not too good, right?

                            • Re: Problem with load script
                              Marcus Sommer

                              Quite often worked such big table very well in qlikview. It couldn't be compared with big tables in normal sql-databases then qlikview stored only the distinct values of a field and used then bit-stuffed pointer, see: Symbol Tables and Bit-Stuffed Pointers.

                               

                              Nevertheless it will depend on various factors especially in which area will you have more likely a bottleneck which model-approach will fit best - see here what is meant: TableView Model - Star Or Snowflake - Does It Matter?.

                               

                              Therefore I suggest you start with a simple model which worked and only there are bigger performance problems or you have enough time you begin to optimize this and that.

                               

                              - Marcus

                                • Re: Problem with load script

                                  I followed your advice and I started from loading two problematic tables. Then I saw that already at this point I had additional row with whole sum (what I didn't want). When I added tables Sum_c and Sum_p (there were $syn keys) - everything was fine again...

                                  I saw that table with all $syn keys was like full outer join of tables, whereas I just could do Concatenate or plain joins.

                                   

                                  Is it possible to do full outer join of my tables (clients_data and products_data)?

                                  *of course without numeric values, only dimensions to do Link Table

                                    • Re: Problem with load script
                                      Marcus Sommer

                                      It's quite hard stuff here Should We Stop Worrying and Love the Synthetic Key? and I agree synthetic keys aren't a general problem. From them I recommend the comment from Michael Solomovich:

                                       

                                      My two cents (no time for much more...)

                                      If you are a beginner, the recomandation stays true - avoid synthetic keys. As a rule it's a result of poor design.
                                      If you're an expert, it's a different story. You know the risks and understand what you're doing.
                                      (Compare to car driving, you may by chance run on two left wheels, but better leave it for the experts :-))

                                       

                                      - Marcus

                                        • Re: Problem with load script

                                          I solved this problem in the following way: I've changed subsection c). Instead of this:

                                          
                                          TempLinkTable: 
                                          LOAD Distinct 
                                              Date, Client_id 
                                          Resident Temp_SumsC 
                                          
                                          Concatenate(TempLinkTable) 
                                          ...
                                          
                                          
                                          
                                          
                                          
                                          

                                           

                                          I used this:

                                          
                                          TempLinkTableP:
                                          LOAD Distinct      Date, Client_id, Product_id
                                          Resident           Incomes
                                          outer join (TempLinkTableP)
                                          LOAD Distinct      Date, Product_id, Sth1_id, Sth2_id, Sth3_id
                                          Resident           Products_data;
                                          
                                          NoConcatenate
                                          
                                          TempLinkTableC:
                                          LOAD Distinct      Date, Client_id, Product_id
                                          Resident           Incomes
                                          outer join (TempLinkTableC)
                                          LOAD Distinct      Date, Client_id, Sth1_id, Sth2_id, Sth3_id
                                          Resident           Clients_data;
                                          
                                          
                                          NoConcatenate
                                          
                                          TempLinkTable:
                                          LOAD Distinct      Date, Product_id, Client_id, Sth1_id, Sth2_id, Sth3_id
                                          Resident TempLinkTableP;
                                          Concatenate
                                          LOAD Distinct      Date, Product_id, Client_id, Sth1_id, Sth2_id, Sth3_id
                                          Resident TempLinkTableC;
                                          
                                          
                                          Drop Table TempLinkTableC;
                                          Drop Table TempLinkTableP;
                                          
                                          
                                          
                                          
                                          
                                          


                                          And it works - the values are correct. Unfortunately, instead of fast loading data, the app still works slow, but this problem is for another question.


                                          Thanks for all help Marcus.