11 Replies Latest reply: Nov 18, 2016 5:18 AM by Saffe Saff RSS

    Join takes its time

    Saffe Saff

      Hi All!

       

      Im trying to join these tables to get one fact table in order to compare different versions of a forecast. This is what I am trying to do:

       

      Facilities:13 000 rows

      LEFT JOIN

      Forecast: 780 000 rows

       

      However it takes ages (1 hours) and they share only one key Facility_ID. Ive got 32 GB ram.

       

      Can anyone help me understand whats going wrong?

        • Re: Join takes its time
          Vishwarath Nagaraju

          Can you share your script and the data model snapshot how it looks ?

            • Re: Join takes its time
              Saffe Saff

              Hi!

              It looks like this from the start. Im just making a simple join like this:

               

              Forecast_temp:

              LOAD

                   Facility_ID,

                   Föränd,

                   Vers,

              FROM

              Facility.QVD(qvd);

              LEFT JOIN (Forecast_temp)

              LOAD

                   Facility_ID,

                   Vers,

              FROM

              Forecast.QVD(qvd);

               

               

                • Re: Join takes its time
                  Vishwarath Nagaraju

                  What happens if you dont use joins? Did you try like below?

                   

                  Facility:

                  LOAD

                       Facility_ID,

                       Föränd,

                       Vers,

                  FROM

                  Facility.QVD(qvd);

                   

                  Forecast:

                  LOAD

                       Facility_ID,

                       Vers AS ForecastVers

                  FROM

                  Forecast.QVD(qvd);

                  • Re: Join takes its time
                    Staffan Johansson

                    You Join on two fields, FacilityID and Vers, do a composite key instead, FacilityID &'|'& Vers in both.

                    Forecast_temp:

                    LOAD

                         Facility_ID & '|' &Vers as %Key

                         Föränd

                       

                    FROM

                    Facility.QVD(qvd);

                    LEFT JOIN (Forecast_temp)

                    LOAD

                         Facility_ID & '|' &Vers as %Key

                    FROM

                    Forecast.QVD(qvd);

                • Re: Join takes its time
                  Staffan Johansson

                  Hi,

                   

                  why not doing it the other way around.

                  Forecast

                  left Join

                  Facilities

                  If you use few fields from Facilities, use applymap instead.

                    • Re: Join takes its time
                      Saffe Saff

                      Hi Staffan

                       

                      Thank you for you answer.

                       

                      I have tried the other way. And also removing several fields. It still takes a lot of time. I am removing all the fields that I use Apply mapp on.

                      If concatenate it goes really fast. However I want for each facility its forecasted values (each month).

                    • Re: Join takes its time
                      John Witherspoon

                      Is Facility_ID a unique key to Facility.qvd? You have 13000 unique facilities? If it were not a unique key, it could easily explain what you're seeing. Say you had 130 unique facilities, 100 rows for each, some data fields that are different, even if they aren't different for the data you're using. For instance, the facilities table has a row per facility per month, and you have 100 months of history in the table. Then you'd see 78,000,000 rows being generated, which might take ages to load, and you definitely wouldn't have what you wanted when it finished.

                       

                      I also agree with Staffan with joining in the other direction. I'd agree with applymap too, but if the data is as described, and that's a unique key, this should go so fast you probably wouldn't notice any difference.

                       

                      Also, I'm doubting the "they only share one key" bit when you have Vers being loaded from both QVDs.

                        • Re: Join takes its time
                          Saffe Saff

                          Hi John,

                           

                          No its not unique for the table. This is the tricky part, there is a unique set for each version of the data. So whenever the business adjusts the forecast values or the values for the facility we get a new version. However version and facility_id is unique in facility table and corresponds to 12 months of data in the forecast table, so 12 rows per facility per version.

                           

                          Yes version is the same but I have tried without version. I will try again. How can I apply mapp here?

                          • Re: Join takes its time
                            Saffe Saff

                            Hi again,

                             

                            So I created a composite key of Facility_ID and version and loaded distinct values from the facility and works just fine now. However I would like your opinions about how to approach the rest, Apply map and only keep IDs?

                              • Re: Join takes its time
                                Peter Cammaert

                                Well, is there a specific reason why you want to JOIN those tables into one in the Load script? It doesn't seem to be a large number of rows and QlikView should be able to handle the Join in memory with ease. After all, the Facilities table is just a Dimension table, just like you would treat a Customers or Products table...

                                 

                                And you can still compare forecast versions from the actual facts table...

                                  • Re: Join takes its time
                                    Saffe Saff

                                    Hi Peter,

                                     

                                    The reason is that the facility table only applies to the forecast. The history table has its own facility information. I am not sure about how to approach this best. My first attempt was to normalize the forecast and then concatenate it with the historical values (1 fact table) and then connect them to the dimensions.

                                     

                                    Thanks for your time.