4 Replies Latest reply: Nov 22, 2011 11:28 AM by Joe Kirwan RSS

    LEFT JOIN is duplicating records!

    Joe Kirwan

      Here is my script

       

      LEFT JOIN (Trans)

      LOAD RouteCode,

      TargTime

      Resident TripMatrix;

       

      LEFT JOIN (Trans)

      LOAD

      TargTime,

      TIME_m,

      [TargTime]-([TIME_m]*24*60)as Delta

      Resident Trans;    

       

      Trans and Tripmatrix are previously loaded tables.

       

      In the first part, I am adding the TargTime value stored in TripMatrix to the records in Trans.

       

      The second part seems to be where my problem arises.

       

      I am trying to combine the values of TargTime (as loaded in the first part) to the value of TIME_m (previoulsy loaded).

       

      BUT - this seems to load multiples of the records in Trans.

       

      This is apparent in 2 ways - the $Rows count is a multiple of what it was previously; also, count functions now reflect the increased number of records.

       

      HOWEVER, when I do a simple table of the records in the Trans table, it only shows the original number of records.

       

      I think I have made an error in my JOIN.

       

      Can anyone help?

       

       

        • LEFT JOIN is duplicating records!

          Hi There,

           

          LEFT JOIN (Trans)

          LOAD RouteCode,

          TargTime

          Resident TripMatrix;

           

          Check your no. of rows after this step. Then try something like this instead:

           

          Test:

          load

          *,

          TargetTime-(ActualTime*24*60) as Delta;

          load

          RouteCode,

          TargTime as TargetTime,

          TIME_m as ActualTime

          resident Trans;

            • LEFT JOIN is duplicating records!
              Joe Kirwan

              Hi

               

              Thank you for your response.

               

              I tried your suggestion and the number of records is now correct as per the $Rows measure.

               

              However, there does seem to be some strange behaviour.

               

              When I now build a table box showing the records for Trans, it now shows multiple records where previoulsy there was only one! So the behaviour seems to have reversed.

               

              The problem seems to be caused by teh ActualTime fiedl ie in the Table Box it shows every possible ActualTime value for each record, rather than the ActualTime for that record.

               

              It would be of great help to me if you could explian your solution? I do not have a lot of experience with scripting. For instance, I do not understand why you are creating new fields for TargetTime and ActualTime? Or how you can load the formula for Delta using these field names when they have not already been identified in the script? Is there something I do not understand about the order of the script?

               

              Regards and thanks for your help.

               

              Joe

               

              PS - I added  DISTINCT to the LOAD in the second JOIN in my script and it seems to work (sort of). Does that give you any clues?

                • LEFT JOIN is duplicating records!

                  Hi Joe,

                   

                  I will need to look into the data to understand this further. Before you do the left join to your Trans table, I am assuming you have RouteCode in your Trans table and this has unique values? Please confirm.

                   

                  Cheers,

                  Shyam

                    • LEFT JOIN is duplicating records!
                      Joe Kirwan

                      Hi Shyam

                       

                      Yes, I have created the RouteCode field in the Trans table before attempting the joins.

                       

                      However, the RouteCode field values are not unique ie there are multiple instances of the same RouteCode for different records in the Trans table.

                       

                      What I am trying to do in the Trans table is add the appropriate TargTime for each RouteCode, which is stored in the TripMatrix table.

                       

                      In the second Left Join, I am tring to calculate the difference between the TargTime (which I assume is now present for each record in the Trans table) and the actual time (TIME_m) which is calculated during the original load of the Trans table.

                       

                      Any suggestions?

                       

                      Regards

                       

                      Joe