3 Replies Latest reply: Oct 31, 2016 1:03 PM by John Witherspoon RSS

    Best practice: Joining without key in spesific table

    Gard Hillestad

      Hello all, I will cut to the chase.


      The goal is, given the tables:
      CUSTOMERS
           CustomerID
           ProjectID

       

      PROJECTS
           ProjectID

           ProjectNames

       

      I wish to get a best practice (efficient) to add

       

      SOME_PROJECTS

           CustomerIDs

           ProjectNames

       

      to the PROJECTS table.

       

      I am depending on an efficient way of associating my customers with the ProjectNames, and they should only exist in my PROJECTS table.


      thank you for your contributions

      best, Gard

        • Re: Best practice: Joining without key in spesific table
          John Witherspoon

          It seems to me we do have a key, and that we don't want to join. Most efficient may depend on where we're loading this data from. But a common approach might be to not load PROJECTS as a normal table, but instead as a mapping table. And to not load the CUSTOMERS table at all, just go straight to SOME_PROJECTS.

           

          PROJECTS:
          MAPPING LOAD
          ProjectID
          ,ProjectName
          FROM wherever
          ;

          MAP ProjectName USING PROJECTS;

          SOME_PROJECTS:
          LOAD
          CustomerID
          ,ProjectID as ProjectName
          FROM somewhere
          ;

           

          However, if your sources are QVDs, that can change things, because mapping will not allow us to do an optimized load. With QVDs, and assuming other requirements didn't make the load unoptimized, I'd probably do it more like this.

           

          SOME_PROJECTS:
          LOAD
          CustomerID
          ,ProjectID
          FROM some.qvd (QVD)
          ;
          LEFT JOIN (SOME_PROJECTS) // or maybe INNER
          LOAD
          ProjectID
          ,ProjectName
          FROM project.qvd (QVD)
          WHERE exists(ProjectID)
          ;
          DROP FIELD ProjectID
          ;

            • Re: Best practice: Joining without key in spesific table
              Gard Hillestad

              Hello John Witherspoon, thanks for your reply, your second option is more relevant to my problem as the tables PROJECTS and CUSTOMERS are very large and loaded earlier in the script. But I believe your interpretation of the question is not matching my intention, correct me if I'm wrong.

               

              The table SOME_PROJECTS does not have the ProjectID, but rather has the CustomerID field, the ProjectID field exists in the CUSTOMERS and PROJECTS tables


              Example initially I have

               

              PROJECTS:

              projectID_1, projectName_1
              projectID_2, projectName_2

               

              CUSTOMERS:

              customerID_1, projectID_1,

              customerID_2, projectID_2,

              customerID_3, projectID_3

              customerID_4, projectID_4,

               

              SOME_PROJECTS:
              customerID_3, projectName_3
              customerID_4, projectName_4

               

              ----------------------- I want to end up with is

               

              PROJECTS:

              projectID_1, projectName_1
              projectID_2, projectName_2

              projectID_3. projectName_3

              projectID_4, projectName_4

               

              and CUSTOMERS unchanged.
              note that all tables contain many additional fields and can't be mapping tables, in addition to being loaded from QVD files.

               

              The problem is thus how to replace the customerIDs with the projectIDs when adding SOME_PROJECTS to the PROJECTS table in an efficient way.