5 Replies Latest reply: Sep 28, 2012 12:35 PM by Nate Hallquist RSS

    Joining tables with rules vs. relationships (ETL)

      Normally, we're going to join tables based on a field that exists in both tables, ala:


      Select T1.Field1, T2.Field2 ON T1.ID = T2.ID....


      What if no Primary Key / Foreign Key relationship exists and we don't have that "ID" field in both tables? Can QV relate rows based on some sort of rule which contains multiple conditions? Example, I want to define a "match" between two rows in different data sources based on the following condition:



      (DataSource2.Color = "Red" AND DataSource1.ColorCode = "1")


      (DataSource1.OrderDate between <an expression which returns todays date> and <an expression which returns two days ago>)


      (DataSource2.ShipDate = <some expression which returns today's date>)


      (DataSource1.Cost >= (DataSource2.Total / 2 ))


      (The example above is completely contrived, btw - I'm just trying to demonstrate a semi-complex ruleset that must evaluate to "true")




      1. Can QV actually associate records in this fashion


      2. Even if it can, is it best practice to do so, or should we use an industrial strength ETL tool to do this and just point QV a the resulting output (we can if necessary)



        • Re: Joining tables with rules vs. relationships (ETL)
          Nate Hallquist

          Well, I think you can, if I understand what you are trying to do.  QV Data works best in a 3 tier model, with a source layer of QVD's, a Transformed layer, and then the application.  The way that I have conquered this hill is to create concatenated keys.  At the 2nd layer of QVD's, I build Keys like Color|Date|Color.Code and name that column the same as the key you build in the other data table.  If you do this at Layer 2, you can use the Application to bring in the data, with the keys already made and they should just associate. 


          That is how I would do it, but I am sure there are other ways.  AS for the use of an ETL tool, that is up to you.  When I run accross this issue in my consulting, I use QV to do this, and that is what I advise.  However, is this a case where the data could/would be used by other platforms, maybe excel, SQL, or something?  If that is the case, it may make better business sense to use an ETL tool, do the transformations, store it in the DB, and use QV to pull the data after the ETL, right from the database. 


          Good luck and let me know if you have any further questions...

            • Re: Joining tables with rules vs. relationships (ETL)



              "Building" the keys had occured to me, too: Transforming strings (color="Red") to codes (Color=1) so I can do a "real" (associative) join later seems do-able - its the more one-off stuff like (DataSource1.Cost >= (DataSource2.Total / 2 )) that seems questionable. Example:





              Row1: Cost=10




              Row1: Total=20

              Row2: Total=30


              Row1 from the second data source should "join" to Row 1 of the first data source because 10 >= (20/2)

              However, Row2 doesn't fit this criteria: 10 is not >= to 30/2


              See how association sort of falls apart here? One needs to consider truthiness in an almost cursor-like fashion vs. just doing a batch join between two fields.

                • Re: Joining tables with rules vs. relationships (ETL)
                  Nate Hallquist

                  What data are you working with, "Big Data", so to speak, unstructured data?  Without really getting a good understanding as to why you would want to do this, I can't really come up with a great solution.  Building all of those flags in the script could be a Fool's Errand, I think.  I also know that QV is not the perfect tool for every situation.  Here, maybe a creative combo of QV scripting and a powerfull ETL tool would be best.  Depending on the requirement, maybe you could accomplish this within a chart too.  I can't really say without knowing more.  Regardless, it does look interesting and challenging.  What are the two sources of data?  If they are from the same DB, why aren't they related and how does the "system" handle that data?  I had a situation, where I had to mine a Transaction table twice, once for Header data and once for Line data.  There was a key that I could use, but didn't realize it.  The script for child/Line records went something like...


                  If the Primary|Key and Covert|Key were not equal, then use Covert|Key as Child|Key, and write the data to a QVD naming the Covert key as Child|Key.  Kind of a work around, but it worked great.  It allowed me to create the classic Header and Line tables and get the data into a traditional data model.


                  Good luck...