Not applicable

Joining tables with rules vs. relationships (ETL)

Normally, we join tables based on a field that exists in both tables, à la:

SELECT T1.Field1, T2.Field2 FROM T1 JOIN T2 ON T1.ID = T2.ID ...

What if no primary key / foreign key relationship exists and we don't have that "ID" field in both tables? Can QV relate rows based on some sort of rule containing multiple conditions? For example, I want to define a "match" between two rows in different data sources based on the following condition:

(DataSource2.Color = "Red" AND DataSource1.ColorCode = "1")

AND

(DataSource1.OrderDate between <an expression which returns today's date> and <an expression which returns two days ago>)

AND

(DataSource2.ShipDate = <some expression which returns today's date>)

AND

(DataSource1.Cost >= (DataSource2.Total / 2 ))

(The example above is completely contrived, btw - I'm just trying to demonstrate a semi-complex ruleset that must evaluate to "true")
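To make the shape of such a rule concrete, here is a minimal Python sketch of the contrived ruleset above, with invented rows: a rule-based "join" amounts to testing every candidate pair of rows, i.e. a filtered cross product.

```python
from datetime import date, timedelta

# Hypothetical rows from the two sources; field names follow the example above.
ds1_rows = [
    {"ColorCode": "1", "OrderDate": date.today() - timedelta(days=1), "Cost": 10},
    {"ColorCode": "2", "OrderDate": date.today() - timedelta(days=5), "Cost": 3},
]
ds2_rows = [
    {"Color": "Red", "ShipDate": date.today(), "Total": 20},
]

def rule_match(r1, r2):
    """Return True when the full ruleset evaluates to true for this row pair."""
    today = date.today()
    two_days_ago = today - timedelta(days=2)
    return (
        r2["Color"] == "Red" and r1["ColorCode"] == "1"
        and two_days_ago <= r1["OrderDate"] <= today
        and r2["ShipDate"] == today
        and r1["Cost"] >= r2["Total"] / 2
    )

# A rule-based "join" is a filtered cross product: every pair is tested.
matches = [(r1, r2) for r1 in ds1_rows for r2 in ds2_rows if rule_match(r1, r2)]
```

Only the first DS1 row pairs with the DS2 row here; the point is that nothing in the rule reduces to a single shared field, so every pair must be evaluated.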

Questions:

1. Can QV actually associate records in this fashion?

2. Even if it can, is it best practice to do so, or should we use an industrial-strength ETL tool to do this and just point QV at the resulting output? (We can if necessary.)

Thanks!

1 Solution


5 Replies
hallquist_nate
Partner - Creator III

Well, I think you can, if I understand what you are trying to do.  QV data works best in a three-tier model, with a source layer of QVDs, a transformed layer, and then the application.  The way I have conquered this hill is to create concatenated keys.  At the second layer of QVDs, I build keys like Color|Date|Color.Code and name that column the same as the key built in the other data table.  If you do this at layer 2, you can use the application to bring in the data with the keys already made, and they should just associate.
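Outside of QV script, the concatenated-key idea can be sketched in a few lines of Python (the field names and data are invented for illustration): both tables get a key column built from the fields the rule compares on equality, and once the keys match, an ordinary equality join works.

```python
# Two hypothetical tables that share no single ID field.
table1 = [{"Color": "Red", "Date": "2013-05-01", "ColorCode": "1"}]
table2 = [{"Color": "Red", "Date": "2013-05-01", "ColorCode": "1", "Qty": 5}]

def make_key(row):
    # Same separator and same field order on both sides, or the keys won't match.
    return "|".join([row["Color"], row["Date"], row["ColorCode"]])

# Build the key set from one table, then associate rows from the other on it.
keys1 = {make_key(r) for r in table1}
joined = [r for r in table2 if make_key(r) in keys1]
```

The separator character matters: pick one (like `|`) that cannot occur inside the field values, or two different field combinations can collide into the same key.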

That is how I would do it, but I am sure there are other ways.  As for the use of an ETL tool, that is up to you.  When I run across this issue in my consulting, I use QV to do this, and that is what I advise.  However, is this a case where the data could/would be used by other platforms, maybe Excel, SQL, or something?  If that is the case, it may make better business sense to use an ETL tool, do the transformations, store it in the DB, and use QV to pull the data after the ETL, right from the database.

Good luck and let me know if you have any further questions...

Not applicable
Author

Thanks!

"Building" the keys had occurred to me, too: transforming strings (Color="Red") to codes (Color=1) so I can do a "real" (associative) join later seems doable. It's the more one-off stuff like (DataSource1.Cost >= (DataSource2.Total / 2)) that seems questionable. Example:

DataSource1:

Row1: Cost=10

DataSource2:

Row1: Total=20

Row2: Total=30

Row1 from the second data source should "join" to Row 1 of the first data source because 10 >= (20/2)

However, Row2 doesn't meet the criterion: 10 is not >= 30/2

See how association sort of falls apart here? One needs to evaluate the condition's truth row by row, in an almost cursor-like fashion, versus just doing a batch join between two fields.
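The row-by-row point can be sketched directly from the numbers above (rows and fields as given in the example): the inequality cannot be turned into an equality key, so each candidate pair has to be tested individually.

```python
# DS1 Row1 and the two DS2 rows from the example above.
ds1 = [{"row": 1, "Cost": 10}]
ds2 = [{"row": 1, "Total": 20}, {"row": 2, "Total": 30}]

# No shared key exists, so every pair must be evaluated against the rule.
matches = [
    (a["row"], b["row"])
    for a in ds1
    for b in ds2
    if a["Cost"] >= b["Total"] / 2   # 10 >= 20/2 holds; 10 >= 30/2 does not
]
```

Only the (Row1, Row1) pair survives, exactly as argued above; a hash-based equality join has no key to hash on here.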

hallquist_nate
Partner - Creator III

What data are you working with: "Big Data", so to speak, or unstructured data?  Without really getting a good understanding of why you would want to do this, I can't come up with a great solution.  Building all of those flags in the script could be a fool's errand, I think.  I also know that QV is not the perfect tool for every situation.  Here, maybe a creative combo of QV scripting and a powerful ETL tool would be best.  Depending on the requirement, maybe you could accomplish this within a chart, too.  I can't really say without knowing more.  Regardless, it does look interesting and challenging.  What are the two sources of data?  If they are from the same DB, why aren't they related, and how does the "system" handle that data?  I had a situation where I had to mine a Transaction table twice, once for Header data and once for Line data.  There was a key that I could use, but I didn't realize it.  The script for child/Line records went something like...

If the Primary|Key and Covert|Key were not equal, then use Covert|Key as Child|Key, and write the data to a QVD, naming the Covert key as Child|Key.  Kind of a workaround, but it worked great.  It allowed me to create the classic Header and Line tables and get the data into a traditional data model.
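As a rough illustration of that split (the rows and key values here are invented, not from the actual system): mine the one transaction table twice, keeping rows whose keys agree as Header records and routing the rest into a Line/child table keyed on the covert key.

```python
# Hypothetical transaction rows; Primary_Key == Covert_Key marks a header row.
transactions = [
    {"Primary_Key": "H1", "Covert_Key": "H1", "Data": "header"},
    {"Primary_Key": "H1", "Covert_Key": "L1", "Data": "line 1"},
    {"Primary_Key": "H1", "Covert_Key": "L2", "Data": "line 2"},
]

# Header table: rows where the two keys agree.
headers = [t for t in transactions if t["Primary_Key"] == t["Covert_Key"]]

# Line table: rows where they differ, renaming Covert_Key to Child_Key
# (the equivalent of writing the child rows out to their own QVD).
lines = [
    {**t, "Child_Key": t["Covert_Key"]}
    for t in transactions
    if t["Primary_Key"] != t["Covert_Key"]
]
```

The result is the classic Header and Line pair, linked through the shared Primary_Key, which is what makes the traditional data model possible.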

Good luck...

Not applicable
Author

We're going to have 3 (and sometimes more) data sources, depending on how the customer stores their data. Generally each will be 100M+ rows, never more than 600-700M rows in any single data source.

  • CSV
  • SQL Server or MySQL
  • XML
hallquist_nate
Partner - Creator III

The only other thing I can think of would be to somehow aggregate the data in your DataSource2.  So net out the metrics where you need to, and then find a common key.  That would allow you to do some of the analysis you are looking for.  However, I think we have decided that the ultimate answer to your question is a creative combination of an ETL tool and QV.  Do you have an ETL tool already deployed?  If not, choose wisely.  There are quite a few out there, and the functionality is vast.

Good luck.