Skip to main content
Announcements
Live today at 11 AM ET. Get your questions about Qlik Connect answered, or just listen in. SIGN UP NOW
cancel
Showing results for 
Search instead for 
Did you mean: 
Anonymous
Not applicable

Best practice: Joining without key in spesific table

Hello all, I will cut to the chase.


The goal is, given the tables:
CUSTOMERS
     CustomerID
     ProjectID

PROJECTS
     ProjectID

     ProjectNames

I wish to get a best practice (efficient) to add

SOME_PROJECTS

     CustomerIDs

     ProjectNames

to the PROJECTS table.

I am depending on an efficient way of associating my customers with the ProjectNames, and they should only exist in my PROJECTS table.


thank you for your contributions

best, Gard

1 Solution

Accepted Solutions
johnw
Champion III
Champion III

Yes, I misunderstood what you were after. Hopefully I understand it now. Attached are a couple approaches. One assumes that all the data already exists and that we can't modify any of those loads. The other is less generic and requires other script changes, but might work for your case, and if so, I suspect it might be more efficient than the first.

View solution in original post

4 Replies
johnw
Champion III
Champion III

It seems to me we do have a key, and that we don't want to join. Most efficient may depend on where we're loading this data from. But a common approach might be to not load PROJECTS as a normal table, but instead as a mapping table. And to not load the CUSTOMERS table at all, just go straight to SOME_PROJECTS.

PROJECTS:
MAPPING LOAD
ProjectID
,ProjectName
FROM wherever
;

MAP ProjectName USING PROJECTS;

SOME_PROJECTS:
LOAD
CustomerID
,ProjectID as ProjectName
FROM somewhere
;

However, if your sources are QVDs, that can change things, because mapping will not allow us to do an optimized load. With QVDs, and assuming other requirements didn't make the load unoptimized, I'd probably do it more like this.

SOME_PROJECTS:
LOAD
CustomerID
,ProjectID
FROM some.qvd (QVD)
;
LEFT JOIN (SOME_PROJECTS) // or maybe INNER
LOAD
ProjectID
,ProjectName
FROM project.qvd (QVD)
WHERE exists(ProjectID)
;
DROP FIELD ProjectID
;

Anonymous
Not applicable
Author

Hello John Witherspoon, thanks for your reply, your second option is more relevant to my problem as the tables PROJECTS and CUSTOMERS are very large and loaded earlier in the script. But I believe your interpretation of the question is not matching my intention, correct me if I'm wrong.

The table SOME_PROJECTS does not have the ProjectID, but rather has the CustomerID field, the ProjectID field exists in the CUSTOMERS and PROJECTS tables


Example initially I have

PROJECTS:

projectID_1, projectName_1
projectID_2, projectName_2

CUSTOMERS:

customerID_1, projectID_1,

customerID_2, projectID_2,

customerID_3, projectID_3

customerID_4, projectID_4,

SOME_PROJECTS:
customerID_3, projectName_3
customerID_4, projectName_4

----------------------- I want to end up with is

PROJECTS:

projectID_1, projectName_1
projectID_2, projectName_2

projectID_3. projectName_3

projectID_4, projectName_4

and CUSTOMERS unchanged.
note that all tables contain many additional fields and can't be mapping tables, in addition to being loaded from QVD files.

The problem is thus how to replace the customerIDs with the projectIDs when adding SOME_PROJECTS to the PROJECTS table in an efficient way.

johnw
Champion III
Champion III

Yes, I misunderstood what you were after. Hopefully I understand it now. Attached are a couple approaches. One assumes that all the data already exists and that we can't modify any of those loads. The other is less generic and requires other script changes, but might work for your case, and if so, I suspect it might be more efficient than the first.

Anonymous
Not applicable
Author

Hi John, I realise this is way overdue but thanks for your reply 😃

I definitely favor the second approach as i generally find maps to be tidy and efficient.
The "where not exist" clause is also a very nice touch.

best /G