Solved: Re: Best practice: Joining without key in spesific... - Qlik Community

Anonymous · ‎2016-10-28

Hello all, I will cut to the chase.

The goal is, given the tables:
CUSTOMERS
CustomerID
ProjectID

PROJECTS
ProjectID

ProjectNames

I wish to get a best practice (efficient) to add

SOME_PROJECTS

CustomerIDs

ProjectNames

to the PROJECTS table.

I am depending on an efficient way of associating my customers with the ProjectNames, and they should only exist in my PROJECTS table.

thank you for your contributions

best, Gard

johnw · ‎2016-10-31

Yes, I misunderstood what you were after. Hopefully I understand it now. Attached are a couple approaches. One assumes that all the data already exists and that we can't modify any of those loads. The other is less generic and requires other script changes, but might work for your case, and if so, I suspect it might be more efficient than the first.

View solution in original post

johnw · ‎2016-10-28

It seems to me we do have a key, and that we don't want to join. Most efficient may depend on where we're loading this data from. But a common approach might be to not load PROJECTS as a normal table, but instead as a mapping table. And to not load the CUSTOMERS table at all, just go straight to SOME_PROJECTS.

PROJECTS:
MAPPING LOAD
ProjectID
,ProjectName
FROM wherever
;

MAP ProjectName USING PROJECTS;

SOME_PROJECTS:
LOAD
CustomerID
,ProjectID as ProjectName
FROM somewhere
;

However, if your sources are QVDs, that can change things, because mapping will not allow us to do an optimized load. With QVDs, and assuming other requirements didn't make the load unoptimized, I'd probably do it more like this.

SOME_PROJECTS:
LOAD
CustomerID
,ProjectID
FROM some.qvd (QVD)
;
LEFT JOIN (SOME_PROJECTS) // or maybe INNER
LOAD
ProjectID
,ProjectName
FROM project.qvd (QVD)
WHERE exists(ProjectID)
;
DROP FIELD ProjectID
;

Anonymous · ‎2016-10-31

Hello John Witherspoon, thanks for your reply, your second option is more relevant to my problem as the tables PROJECTS and CUSTOMERS are very large and loaded earlier in the script. But I believe your interpretation of the question is not matching my intention, correct me if I'm wrong.

The table SOME_PROJECTS does not have the ProjectID, but rather has the CustomerID field, the ProjectID field exists in the CUSTOMERS and PROJECTS tables

Example initially I have

PROJECTS:

projectID_1, projectName_1
projectID_2, projectName_2

CUSTOMERS:

customerID_1, projectID_1,

customerID_2, projectID_2,

customerID_3, projectID_3

customerID_4, projectID_4,

SOME_PROJECTS:
customerID_3, projectName_3
customerID_4, projectName_4

----------------------- I want to end up with is

PROJECTS:

projectID_1, projectName_1
projectID_2, projectName_2

projectID_3. projectName_3

projectID_4, projectName_4

and CUSTOMERS unchanged.
note that all tables contain many additional fields and can't be mapping tables, in addition to being loaded from QVD files.

The problem is thus how to replace the customerIDs with the projectIDs when adding SOME_PROJECTS to the PROJECTS table in an efficient way.

johnw · ‎2016-10-31

Yes, I misunderstood what you were after. Hopefully I understand it now. Attached are a couple approaches. One assumes that all the data already exists and that we can't modify any of those loads. The other is less generic and requires other script changes, but might work for your case, and if so, I suspect it might be more efficient than the first.

Anonymous · ‎2018-04-05

Hi John, I realise this is way overdue but thanks for your reply 😃

I definitely favor the second approach as i generally find maps to be tidy and efficient.
The "where not exist" clause is also a very nice touch.

best /G

Best practice: Joining without key in spesific table