Skip to main content
Announcements
Qlik Connect 2024! Seize endless possibilities! LEARN MORE
cancel
Showing results for 
Search instead for 
Did you mean: 
smoon63
Partner - Creator
Partner - Creator

Setting a field value in Load based on substring from separate table

I have two data sources - one is a relational DB and one is an Excel sheet. Here are the key fields from each:

CourseCompletion (DB table):
Station (string),
CourseTitle (string);

SpecOpsCourseRqmts (Excel Table):
Station (string),
CourseCategory (string),
Required (boolean);

Out of a long list of stations, 4 of them have unique requirements for which course categories they must complete. The requirements for those four stations are contained in the Excel file. What I need to accomplish sounds simple but has so far eluded me. In short I need to perform a left join where all records from the DB table are included, and those from the Excel table where the Station field matches the DB table, and (here's where it gets tricky) where the CourseCategory field in Excel is a substring of the CourseTitle field in the DB table.

If the Excel file was a table in the database I could do something like:
LOAD
Station,
CourseTitle,
Required;
SQL SELECT
CourseCompletion.Station,
Course.CourseTitle,
if(isnull(SpecOpsCourseRqmts.Required), 0, SpecOpsCourseRqmts.Required) as Required
FROM
CourseCompletion LEFT JOIN SpecOpsCourseRqmts ON
(CourseCompletion.Station = SpecOpsCourseRqmts.Station
AND CourseCompletion.CourseTitle LIKE '%' & SpecOpsCourseRqmts.CourseCategory & '%';

Unfortunately they are not in the same database and I can't for the life of me figure out how to write the left join in the QV script.

Any help will be greatly appreciated!

Scott Moon

1 Solution

Accepted Solutions
johnw
Champion III
Champion III

There might be a simpler way, but I think this would work (untested):

// load your raw data

CourseCompletion:
LOAD
Station
,CourseTitle
...
FROM ...
;

SpecOpsCourseRqmts:
LOAD
Station
,CourseCategory
,Required
...
FROM ...
;

// then left join by station - creates too many rows

LEFT JOIN (SpecOpsCourseRqmts)
LOAD
Station
,CourseTitle
RESIDENT CourseCompletion
;

// then remove all rows that don't match your substring condition

INNER JOIN (SpecOpsCourseRqmts)
LOAD *
WHERE index(CourseTitle,SpecOpsCourseRqmts)
;

// then left join onto your main table
// joins by both Sation AND CourseTitle

LEFT JOIN (CourseCompletion)
LOAD *
RESIDENT SpecOpsCourseRqmts
;

// no longer need SpecOpsCourseRqmts

DROP TABLE SpecOpsCourseRqmts;

View solution in original post

4 Replies
johnw
Champion III
Champion III

There might be a simpler way, but I think this would work (untested):

// load your raw data

CourseCompletion:
LOAD
Station
,CourseTitle
...
FROM ...
;

SpecOpsCourseRqmts:
LOAD
Station
,CourseCategory
,Required
...
FROM ...
;

// then left join by station - creates too many rows

LEFT JOIN (SpecOpsCourseRqmts)
LOAD
Station
,CourseTitle
RESIDENT CourseCompletion
;

// then remove all rows that don't match your substring condition

INNER JOIN (SpecOpsCourseRqmts)
LOAD *
WHERE index(CourseTitle,SpecOpsCourseRqmts)
;

// then left join onto your main table
// joins by both Sation AND CourseTitle

LEFT JOIN (CourseCompletion)
LOAD *
RESIDENT SpecOpsCourseRqmts
;

// no longer need SpecOpsCourseRqmts

DROP TABLE SpecOpsCourseRqmts;

smoon63
Partner - Creator
Partner - Creator
Author

Thanks for the reply, John.

Couple of questions

- Should this be set up as one continuous LOAD statement, or with a series of temp tables culminating in one master table?

- Since CourseCompletion is the main table, should the original LEFT JOIN go the other direction - i.e. left join SpecOpsCourseRqmts to CourseCompletion?

- In the INNER JOIN to remove the the non-matching rows, should there be a source named? This is just a piece in the middle of a long and complicated script - I'm afraid doing a "LOAD *" without a specific source might wreak havoc.

johnw
Champion III
Champion III

The reason I didn't do the first left join in the other direction is that the inner join as I wrote it would then have removed a bunch of rows from the main table that you still wanted. That said, it probably would have been simpler and used less memory to do what you say, and just make a slightly more complicated inner join get what you wanted. It would probably just need one more line. For some reason, I didn't think to do it that way.

With that fixed, then yes, I think it would just be a series of load statements all into the same master table, rather than using a temp table like I did. That would be better.

Yes, the inner join should have a resident source named, SpecOpsCourseRqmts. That was just a mistake. Though if you change the original left join, then the source would be CourseCompletion.

smoon63
Partner - Creator
Partner - Creator
Author

John,

Meant to reply last week and got sidetracked. Your first solution turned out to be the winner. I had done something wrong the first time around and couldn't track it down, so I scrapped it and started over. Second time through it worked perfectly in pretty much the form you sent.

Thanks for another good save!

Scott Moon

Austin Fire Department