Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 
Not applicable

Outer Join Problems

Hello,

This is a follow-up question from: http://community.qlik.com/forums/t/33999.aspx

I realized after all that was said and done, I realized I needed another column in there. A here's a brief look at what it looked like originally:

Taxonomy Category DoS to DoOS DoOS to Do1NA Do1NA to Do1O ....etc

Cardiology 2 1 3 ....etc

Ophthalmology 1 3 6 ...etc

Orthopedics 6 8 9 ....etc

What it should've been when it was transformed:

Variables Days Taxonomy Category

DoS to DoOS 2 Cardiology

DoS to DoOS 1 Ophthalmology

DoS to DoOS 6 Orthopedics

DoOS to Do1NA 1 Cardiology

DoOS to Do1NA 3 Ophthalmology

DoOS to Do1NA 8 Orthopedics

Do1NA to Do1O 3 Cardiology

Do1NA to Do1O 6 Ophthalmology

Do1NA to Do1O 9 Orthopedics

Here's the code i've been tinkering with in an attempt to just add the column onto the first with it's ID key:

DaysTable:
CROSSTABLE (TimeFrame, Days)
LOAD recno() as ID,
[DoS to DoOS],
[DoOS to Do1NA],
[Do1NA to Do1O],
[Do1O to Do1PA],
[DoS to D1RAR],
[Do1PA to D1RAR],
[DoS to Do1CI]
Resident Lag;

TaxTable:
CROSSTABLE (TaxCat,Count)
Load recno() as ID,
[Taxonomy Category]
Resident Lag;

Try:
ADD
Load * Resident Lag;
Outer Join Load * Resident TaxTable;

When I load the data, here's what the script executioner says:

Lag << Lag 308,352 lines fetched
DaysTable << Lag 2,006,597 lines fetched
Lag << Lag 308,352 lines fetched
TaxTable << Lag 308,352 lines fetched
Lag << Lag 616,704 lines fetched *Wrong
Try << TaxTable 308,352 lines fetched *Wrong
$Syn 1 = Taxonomy+Taxonomy Category

The Lag table should continue to have 308,352 entries, and the try table should have the same number of lines as the DaysTable (2,006,597).

Ideally, what I probably should have done, was have the [Taxonomy code] Loaded with the DaysTable and avoid the merge altogether.

Any idea what's going wrong and how can I fix it?

Thanks!

1 Solution

Accepted Solutions
johnw
Champion III
Champion III

The ID being generated is the recno() of the OUTPUT table in each case, not the record number of the Lag table itself. So in the DaysTable, you have 2,006,597 IDs, while in the TaxTable you only have 308,352 IDs. Therefore, when you outer join them, you're joining to only the first 308,352 records in the DaysTable, and they're the wrong records - just whatever happens to come first. (Edit: I think I got that a little wrong. The outer join would have worked fine, as it would have joined by Taxonomy Category. But then the resulting data model would associate the wrong Taxonomy categories with the data in the DaysTable because of the way the IDs don't line up. Same bad result, but slightly different process to get there.)

The solution is to include the Taxonomy Category in the original load, as you said, and avoid the merge altogether. That just takes an additional parameter in the crosstable() to tell it that you have TWO fields (instead of the default of one) that come before the data it should switch from columns to rows. Like this:

DaysTable:
CROSSTABLE (TimeFrame, Days, 2)
LOAD recno() as ID,
[Taxonomy Category],
[DoS to DoOS],
[DoOS to Do1NA],
[Do1NA to Do1O],
[Do1O to Do1PA],
[DoS to D1RAR],
[Do1PA to D1RAR],
[DoS to Do1CI]
Resident Lag;

View solution in original post

2 Replies
johnw
Champion III
Champion III

The ID being generated is the recno() of the OUTPUT table in each case, not the record number of the Lag table itself. So in the DaysTable, you have 2,006,597 IDs, while in the TaxTable you only have 308,352 IDs. Therefore, when you outer join them, you're joining to only the first 308,352 records in the DaysTable, and they're the wrong records - just whatever happens to come first. (Edit: I think I got that a little wrong. The outer join would have worked fine, as it would have joined by Taxonomy Category. But then the resulting data model would associate the wrong Taxonomy categories with the data in the DaysTable because of the way the IDs don't line up. Same bad result, but slightly different process to get there.)

The solution is to include the Taxonomy Category in the original load, as you said, and avoid the merge altogether. That just takes an additional parameter in the crosstable() to tell it that you have TWO fields (instead of the default of one) that come before the data it should switch from columns to rows. Like this:

DaysTable:
CROSSTABLE (TimeFrame, Days, 2)
LOAD recno() as ID,
[Taxonomy Category],
[DoS to DoOS],
[DoOS to Do1NA],
[Do1NA to Do1O],
[Do1O to Do1PA],
[DoS to D1RAR],
[Do1PA to D1RAR],
[DoS to Do1CI]
Resident Lag;

Not applicable
Author

Thanks for you help John.