2 Replies Latest reply: Sep 15, 2010 3:16 PM by bryankoch RSS

    Outer Join Problems

    bryankoch

      Hello,

      This is a follow-up question from: http://community.qlik.com/forums/t/33999.aspx

      I realized after all that was said and done, I realized I needed another column in there. A here's a brief look at what it looked like originally:

       

      Taxonomy Category DoS to DoOS DoOS to Do1NA Do1NA to Do1O ....etc

      Cardiology 2 1 3 ....etc

      Ophthalmology 1 3 6 ...etc

      Orthopedics 6 8 9 ....etc

       

      What it should've been when it was transformed:

      Variables Days Taxonomy Category

      DoS to DoOS 2 Cardiology

      DoS to DoOS 1 Ophthalmology

      DoS to DoOS 6 Orthopedics

      DoOS to Do1NA 1 Cardiology

      DoOS to Do1NA 3 Ophthalmology

      DoOS to Do1NA 8 Orthopedics

      Do1NA to Do1O 3 Cardiology

      Do1NA to Do1O 6 Ophthalmology

      Do1NA to Do1O 9 Orthopedics

       

      Here's the code i've been tinkering with in an attempt to just add the column onto the first with it's ID key:

      DaysTable:
      CROSSTABLE (TimeFrame, Days)
      LOAD recno() as ID,
      [DoS to DoOS],
      [DoOS to Do1NA],
      [Do1NA to Do1O],
      [Do1O to Do1PA],
      [DoS to D1RAR],
      [Do1PA to D1RAR],
      [DoS to Do1CI]
      Resident Lag;

      TaxTable:
      CROSSTABLE (TaxCat,Count)
      Load recno() as ID,
      [Taxonomy Category]
      Resident Lag;

      Try:
      ADD
      Load * Resident Lag;
      Outer Join Load * Resident TaxTable;

       

      When I load the data, here's what the script executioner says:

      Lag << Lag 308,352 lines fetched
      DaysTable << Lag 2,006,597 lines fetched
      Lag << Lag 308,352 lines fetched
      TaxTable << Lag 308,352 lines fetched
      Lag << Lag 616,704 lines fetched *Wrong
      Try << TaxTable 308,352 lines fetched *Wrong
      $Syn 1 = Taxonomy+Taxonomy Category

       

      The Lag table should continue to have 308,352 entries, and the try table should have the same number of lines as the DaysTable (2,006,597).

      Ideally, what I probably should have done, was have the [Taxonomy code] Loaded with the DaysTable and avoid the merge altogether.

       

      Any idea what's going wrong and how can I fix it?

      Thanks!

        • Outer Join Problems
          John Witherspoon

          The ID being generated is the recno() of the OUTPUT table in each case, not the record number of the Lag table itself. So in the DaysTable, you have 2,006,597 IDs, while in the TaxTable you only have 308,352 IDs. Therefore, when you outer join them, you're joining to only the first 308,352 records in the DaysTable, and they're the wrong records - just whatever happens to come first. (Edit: I think I got that a little wrong. The outer join would have worked fine, as it would have joined by Taxonomy Category. But then the resulting data model would associate the wrong Taxonomy categories with the data in the DaysTable because of the way the IDs don't line up. Same bad result, but slightly different process to get there.)

          The solution is to include the Taxonomy Category in the original load, as you said, and avoid the merge altogether. That just takes an additional parameter in the crosstable() to tell it that you have TWO fields (instead of the default of one) that come before the data it should switch from columns to rows. Like this:

          DaysTable:
          CROSSTABLE (TimeFrame, Days, 2)
          LOAD recno() as ID,
          [Taxonomy Category],
          [DoS to DoOS],
          [DoOS to Do1NA],
          [Do1NA to Do1O],
          [Do1O to Do1PA],
          [DoS to D1RAR],
          [Do1PA to D1RAR],
          [DoS to Do1CI]
          Resident Lag;