4 Replies Latest reply: Oct 15, 2013 6:38 AM by Kenneth Madsen RSS

    Join too big?

      I'm getting some results from a join that I don't want...

       

      I have a large (180K row) dataset that I need to load to a SQL table. Before I load it, I need to make sure that it hasn't already been loaded. We'll call the source file "input" and my Sql table "output". My approach of using a join to do this may be the wrong one.

       

      To determine if the data has already been loaded, I need to compare four fields that are in both.

      If - input.field1 = output.field1 AND input.field2= output.field2 AND input.field3=output.field3 AND input.field4=output.field4 then the record has already been loaded, so I need to ignore it.

       

      I built a join and added the four fields as keys. Where there is NO join result, I get the expected results on the output1 port. On the output0 port, however, instead of getting a maximum of 180K rows, I'm getting a HUGE number. I've let it run as high 35 million. Since I am trashing these, I really don't need them.

       

      So first,is there a different flavor of join, create a key out pf multiple fields, the will only emit one matched record? Or do I need to take a different approach?

        • Re: Join too big?
          Henric Cronström

          I would do this in QlikView instead using a Where not Exists() instead.

           

          I you create a concatenated key, and load first table1, you can in a second step load table2 using a Where not exists to get only the new records

           

          A:

          Load *,

               field1 & '|' & field2 & '|' & field3 & '|' & field4 as TestKey,

               ...

           

          B:

          Load * Where not Exists(TestKey):

          Load *,

               field1 & '|' & field2 & '|' & field3 & '|' & field4 as TestKey,

               ...

           

          HIC

          • Re: Join too big?
            Hugo Sheng

            You're probably doing an inner join which causes a many to many match.  You really want to use join type 0 or a left outer join to produce the correct results.

             

            A better approach may be to use the reference table as a lookup table and then only load the records that failed the lookup.  Here's a couple of useful tutorials on the lookup operator:

             

            http://community.qlik.com/docs/DOC-3226

            http://community.qlik.com/docs/DOC-3260

              • Re: Join too big?

                Actually, I was using a join type 0 in this case.

                 

                I'm trying it using lookups (thanks for the references, but I'm already VERY familiar with the lookup operator). I was trying to avoid the lookup in this case because of the number of records in the table.

                  • Re: Join too big?
                    Kenneth Madsen

                    load new data only.PNG.png

                     

                    We do something all the time to prevent loading data that has been previously loaded, but it uses the inner join and relies on the lower left port to load the data missing from the target table.  The read table and write table are pointing to the same database table.  All matching records for the join are discarded and anything from input 0 that is not in input 1 is emitted to the lower data stream.