2 Replies Latest reply: Mar 9, 2017 2:26 PM by Nicky Maehler RSS

    QlikView script for Joining files and Deduplicating

    James Newcombe

      Hello, I have a query I'm hoping someone can help with. I have two tables I want to join in QlikView on a one-to-many basis, but after the join I want to pick a specific one of the many records to create a one-to-one file. Here are the details:

       

      Table A  - one record

      IDValue1

      1

      10
      212

       

      Table B - many records

      IDSeqValue2
      119
      1210
      1311
      218
      2211
      23

      13

       

      Then join them together on ID only where value1 is greater than or equal to value 2

      IDSeqValueValue2
      11109
      121010
      21128
      221211

       

      Then de-duplicate by picking the highest value2 record

      IDSeqValueValue2
      121010
      221211

       

      Hope that makes sense, does anyone know how to do it?

       

      Many thanks.

        • Re: QlikView script for Joining files and Deduplicating
          Dave Riley

          Something like this ...

           

          TableA:
          Load * inline [
          ID, Value1
          1, 10
          2, 12
          ]
          ;

          left join (TableA)

          //TableB:
          Load * inline [
          ID, Seq, Value2
          1, 1, 9
          1, 2, 10
          1, 3, 11
          2, 1, 8
          2, 2, 11
          2, 3, 13
          ]
          ;

          TableC:
          Noconcatenate Load ID, Value1 as Value, Seq, Value2 resident TableA where Value1 >= Value2;
          Drop Table TableA;

          inner join (TableC) Load ID, max(Value2) as Value2 Resident TableC Group By ID;

           

           

          flipside

          • Re: QlikView script for Joining files and Deduplicating
            Nicky Maehler

            Hi James,

             

            This should work (not sure if you need the sequence number too but if you do then the solution Dave Riley provided is great)

             

            TableA:
            Load * inline [
            ID, Value1
            1, 10
            2, 12
            ];

             

            LEFT JOIN (TableA)
            Load * inline [
            ID, Seq, Value2
            1, 1, 9
            1, 2, 10
            1, 3, 11
            2, 1, 8
            2, 2, 11
            2, 3, 13
            ];

             

            RESULT:
            LOAD ID,
                  Value1 as Value,
                  max(Value2) as Value2
            RESIDENT TableA
            where Value1 >= Value2
            group by ID, Value1;

             

            DROP TABLE TableA;

             

             

            If for the 3rd table you wanted it where the max value is equal to or higher than Value1 (opposite way around - I know because I mis-read it the first time) then mapping will work here. Just adding my example below for this too because I find mapping so useful. Just remember because you want the max number then the order by clause (in descending order) is vital. Mapping is great because it saves re-reading tables when joining them and it only ever maps the first record in (hence the reason of the order by clause).

             

            //this answer will get the max(Value2) where Value2 >= Value1
            TableA:
            Load * inline [
            ID, Value1
            1, 10
            2, 12
            ];

             

            TMP:
            Load * inline [
            ID, Seq, Value2
            1, 1, 9
            1, 2, 10
            1, 3, 11
            2, 1, 8
            2, 2, 11
            2, 3, 13
            ];

             

            MAP_HIGHEST:
            MAPPING Load ID,
            Value2
            RESIDENT TMP
            order by ID, Value2 desc;

             

            DROP TABLE TMP;

             

            RESULT:
            LOAD ID,
            Value1,
            ApplyMap('MAP_HIGHEST', ID, Value1) as Value2
            RESIDENT TableA;

             

            DROP TABLE TableA;