4 Replies Latest reply: Feb 20, 2014 12:45 PM by Andrew Currens RSS

    Load longest string for ID

      I'm trying to load a unique id and string into a table. The catch is there are multiple strings per unique id and I only want the longest string returned.

       

      Data looks like this:

      IDString
      1

      abc

      1abcdef
      1abcdefghi

      2

      xyz

      2zxypdq
      3lmn
      3lmnopq

       

      What i'd like to end up with is a table containing a list of distinct ID's and the longest string for that particular ID.

      i.e.

      IDString
      1abcdefghi
      2xyzpdq
      3lmnopq

       

      I've tried maxstring/minstring but i'm not able to get this to sort on string length or those may work.

       

      I started looking at for loop options but that is much to slow for the dataset i'm working with.

       

      Any ideas?

        • Re: Load longest string for ID
          Aurélien Martinez

          Hi,

           

          Try this script:

           

          DATA:

          LOAD * INLINE [

          ID, String

          1, abc

          1, abcdef

          1, abcdefghi

          2,  xyz

          2, zxypdq

          3, lmn

          3, lmnopq

          ];

           

           

          NoConcatenate

          DATA2:

          LOAD

            ID,

            MaxString(String) as test

          Resident DATA

          Group By ID

          ;

           

          drop table DATA;

           

          Best,

          Aurélien

            • Re: Re: Load longest string for ID

              Thanks for this. The example works well. My actual data wont always be sorted and looks more like this:

               

              DATA:

              LOAD * INLINE

              [ ID, String

              1, "1234, 5678 |"

              3, "6789, 1234 |2345, 3456 |"

              1, "1234, 8976 |1234, 4567 |"

              2, "3456, 6789 |9012, 3456 |"

              1, "1234, 5678 |8901, 2345 |3456, 4567 |"

              2, "2345, 1234 |"

              3, "6789, 1234 |2345, 3456 |2346, 9876 |"

              3, "6789, 1234 |"

              ];

               

              NoConcatenate

              DATA2:

              LOAD   ID, 

                        MaxString(String) as test

                        Resident DATA

              Group By ID ;

               

              Using the above code it seems to incorrectly choose the 2nd longest string for ID 1:

              pic.PNG.png