4 Replies Latest reply: Jun 4, 2017 9:28 AM by Cheenu Janakiram RSS

    Categorizing a Field

    Michael Ellerbeck

      I was helping a guy here at work, and he had some data that was similar that he wanted to be categorized together.

       

      i.e.  Big Dog, The Dog, would all have the category of DOG

       

      I did it using an applymap and wildmatch. How would you do it?

       

      I ran across this post http://www.qlikfix.com/2010/10/18/regular-expressions-in-the-load-script/ which is interesting. I do think it would be cool to have native regex in QV!

       

      Category_Map:

      MAPPING

      LOAD * INLINE [

          x,y

          1,RED

          2,PINK

          3,BLUE

      ];

       

       

      Initial_Data:

      LOAD * INLINE [

          ID, Category

          123, Red Paint

          223, The Red Paint

          323, Pink

          423, Pink Paint

          523, Blue Paint

      ];

       

       

      QUALIFY *;

      Transformed_Data:

      Load

                ID,

                // Use the applymap to classify the profession, 1000 is the default-value

                // if the applymap does not find a match

                applymap('Category_Map',Wildmatch(Category,'*red*','*pink*','*blue*'),'None') as Category

      RESIDENT Initial_Data;

       

       

      Drop table Initial_Data;

        • Re: Categorizing a Field
          Cheenu Janakiram

          Hi Michael,

           

          Your approach seems to be one I would use. Below is a variant you might also want to consider, where the "wildmatch string" is actually generated on the fly rather than having to type it out in the applymap/wildmatch.

           

          Check below. I have used this "technique" to allow users to make data enrichment and custom categorisations, as well string searching for information.

           

          Hope this adds on a new layer you can use elsewhere - although my answer to your post is a few years "late" (i.e. I was looking to see if this 'technique' was worth blogging and came across your string).

           

          cheerio,

           

          C  ;o)

           

          //Script start here

          WildmatchStrings_Temp:

          //2. Create a numerical equivalent for each line/string of data in table:

          LOAD *, Rowno() as OrdinalPosition;

          //1. Original strings against which you are trying to match:

          LOAD * INLINE [

               Clean_Name_as_Category, WildMatchString

               Accenture, accenture

               Hitachi, hitachi

               US Department of Army, u*s*d*p*t*army

          ];

           

          //Create a field with all strings which you will put in a variable and is sequentially 'ordered' as it is in above table:

          Var_WildMatchString:

          LOAD CHR(39) & '*' & Concat(DISTINCT WildMatchString, '*' & CHR(39) & ',' & CHR(39) & '*', OrdinalPosition) & '*' & CHR(39) as Var_WildMatchString RESIDENT WildmatchStrings_Temp;

           

          //Insert value in var:

          LET vVar_WildMatchString = PEEK('Var_WildMatchString', 0, 'Var_WildMatchString');

          DROP TABLE Var_WildMatchString;

           

          //Reload table as Mapping table:

          WildMatch_Map:

          MAPPING LOAD

               OrdinalPosition,

               Clean_Name_as_Category

          RESIDENT WildmatchStrings_Temp;

           

          DROP TABLE WildmatchStrings_Temp;

           

          THE_FACT_TABLE:

          LOAD

               *,

               ApplyMap('WildMatch_Map', WildMatchOrdinalPosition) as Clean_Name_as_Category

          ;

          LOAD

               *,

               Wildmatch(AccountNames, $(vVar_WildMatchString)) as WildMatchOrdinalPosition

          ;

          LOAD * INLINE [

              AccountNames

              Accenture Pvt Ltd

              Accenture Private Limited

              Accenture Consulting

              Hitachi Systems Ltd

              Hitachi System Limited

              Hitachi France

              Us Department of Army

              United States Department of Army

              us dept army

          ];

          //end script here

          • Re: Categorizing a Field
            Vineeth Pujari

            This can be simplified using MapSubstring, but the only thing with this is that MapSubstring is case sensitive

             

            as below

             

            Category_Map:

            MAPPING

            LOAD * INLINE [

                y,x

                RED,<RED>

                PINK,<PINK>

                BLUE,<BLUE>

            ];

             

             

            Initial_Data:

            LOAD * ,textbetween(MapSubString('Category_Map', UPPER(Category)),'<','>') as Description

              INLINE [

                ID, Category

                123, RED Paint

                223, The Red Paint

                323, PINK

                423, Pink Paint

                523, Blue Paint

            ];

            • Re: Categorizing a Field
              Vineeth Pujari

              btw if you have less categories you don't need a Mapping table just Pick(Wildmatch( will do

               

              Initial_Data:

              LOAD * , Pick(Wildmatch(Category,'*red*','*pink*','*blue*')+1,'None','Red','Pink','Blue') as NEWCategory INLINE [

                  ID, Category

                  123, Red Paint

                  223, The Red Paint

                  323, Pink

                  423, Pink Paint

                  523, Blue Paint

              ];