7 Replies Latest reply: May 20, 2016 5:40 PM by Marco Wedel RSS

    Finding text patterns in Data Columns

    Roberto guerra

      Dear community! I'm trying to figure out how to search within my data... and I can't seem to automate this. This is the situation:

       

      I have three variables I need to search for within text fields (Tweets):

      ### (three numbers)

      L## (1 letter, 2 numbers)

      LLLL## (4 letters, 2 numbers)

       

      So whenever someone tweets, and it has any of these two variables, i'd like it to show me the original tweet.

       

      Is there any filter that lets me quickly search for these variables in that format? (Bus number and Licence plate variables).

       

      Thank you very much !!!!!!!

        • Re: Finding text patterns in Data Columns
          Sunny Talwar

          May be using RegExp: Regular expressions in the load script - The Qlik Fix! The Qlik Fix!

           

          It isn't native to QlikView, but the macro might work for you

            • Re: Finding text patterns in Data Columns
              John Witherspoon

              I like your patterns approach better than my brute force. I ended up with this, which also has the start of ignoring punctuation, at least in terms of figuring out the format of the word, if not the word itself in this case.

               

              Format:
              MAPPING LOAD
              mid('ABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890.,?',recno(),1) as From
              ,
              if(recno()<=26,'L',if(recno()<=36,'#',' ')) as To
              AUTOGENERATE 39
              ;
              Words:
              LOAD *
              ,
              if(match(Format,'###','L##','LLLL##'),'Y','N') as Flagged?
              ;
              LOAD *
              ,
              mapsubstring('Format',upper(Word)) as Format
              ;
              LOAD
              ID
              ,
              subfield(Tweet,' ') as Word
              RESIDENT Raw
              ;

              • Re: Finding text patterns in Data Columns
                Marco Wedel

                maybe like this:

                QlikCommunity_Thread_217611_Pic1.JPG

                QlikCommunity_Thread_217611_Pic2.JPG

                 

                table1: 
                LOAD RecNo() as RecID, 
                    *
                INLINE [
                    TextField
                    some random text including number 123
                    some text without numbers
                    more interesting text45 including numbers
                    another text without any number
                    yet another text including 1234 numbers
                    some text
                    some text 1
                    some text 12
                    some text 123
                    some text 1234
                    some text a
                    some text a1
                    some text a12
                    some text a123
                    some text a1234
                    some text ab
                    some text ab1
                    some text ab12
                    some text ab123
                    some text abc1234
                    some text abc
                    some text abc1
                    some text abc12
                    some text abc123
                    some text abc1234
                    some text abcd
                    some text abcd1
                    some text abcd12
                    some text abcd123
                    some text abcd1234
                    no numbers here as well
                    0123456789
                    abcdefghijklmnopqrstuvwxyz
                ];
                
                table2:
                LOAD *,
                    Text(Replace(Replace(Replace(Replace(Replace(Replace(Replace(Replace(Replace( 
                    Replace(Replace(Replace(Replace(Replace(Replace(Replace(Replace(Replace(Replace( 
                    Replace(Replace(Replace(Replace(Replace(Replace(Replace(Replace(Replace(Replace( 
                    Replace(Replace(Replace(Replace(Replace(Replace(Upper(SubTextField), 
                    '0','#'),'1','#'),'2','#'),'3','#'),'4','#'),'5','#'),'6','#'),'7','#'),'8','#'),'9','#') 
                    ,'B','L'),'C','L'),'D','L'),'E','L'),'F','L'),'G','L'),'H','L'),'I','L'),'J','L'),'K','L') 
                    ,'A','L'),'M','L'),'N','L'),'O','L'),'P','L'),'Q','L'),'R','L'),'S','L'),'T','L'),'U','L') 
                    ,'V','L'),'W','L'),'X','L'),'Y','L'),'Z','L')) as FieldFormat;
                LOAD RecID,
                    SubField(TextField,' ') as SubTextField
                Resident table1;
                

                 

                 

                hope this helps

                 

                regards

                 

                Marco

              • Re: Finding text patterns in Data Columns
                John Witherspoon

                If you can flag it in script, here's a pure brute force approach. You'll probably need to be more sophisticated than this in practice, like I'm only taking spaces as word breaks, where you'd want to break on punctuation and the like. I have the feeling I'm missing some simpler way to do this (without using regular expressions).

                 

                 

                SET vIsAlphanumeric = index('ABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890',mid($1,$2,1));
                SET vIsAlpha = index('ABCDEFGHIJKLMNOPQRSTUVWXYZ',mid($1,$2,1));
                SET vIsNumeric = index('1234567890',mid($1,$2,1));

                Words:
                LOAD *
                ,
                if(len(Word)=3 and $(vIsAlphanumeric(Word,1))
                and $(vIsNumeric(Word,2))
                and $(vIsNumeric(Word,3)),'Y'
                ,
                if(len(Word)=6 and $(vIsAlpha(Word,1))
                and $(vIsAlpha(Word,2))
                and $(vIsAlpha(Word,3))
                and $(vIsAlpha(Word,4))
                and $(vIsNumeric(Word,5))
                and $(vIsNumeric(Word,6)),'Y','N')) as Flagged?
                ;
                LOAD
                ID
                ,
                subfield(Tweet,' ') as Word
                RESIDENT Raw
                ;

                • Re: Finding text patterns in Data Columns
                  Marco Wedel

                  Hi,

                   

                  borrowing John's MapSubString() idea, another solution could be:

                   

                  QlikCommunity_Thread_217611_Pic3.JPG

                  QlikCommunity_Thread_217611_Pic4.JPG

                   

                  QlikCommunity_Thread_217611_Pic5.JPG

                   

                   

                  SET vIsLet    = (Upper($1)<>Lower($1));
                  SET vIsNumLet = (Upper($1)<>Lower($1) or IsNum($1));
                  
                  mapChar2Patt:
                  Mapping LOAD char, If(IsNum(char),'#','L')
                  Where $(vIsNumLet(char));
                  LOAD Chr(RecNo()) as char
                  AutoGenerate 65535;
                  
                  mapChar2Space:
                  Mapping LOAD char, ' '
                  Where not $(vIsNumLet(char));
                  LOAD Chr(RecNo()) as char
                  AutoGenerate 65535;
                  
                  table1:
                  LOAD RecNo() as RecID,
                      *
                  INLINE [
                      TextField
                      some random text including number 123
                      some text without numbers
                      more interesting text45 including numbers
                      another text without any number
                      yet another text including 1234 numbers
                      some text
                      some text 1
                      some text 12
                      some text 123
                      some text 1234
                      some text a
                      some text a1
                      some text a12
                      some text a123
                      some text a1234
                      some text ab
                      some text ab1
                      some text ab12
                      some text ab123
                      some text abc1234
                      some text abc
                      some text abc1
                      some text abc12
                      some text abc123
                      some text abc1234
                      some text abcd
                      some text abcd1
                      some text abcd12
                      some text abcd123
                      some text abcd1234
                      some text. incl. punctuation
                      some text1. incl. punctuation
                      some text12. incl. punctuation
                      some text123. incl. punctuation
                      some text123.A12 incl. punctuation
                      "some"text123".A12,incl.punctuation"
                      some text123. A12 incl. punctuation
                      text inclüding söme spéciäl lettèrs
                      text inclüding söme à spéciäl lettèrs
                      text inclüding söme à1 spéciäl lettèrs
                      text inclüding söme à12 spéciäl lettèrs
                      text inclüding söme à123 spéciäl lettèrs
                      no numbers here as well
                      0123456789
                      abcdefghijklmnopqrstuvwxyz
                  ];
                  
                  table2:
                  LOAD *,
                      MapSubString('mapChar2Patt',SubTextField) as Pattern
                  Where Len(Trim(SubTextField));
                  LOAD RecID,
                      SubField(MapSubString('mapChar2Space',TextField),' ') as SubTextField
                  Resident table1;
                  
                  
                  
                  

                   

                  hope this helps

                   

                  regards

                   

                  Marco