Qlik Community

QlikView Scripting

Discussion Board for collaboration on QlikView Scripting.

Announcements

Breathe easy -- you now have more time to plan your next steps with Qlik!
QlikView 11.2 Extended Support is now valid through December 31, 2020. Click here for more information.

Not applicable

Finding text patterns in Data Columns

Dear community! I'm trying to figure out how to search within my data... and I can't seem to automate this. This is the situation:

I have three variables I need to search for within text fields (Tweets):

### (three numbers)

L## (1 letter, 2 numbers)

LLLL## (4 letters, 2 numbers)

So whenever someone tweets, and it has any of these two variables, i'd like it to show me the original tweet.

Is there any filter that lets me quickly search for these variables in that format? (Bus number and Licence plate variables).

Thank you very much !!!!!!!

7 Replies

Re: Finding text patterns in Data Columns

May be using RegExp: Regular expressions in the load script - The Qlik Fix! The Qlik Fix!

It isn't native to QlikView, but the macro might work for you

Re: Finding text patterns in Data Columns

MVP
MVP

Re: Finding text patterns in Data Columns

If you can flag it in script, here's a pure brute force approach. You'll probably need to be more sophisticated than this in practice, like I'm only taking spaces as word breaks, where you'd want to break on punctuation and the like. I have the feeling I'm missing some simpler way to do this (without using regular expressions).

SET vIsAlphanumeric = index('ABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890',mid($1,$2,1));
SET vIsAlpha = index('ABCDEFGHIJKLMNOPQRSTUVWXYZ',mid($1,$2,1));
SET vIsNumeric = index('1234567890',mid($1,$2,1));

Words:
LOAD *
,
if(len(Word)=3 and $(vIsAlphanumeric(Word,1))
and $(vIsNumeric(Word,2))
and $(vIsNumeric(Word,3)),'Y'
,
if(len(Word)=6 and $(vIsAlpha(Word,1))
and $(vIsAlpha(Word,2))
and $(vIsAlpha(Word,3))
and $(vIsAlpha(Word,4))
and $(vIsNumeric(Word,5))
and $(vIsNumeric(Word,6)),'Y','N')) as Flagged?
;
LOAD
ID
,
subfield(Tweet,' ') as Word
RESIDENT Raw
;

MVP
MVP

Re: Finding text patterns in Data Columns

I like your patterns approach better than my brute force. I ended up with this, which also has the start of ignoring punctuation, at least in terms of figuring out the format of the word, if not the word itself in this case.

Format:
MAPPING LOAD
mid('ABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890.,?',recno(),1) as From
,
if(recno()<=26,'L',if(recno()<=36,'#',' ')) as To
AUTOGENERATE 39
;
Words:
LOAD *
,
if(match(Format,'###','L##','LLLL##'),'Y','N') as Flagged?
;
LOAD *
,
mapsubstring('Format',upper(Word)) as Format
;
LOAD
ID
,
subfield(Tweet,' ') as Word
RESIDENT Raw
;

Re: Finding text patterns in Data Columns

maybe like this:

QlikCommunity_Thread_217611_Pic1.JPG

QlikCommunity_Thread_217611_Pic2.JPG

table1:

LOAD RecNo() as RecID,

    *

INLINE [

    TextField

    some random text including number 123

    some text without numbers

    more interesting text45 including numbers

    another text without any number

    yet another text including 1234 numbers

    some text

    some text 1

    some text 12

    some text 123

    some text 1234

    some text a

    some text a1

    some text a12

    some text a123

    some text a1234

    some text ab

    some text ab1

    some text ab12

    some text ab123

    some text abc1234

    some text abc

    some text abc1

    some text abc12

    some text abc123

    some text abc1234

    some text abcd

    some text abcd1

    some text abcd12

    some text abcd123

    some text abcd1234

    no numbers here as well

    0123456789

    abcdefghijklmnopqrstuvwxyz

];

table2:

LOAD *,

    Text(Replace(Replace(Replace(Replace(Replace(Replace(Replace(Replace(Replace(

    Replace(Replace(Replace(Replace(Replace(Replace(Replace(Replace(Replace(Replace(

    Replace(Replace(Replace(Replace(Replace(Replace(Replace(Replace(Replace(Replace(

    Replace(Replace(Replace(Replace(Replace(Replace(Upper(SubTextField),

    '0','#'),'1','#'),'2','#'),'3','#'),'4','#'),'5','#'),'6','#'),'7','#'),'8','#'),'9','#')

    ,'B','L'),'C','L'),'D','L'),'E','L'),'F','L'),'G','L'),'H','L'),'I','L'),'J','L'),'K','L')

    ,'A','L'),'M','L'),'N','L'),'O','L'),'P','L'),'Q','L'),'R','L'),'S','L'),'T','L'),'U','L')

    ,'V','L'),'W','L'),'X','L'),'Y','L'),'Z','L')) as FieldFormat;

LOAD RecID,

    SubField(TextField,' ') as SubTextField

Resident table1;

hope this helps

regards

Marco

Re: Finding text patterns in Data Columns

Hi John,

I like your MapSubString approach for being far more elegant than my lengthy Replace() expression.

Maybe the two solutions could be combined.

regards

Marco

Re: Finding text patterns in Data Columns

Hi,

borrowing John's MapSubString() idea, another solution could be:

QlikCommunity_Thread_217611_Pic3.JPG

QlikCommunity_Thread_217611_Pic4.JPG

QlikCommunity_Thread_217611_Pic5.JPG

SET vIsLet    = (Upper($1)<>Lower($1));

SET vIsNumLet = (Upper($1)<>Lower($1) or IsNum($1));

mapChar2Patt:

Mapping LOAD char, If(IsNum(char),'#','L')

Where $(vIsNumLet(char));

LOAD Chr(RecNo()) as char

AutoGenerate 65535;

mapChar2Space:

Mapping LOAD char, ' '

Where not $(vIsNumLet(char));

LOAD Chr(RecNo()) as char

AutoGenerate 65535;

table1:

LOAD RecNo() as RecID,

    *

INLINE [

    TextField

    some random text including number 123

    some text without numbers

    more interesting text45 including numbers

    another text without any number

    yet another text including 1234 numbers

    some text

    some text 1

    some text 12

    some text 123

    some text 1234

    some text a

    some text a1

    some text a12

    some text a123

    some text a1234

    some text ab

    some text ab1

    some text ab12

    some text ab123

    some text abc1234

    some text abc

    some text abc1

    some text abc12

    some text abc123

    some text abc1234

    some text abcd

    some text abcd1

    some text abcd12

    some text abcd123

    some text abcd1234

    some text. incl. punctuation

    some text1. incl. punctuation

    some text12. incl. punctuation

    some text123. incl. punctuation

    some text123.A12 incl. punctuation

    "some"text123".A12,incl.punctuation"

    some text123. A12 incl. punctuation

    text inclüding söme spéciäl lettèrs

    text inclüding söme à spéciäl lettèrs

    text inclüding söme à1 spéciäl lettèrs

    text inclüding söme à12 spéciäl lettèrs

    text inclüding söme à123 spéciäl lettèrs

    no numbers here as well

    0123456789

    abcdefghijklmnopqrstuvwxyz

];

table2:

LOAD *,

    MapSubString('mapChar2Patt',SubTextField) as Pattern

Where Len(Trim(SubTextField));

LOAD RecID,

    SubField(MapSubString('mapChar2Space',TextField),' ') as SubTextField

Resident table1;

hope this helps

regards

Marco

Community Browser