Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
Dear community! I'm trying to figure out how to search within my data... and I can't seem to automate this. This is the situation:
I have three variables I need to search for within text fields (Tweets):
### (three numbers)
L## (1 letter, 2 numbers)
LLLL## (4 letters, 2 numbers)
So whenever someone tweets, and it has any of these two variables, i'd like it to show me the original tweet.
Is there any filter that lets me quickly search for these variables in that format? (Bus number and Licence plate variables).
Thank you very much !!!!!!!
May be using RegExp: Regular expressions in the load script - The Qlik Fix! The Qlik Fix!
It isn't native to QlikView, but the macro might work for you
If you can flag it in script, here's a pure brute force approach. You'll probably need to be more sophisticated than this in practice, like I'm only taking spaces as word breaks, where you'd want to break on punctuation and the like. I have the feeling I'm missing some simpler way to do this (without using regular expressions).
SET vIsAlphanumeric = index('ABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890',mid($1,$2,1));
SET vIsAlpha = index('ABCDEFGHIJKLMNOPQRSTUVWXYZ',mid($1,$2,1));
SET vIsNumeric = index('1234567890',mid($1,$2,1));
Words:
LOAD *
,if(len(Word)=3 and $(vIsAlphanumeric(Word,1))
and $(vIsNumeric(Word,2))
and $(vIsNumeric(Word,3)),'Y'
,if(len(Word)=6 and $(vIsAlpha(Word,1))
and $(vIsAlpha(Word,2))
and $(vIsAlpha(Word,3))
and $(vIsAlpha(Word,4))
and $(vIsNumeric(Word,5))
and $(vIsNumeric(Word,6)),'Y','N')) as Flagged?
;
LOAD
ID
,subfield(Tweet,' ') as Word
RESIDENT Raw
;
I like your patterns approach better than my brute force. I ended up with this, which also has the start of ignoring punctuation, at least in terms of figuring out the format of the word, if not the word itself in this case.
Format:
MAPPING LOAD
mid('ABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890.,?',recno(),1) as From
,if(recno()<=26,'L',if(recno()<=36,'#',' ')) as To
AUTOGENERATE 39
;
Words:
LOAD *
,if(match(Format,'###','L##','LLLL##'),'Y','N') as Flagged?
;
LOAD *
,mapsubstring('Format',upper(Word)) as Format
;
LOAD
ID
,subfield(Tweet,' ') as Word
RESIDENT Raw
;
maybe like this:
table1:
LOAD RecNo() as RecID,
*
INLINE [
TextField
some random text including number 123
some text without numbers
more interesting text45 including numbers
another text without any number
yet another text including 1234 numbers
some text
some text 1
some text 12
some text 123
some text 1234
some text a
some text a1
some text a12
some text a123
some text a1234
some text ab
some text ab1
some text ab12
some text ab123
some text abc1234
some text abc
some text abc1
some text abc12
some text abc123
some text abc1234
some text abcd
some text abcd1
some text abcd12
some text abcd123
some text abcd1234
no numbers here as well
0123456789
abcdefghijklmnopqrstuvwxyz
];
table2:
LOAD *,
Text(Replace(Replace(Replace(Replace(Replace(Replace(Replace(Replace(Replace(
Replace(Replace(Replace(Replace(Replace(Replace(Replace(Replace(Replace(Replace(
Replace(Replace(Replace(Replace(Replace(Replace(Replace(Replace(Replace(Replace(
Replace(Replace(Replace(Replace(Replace(Replace(Upper(SubTextField),
'0','#'),'1','#'),'2','#'),'3','#'),'4','#'),'5','#'),'6','#'),'7','#'),'8','#'),'9','#')
,'B','L'),'C','L'),'D','L'),'E','L'),'F','L'),'G','L'),'H','L'),'I','L'),'J','L'),'K','L')
,'A','L'),'M','L'),'N','L'),'O','L'),'P','L'),'Q','L'),'R','L'),'S','L'),'T','L'),'U','L')
,'V','L'),'W','L'),'X','L'),'Y','L'),'Z','L')) as FieldFormat;
LOAD RecID,
SubField(TextField,' ') as SubTextField
Resident table1;
hope this helps
regards
Marco
Hi John,
I like your MapSubString approach for being far more elegant than my lengthy Replace() expression.
Maybe the two solutions could be combined.
regards
Marco
Hi,
borrowing John's MapSubString() idea, another solution could be:
SET vIsLet = (Upper($1)<>Lower($1));
SET vIsNumLet = (Upper($1)<>Lower($1) or IsNum($1));
mapChar2Patt:
Mapping LOAD char, If(IsNum(char),'#','L')
Where $(vIsNumLet(char));
LOAD Chr(RecNo()) as char
AutoGenerate 65535;
mapChar2Space:
Mapping LOAD char, ' '
Where not $(vIsNumLet(char));
LOAD Chr(RecNo()) as char
AutoGenerate 65535;
table1:
LOAD RecNo() as RecID,
*
INLINE [
TextField
some random text including number 123
some text without numbers
more interesting text45 including numbers
another text without any number
yet another text including 1234 numbers
some text
some text 1
some text 12
some text 123
some text 1234
some text a
some text a1
some text a12
some text a123
some text a1234
some text ab
some text ab1
some text ab12
some text ab123
some text abc1234
some text abc
some text abc1
some text abc12
some text abc123
some text abc1234
some text abcd
some text abcd1
some text abcd12
some text abcd123
some text abcd1234
some text. incl. punctuation
some text1. incl. punctuation
some text12. incl. punctuation
some text123. incl. punctuation
some text123.A12 incl. punctuation
"some"text123".A12,incl.punctuation"
some text123. A12 incl. punctuation
text inclüding söme spéciäl lettèrs
text inclüding söme à spéciäl lettèrs
text inclüding söme à1 spéciäl lettèrs
text inclüding söme à12 spéciäl lettèrs
text inclüding söme à123 spéciäl lettèrs
no numbers here as well
0123456789
abcdefghijklmnopqrstuvwxyz
];
table2:
LOAD *,
MapSubString('mapChar2Patt',SubTextField) as Pattern
Where Len(Trim(SubTextField));
LOAD RecID,
SubField(MapSubString('mapChar2Space',TextField),' ') as SubTextField
Resident table1;
hope this helps
regards
Marco