Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
Hi all
If I have an excel file of keywords, e.g.
Keywords:
Load Keyword
from test.xls;
and a results file which contains numerous fields including 2 multiline text fields e.g.
Results:
LOAD ID,
Country,
TextField1,
TextField2
from results.xls;
and I want to find out whether each keyword appears in either TextField1 or TextField2 in my results file, what's the best way to go about this?
i.e. if my keywords table looks like this
'apple'
'orange'
'banana'
and the TextField1 of the first row of my results file is '2 apples and a banana'
and the TextField1 of the second row is 'bike' and Text Field2 is 'an orange'
I would want to flag the first row as containing the keywords 'apple' and 'banana' and the 2nd row as containing the keyword 'orange'
I would like to do this in the load script as my results file contains so many rows and I need to provide stats on how many rows contain each of the keyword.
Any advice?
Many thanks in advance
May be like this:
Keywords:
Mapping
LOAD Upper(Keyword) as Keyword,
'/' & Lower(Keyword) & '\' as Flag;
LOAD * Inline [
Keyword
apple
orange
banana
];
Results:
LOAD *,
PurgeChar(Replace(KeepChar(MapSubString('Keywords', UPPER(TextField1)), 'abcdefghijklmnopqrstuvwxyz/\'), '\/', ', '), '/\') as Flag1,
PurgeChar(Replace(KeepChar(MapSubString('Keywords', UPPER(TextField2)), 'abcdefghijklmnopqrstuvwxyz/\'), '\/', ', '), '/\') as Flag2;
LOAD * Inline [
ID, TextField1, TextField2
1, 2 apples and a banana, xyz
2, bike, an orange
];
Hi Karen,
Keywords:
LOAD Keyword,Keyword&' '&Peek(Keyword1) as Keyword1 Inline [
Keyword
apple
orange
banana
];
Let vKeyword = Peek('Keyword1');
Table:
Left Keep
LOAD *,KeepChar(TempKeyword,'$(vKeyword)') as Keyword;
LOAD ID,TextField1,TextField2,
SubField(TextField1&' '&TextField2,' ') as TempKeyword
Inline [
ID,TextField1,TextField2
1,2 apples and a banana,cigar
2,bike,an orange
3,car and 3 apples and 1 banana and 2 oranges,
];
Join
LOAD ID,Concat(Keyword,',') as Flag
Resident Table
Group by ID;
Drop Field Keyword1,TempKeyword;
Regards,
Antonio
Thanks so much Sunny, Antonio, for the quick replies!
I've tested both ways:
Sunny, yours works but my real keywords could have spaces, your code is removing the space (i.e. if my keyword is Green Car its showing in the Flag1 field as greencar. Also, if the keyword is found more than once, it is replicating. Can I make Flag1 only give me unique values?
Antonio,
Unfortunately the data which could appear in textfields 1 & 2 could be very long, (this data is coming from a form and those fields are large free text fields) and I believe the code above is splitting these fields into its individual words, is that right? I tried running this against my data and its running upwards of 1 mil rows!
Thank both again for your super quick responses,
Karen
Here is a slightly modified code which handles the space and also handles the issue of multiple keywords (but at cost of some performance) in Flag1 field only. Not sure if we need to handle the multiple keywords issue for Flag2 field also
Keywords:
Mapping
LOAD Upper(Keyword) as Keyword,
'/' & Replace(Lower(Keyword), ' ', '@') & '\' as Flag;
LOAD * Inline [
Keyword
apple
Green Car
orange
banana
];
Results:
LOAD ID,
TextField1,
TextField2,
Flag2,
Concat(DISTINCT Flag1, ', ') as Flag1
Group By ID, TextField1, TextField2, Flag2;
LOAD ID,
TextField1,
TextField2,
SubField(Flag1, ', ') as Flag1,
Flag2;
LOAD *,
Replace(PurgeChar(Replace(KeepChar(MapSubString('Keywords', UPPER(TextField1)), 'abcdefghijklmnopqrstuvwxyz/\@'), '\/', ', '), '/\'), '@', ' ') as Flag1,
Replace(PurgeChar(Replace(KeepChar(MapSubString('Keywords', UPPER(TextField2)), 'abcdefghijklmnopqrstuvwxyz/\@'), '\/', ', '), '/\'), '@', ' ') as Flag2;
LOAD * Inline [
ID, TextField1, TextField2
1, 2 Green Car and a banana, xyz
2, bike, an orange
3, Green Car Green Car Green Car
];
Hi,
maybe another solution could be:
Keywords:
LOAD RecNo() as %KeywordID, *
INLINE [
Keyword
apple
orange
banana
peach
grape
];
mapKeywords:
Mapping
LOAD Keyword, '@start@'&%KeywordID&'@end@'
Resident Keywords;
Results:
LOAD RecNo() as ID, * INLINE [
TextField1, TextField2
2 apples and a banana, some peaches
bike, an orange
apple juice, something else
nothing, anything
something different, prunes
many grapes, other fruits
];
tabLink:
CrossTable (TextField, TextFieldValue)
LOAD * Resident Results;
Left Join (tabLink)
LOAD Distinct
TextFieldValue,
TextBetween(MapSubString('mapKeywords', TextFieldValue),'@start@','@end@',IterNo()) as %KeywordID
Resident tabLink
While IterNo()<=SubStringCount(MapSubString('mapKeywords', TextFieldValue),'@start@');
hope this helps
regards
Marco