Hi,
I have a set of records, and every record contains a sentence.
I would like to count the number of occurrences of each word in the sentences.
I was able to do this for individual words with the help of the forum. Thank you!
The next step is to count pairs of adjacent words, for example:
sentence 1 : "I want to learn English"
sentence 2 : "I want to learn Spanish"
sentence 3 : "I want to learn French"
The result I would like to receive is:
I want - 3
want to - 3
to learn - 3
learn English - 1
learn Spanish - 1
learn French - 1
Does anyone have an idea?
thanks
I have seen some great threads by marcowedel on this same topic. He might be able to offer his expertise here.
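Try this:

// One row per word: Line = source record, Position = running word number
Table:
LOAD SubField(F1, ' ') as Word,
     RecNo() as Line,
     RowNo() as Position
INLINE [
F1
I want to learn English
I want to learn Spanish
I want to learn French
];

// Join every word with all words of the same Line
Left Join (Table)
LOAD Word as Word1,
     Position as Position1,
     Line
Resident Table;

// Keep only the combinations where Word1 directly follows Word
Final:
LOAD Line,
     Word & ' ' & Word1 as Pair,
     RowNo() as Sequence
Resident Table
Where Position + 1 = Position1;

Drop Table Table;

// Count the occurrences of each pair
PairCounts:
LOAD Pair,
     Count(Pair) as Qty
Resident Final
Group By Pair;

Drop Table Final;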
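For anyone who wants to cross-check the expected counts outside Qlik, here is a minimal Python sketch of the same idea: split each sentence into words, pair each word with its successor on the same line, and count the pairs. The function name `count_word_pairs` is made up for illustration, and the sketch assumes words are separated by whitespace only.

```python
from collections import Counter


def count_word_pairs(sentences):
    """Count adjacent word pairs; pairs never span two sentences."""
    counts = Counter()
    for sentence in sentences:
        words = sentence.split()
        # zip(words, words[1:]) pairs each word with the one that follows it
        counts.update(f"{a} {b}" for a, b in zip(words, words[1:]))
    return counts


sentences = [
    "I want to learn English",
    "I want to learn Spanish",
    "I want to learn French",
]

pair_counts = count_word_pairs(sentences)
for pair, qty in pair_counts.items():
    print(f"{pair} - {qty}")
```

For the three example sentences this should give the same six pairs and counts as listed in the question.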
Hi,
try this:

test:
LOAD *
Inline [
sentence
sentence 1 : "I want to learn English"
sentence 2 : "I want to learn Spanish"
sentence 3 : "I want to learn French"
];

// One pass per word pair; the loads have identical fields, so they all end up in test2
for each var in 'learn English','learn Spanish','learn French','I want','want to','to learn'
  test2:
  LOAD sentence,
       if(wildmatch(sentence, '*$(var)*'), '$(var)') as lib,
       if(wildmatch(sentence, '*$(var)*'), 1) as num
  resident test;
next var

attached app
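Note that the loop above only checks a fixed list of pairs, so every pair has to be known in advance. A rough Python sketch of that logic, for comparison only: the name `count_known_pairs` is hypothetical, and `lower()` is used to imitate the case-insensitive comparison of WildMatch. A plain substring test like this would also match inside longer words (for example 'want to' inside 'want tomorrow').

```python
def count_known_pairs(sentences, pairs):
    """For each given pair, count the sentences that contain it as a substring."""
    result = {}
    for pair in pairs:
        needle = pair.lower()
        # Case-insensitive substring test, one hit per sentence at most
        result[pair] = sum(1 for s in sentences if needle in s.lower())
    return result


sentences = [
    "I want to learn English",
    "I want to learn Spanish",
    "I want to learn French",
]
pairs = ["learn English", "learn Spanish", "learn French",
         "I want", "want to", "to learn"]

known_counts = count_known_pairs(sentences, pairs)
print(known_counts)
```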
Hi,
one solution might be:

// Map every character that has no distinct upper/lower case (i.e. non-letters) to a space
mapNonLetterToSpace:
Mapping
LOAD Chr(RecNo()), ' '
AutoGenerate 65535
Where Upper(Chr(RecNo()))=Lower(Chr(RecNo()));

// Map runs of 99 down to 2 spaces to a single space
mapReduceMultispace:
Mapping
LOAD Repeat(' ',100-RecNo()), ' '
AutoGenerate 98;

// Source lines plus a cleaned version with single spaces as word separators
tabTextLines:
LOAD RowNo() as LineID,
     TextLine,
     Trim(MapSubString('mapReduceMultispace',MapSubString('mapNonLetterToSpace',TextLine))) as TextLineWordSep;
LOAD * INLINE [
TextLine
I want to learn English
I want to learn Spanish
I want to learn French
];

// Preceding loads, evaluated bottom-up:
// bottom: one row per start word, middle: one row per tuple length, top: IDs
tabWordTuples:
LOAD *,
     Upper(WordTuple) as WORDTuple,
     AutoNumber(WordTuple,'WordTupleID') as WordTupleID,
     AutoNumber(Upper(WordTuple),'WordTupleID') as WORDTupleID,
     AutoNumber(Hash128(WordTuple,LineID,WordTupleStart),'WordTuplePosID') as WordTuplePosID;
LOAD LineID,
     WordTupleStart,
     IterNo() as WordTupleLength,
     Left(SubStrRight,Index(SubStrRight&' ',' ',IterNo())-1) as WordTuple
While IterNo() <= SubStringCount(SubStrRight,' ')+1;
LOAD LineID,
     IterNo() as WordTupleStart,
     Mid(TextLineWordSep,Index(' '&TextLineWordSep,' ',IterNo())) as SubStrRight
Resident tabTextLines
While IterNo() <= SubStringCount(TextLineWordSep,' ')+1;

// Single words are the tuples of length 1
tabWords:
LOAD LineID,
     WordTupleStart as WordNo,
     AutoNumber(Hash128(LineID,WordTupleStart),'WordID') as WordID,
     WordTuple as Word,
     WORDTuple as WORD
Resident tabWordTuples
Where WordTupleLength=1
Order By LineID,WordTupleStart;

// Link each tuple occurrence to the words it consists of
tabWordLink:
LOAD WordTuplePosID,
     AutoNumber(Hash128(LineID,WordTupleStart+IterNo()-1),'WordID') as WordID
Resident tabWordTuples
While IterNo() <= WordTupleLength;

DROP Field LineID From tabWordTuples;
(adapting a more general approach I previously created)
hope this helps
regards
Marco
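Reading the script above, the two nested While loads in tabWordTuples appear to enumerate every contiguous word tuple of a line: one load picks the start word, the next extends the tuple by one word per iteration. Below is a small Python sketch of just that enumeration, not of the mapping or ID steps. The name `word_tuples` is hypothetical, and the sketch splits on whitespace instead of mapping non-letters to spaces.

```python
def word_tuples(line):
    """Yield (start, length, text) for every contiguous run of words in a line."""
    words = line.split()
    for start in range(len(words)):                      # plays the role of WordTupleStart
        for length in range(1, len(words) - start + 1):  # plays the role of WordTupleLength
            yield start + 1, length, " ".join(words[start:start + length])


tuples = list(word_tuples("I want to learn English"))
for start, length, text in tuples:
    print(start, length, text)
```

For a five-word sentence this yields 5 + 4 + 3 + 2 + 1 = 15 tuples, with lengths from 1 up to 5; the word pairs asked for in the question are the tuples of length 2.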
Just to get an impression of what this solution could be used for:
hope this helps
regards
Marco
Maybe this:

// Upper LOAD counts each pair; lower LOAD builds the pairs (word n and word n+1)
LOAD Text1,
     Count(Text1) as Counter
Group By Text1;
LOAD *,
     SubField(Text,' ',IterNo()) & ' ' & SubField(Text,' ',IterNo()+1) as Text1
Inline [
Text
"I want to learn English"
"I want to learn Spanish"
"I want to learn French"
]
While IterNo() <= SubStringCount(Text,' ');
Regards,
Antonio
Please close your thread if your question is answered:
Qlik Community Tip: Marking Replies as Correct or Helpful
thanks
regards
Marco
Hi there - I am in the process of getting a license. I would very much appreciate it if you could post the full code here instead of the QVW file.
I'm still working through your code in this post: https://community.qlik.com/t5/QlikView-App-Development/How-to-identify-recurrent-terms-in-text-field.... It's quite intriguing that the longest word tuple in the result set is only 3 words long.
Thanks