Qlik Community

QlikView App Dev

amiroh81
Creator

Pair of words from sentence

Hi,

I have records, where every record contains a sentence.

I would like to count the number of occurrences of each word in the sentences.

I was able to do this for individual words with help from the forum. Thank you!

The next step I would like to do is count pairs of adjacent words, for example:

sentence 1 : "I want to learn English"

sentence 2 : "I want to learn Spanish"

sentence 3 : "I want to learn French"


The result I would like to receive is:

I want - 3

want to - 3

to learn - 3

learn English - 1

learn Spanish - 1

learn French - 1


Does anyone have an idea?

thanks

1 Solution

Accepted Solutions
MarcoWedel

Just to get an impression of what this solution could be used for:

QlikCommunity_Thread_283581_Pic2.JPG

QlikCommunity_Thread_283581_Pic3.JPG

QlikCommunity_Thread_283581_Pic4.JPG

QlikCommunity_Thread_283581_Pic6.JPG

QlikCommunity_Thread_283581_Pic7.JPG


QlikCommunity_Thread_283581_Pic8.JPG


hope this helps


regards


Marco




8 Replies
sunny_talwar

I have seen some great threads by marcowedel on this same topic. He might be able to offer his expertise here:

unstructured text analysis

Clever_Anjos
Employee

Try this:

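// Split each sentence into one row per word (Line = sentence number, Position = running word number)
Table:
LOAD SubField(F1,' ') as Word, RecNo() as Line, RowNo() as Position INLINE [
    F1
    I want to learn English
    I want to learn Spanish
    I want to learn French
];

// Join every word to all words of the same sentence (join key: Line)
Left Join (Table)
LOAD
    Word as Word1,
    Position as Position1,
    Line
Resident Table;

// Keep only the combinations where Word1 directly follows Word
Final:
LOAD
    Line,
    Word & ' ' & Word1 as Pair,
    RowNo() as Sequence
Resident Table
Where Position + 1 = Position1;

Drop Table Table;

// Count how often each pair occurs
LOAD
    Pair,
    Count(Pair) as Qty
Resident Final
Group By Pair;

Drop Table Final;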
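
The self-join above pairs every word with every other word of the same sentence before filtering. As a rough sketch of a variant, the join could instead be keyed on Line and Position so that each word is matched only with its direct successor. The names Words, NextWord and PairCounts are made up for this illustration, and the sketch assumes words are separated by single spaces:

```qlik
// One row per word; Position is a running word number across all sentences
Words:
LOAD SubField(F1,' ') as Word, RecNo() as Line, RowNo() as Position INLINE [
    F1
    I want to learn English
    I want to learn Spanish
    I want to learn French
];

// Shift positions by one so that each word receives the word that follows it.
// Join keys are Line and Position, so no match is found across sentence boundaries.
Left Join (Words)
LOAD
    Line,
    Position - 1 as Position,
    Word as NextWord
Resident Words;

// Build the pairs (the last word of a sentence has no NextWord) and count them
PairCounts:
LOAD Pair, Count(Pair) as Qty
Group By Pair;
LOAD Word & ' ' & NextWord as Pair
Resident Words
Where not IsNull(NextWord);

Drop Table Words;
```

This keeps the intermediate table at one row per word, which may matter for long texts.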

YoussefBelloum
Champion

Hi,

try this:

test:
LOAD *
Inline [
sentence
sentence 1 : "I want to learn English"
sentence 2 : "I want to learn Spanish"
sentence 3 : "I want to learn French"
];

// For each pair (listed by hand here), flag the sentences that contain it
for each var in 'learn English','learn Spanish','learn French','I want','want to','to learn'

test2:
LOAD sentence,
    if(wildmatch(sentence,'*$(var)*'),'$(var)') as lib,
    if(wildmatch(sentence,'*$(var)*'),1) as num
resident test;

next var

attached app

MarcoWedel

Hi,

one solution might be

QlikCommunity_Thread_283581_Pic1.JPG

// Map every character that has no distinct upper/lower case (i.e. every non-letter) to a space
mapNonLetterToSpace:
Mapping
LOAD Chr(RecNo()), ' '
AutoGenerate 65535
Where Upper(Chr(RecNo()))=Lower(Chr(RecNo()));

// Collapse runs of 99 down to 2 spaces into a single space
mapReduceMultispace:
Mapping
LOAD Repeat(' ',100-RecNo()), ' '
AutoGenerate 98;

// Source lines plus a cleaned copy in which words are separated by single spaces
tabTextLines:
LOAD RowNo() as LineID,
    TextLine,
    Trim(MapSubString('mapReduceMultispace',MapSubString('mapNonLetterToSpace',TextLine))) as TextLineWordSep;
LOAD * INLINE [
    TextLine
    I want to learn English
    I want to learn Spanish
    I want to learn French
];

// All word tuples: every start position (bottom load) combined with every length (middle load)
tabWordTuples:
LOAD *,
    Upper(WordTuple) as WORDTuple,
    AutoNumber(WordTuple,'WordTupleID') as WordTupleID,
    AutoNumber(Upper(WordTuple),'WordTupleID') as WORDTupleID,
    AutoNumber(Hash128(WordTuple,LineID,WordTupleStart),'WordTuplePosID') as WordTuplePosID;
LOAD LineID,
    WordTupleStart,
    IterNo() as WordTupleLength,
    Left(SubStrRight,Index(SubStrRight&' ',' ',IterNo())-1) as WordTuple
While IterNo() <= SubStringCount(SubStrRight,' ')+1;
LOAD LineID,
    IterNo() as WordTupleStart,
    Mid(TextLineWordSep,Index(' '&TextLineWordSep,' ',IterNo())) as SubStrRight
Resident tabTextLines
While IterNo() <= SubStringCount(TextLineWordSep,' ')+1;

// Single words are the tuples of length 1
tabWords:
LOAD LineID,
    WordTupleStart as WordNo,
    AutoNumber(Hash128(LineID,WordTupleStart),'WordID') as WordID,
    WordTuple as Word,
    WORDTuple as WORD
Resident tabWordTuples
Where WordTupleLength=1
Order By LineID,WordTupleStart;

// Link each tuple occurrence to the words it consists of
tabWordLink:
LOAD WordTuplePosID,
    AutoNumber(Hash128(LineID,WordTupleStart+IterNo()-1),'WordID') as WordID
Resident tabWordTuples
While IterNo() <= WordTupleLength;

DROP Field LineID From tabWordTuples;

(adapting a more general approach I previously created)
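
The script above produces tuples of every length, so the word pairs asked for in the question are the rows with WordTupleLength = 2. As a minimal sketch, assuming it is appended after the script above, they could be counted in a separate table. The names tabPairCounts, Pair and Qty are made up for this illustration and are deliberately different from the existing field names so that the table does not link into the data model:

```qlik
// Count every two-word tuple generated in tabWordTuples
tabPairCounts:
LOAD WordTuple as Pair,
    Count(WordTuple) as Qty
Resident tabWordTuples
Where WordTupleLength = 2
Group By WordTuple;
```

Reading WORDTuple instead of WordTuple would count the pairs without regard to case.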



hope this helps


regards


Marco

antoniotiman
Master III

Maybe this:

// Count each pair
LOAD Text1, Count(Text1) as Counter Group By Text1;
// One row per pair of adjacent words: word IterNo() followed by word IterNo()+1
LOAD *, SubField(Text,' ',IterNo()) & ' ' & SubField(Text,' ',IterNo()+1) as Text1
Inline [
Text
"I want to learn English"
"I want to learn Spanish"
"I want to learn French"
]
While IterNo() <= SubStringCount(Text,' ');
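
The same While/IterNo() idea might be extended from pairs to tuples of any fixed length. The following is only a sketch under the assumption that words are separated by single spaces. The variable vN and the names TupleCounts, Rest and Tuple are made up for this illustration. The substring handling borrows the Mid/Index and Left/Index expressions from Marco's script in this thread:

```qlik
LET vN = 3;   // hypothetical variable: number of words per tuple

TupleCounts:
// Count each tuple
LOAD Tuple, Count(Tuple) as Counter
Group By Tuple;
// Keep the first vN words of the remaining text
LOAD Left(Rest, Index(Rest & ' ', ' ', $(vN)) - 1) as Tuple;
// Rest = the sentence starting at word number IterNo()
LOAD Mid(Text, Index(' ' & Text, ' ', IterNo())) as Rest
Inline [
Text
"I want to learn English"
"I want to learn Spanish"
"I want to learn French"
]
While IterNo() <= SubStringCount(Text, ' ') + 2 - $(vN);
```

With vN set to 2 this should yield the same pairs as the script above. Sentences with fewer than vN words produce no rows.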

Regards,

Antonio

MarcoWedel

Please close your thread if your question is answered:

Qlik Community Tip: Marking Replies as Correct or Helpful

thanks

regards

Marco

Hewitt_Trinh
Contributor

Hi there - I am in the process of getting a license. I would very much appreciate it if you could post the full code here instead of the qvw file.

I'm still working through your code in this post: https://community.qlik.com/t5/QlikView-App-Development/How-to-identify-recurrent-terms-in-text-field.... It's quite intriguing that the longest word tuples in the result set are only 3 words long.

Thanks