amiroh81
Creator

Pair of words from sentence

Hi,

I have records, where every record contains a sentence.

I would like to analyze the number of occurrences of each word in the sentences.

I was able to do this for each individual word with help from the forum. Thank you!

The next step I would like to take is to count pairs of adjacent words, for example:

sentence 1 : "I want to learn English"

sentence 2 : "I want to learn Spanish"

sentence 3 : "I want to learn French"


The result I would like to receive is:

I want - 3

want to - 3

to learn - 3

learn English - 1

learn Spanish - 1

learn French - 1


Does anyone have an idea?

thanks

8 Replies
sunny_talwar

I have seen some great threads by marcowedel on this same topic. He might be able to offer his expertise here.

unstructured text analysis

Clever_Anjos
Employee

Try with this

// One row per word: Line is the source record, Position is a running row number
Table:
LOAD SubField(F1,' ') as Word, RecNo() as Line, RowNo() as Position INLINE [
    F1
    I want to learn English
    I want to learn Spanish
    I want to learn French
];

// Pair every word with every word of the same line
Left Join (Table)
LOAD
    Word as Word1,
    Position as Position1,
    Line
Resident Table;

// Keep only the pairs of directly adjacent words
Final:
LOAD
    Line,
    Word & ' ' & Word1 as Pair,
    RowNo() as Sequence
Resident Table
Where Position + 1 = Position1;

Drop Table Table;

// Count the occurrences of each pair
LOAD
    Pair,
    Count(Pair) as Qty
Resident Final
Group By Pair;

Drop Table Final;
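
For anyone who wants to cross-check the result outside Qlik, the self-join logic above can be sketched in Python. This is only an illustration, not part of the original script: the function name `count_pairs_by_join` and the tuple layout are made up for this example.

```python
from collections import Counter

def count_pairs_by_join(sentences):
    # Step 1: one row per word, like SubField(F1, ' ') with Line and Position.
    rows = []
    position = 0
    for line, sentence in enumerate(sentences, start=1):
        for word in sentence.split(' '):
            position += 1  # running counter across all lines, like RowNo()
            rows.append((line, position, word))

    # Step 2: self-join on Line, keeping only rows where the second word
    # directly follows the first (Position + 1 = Position1).
    pairs = [
        f"{w1} {w2}"
        for (line1, pos1, w1) in rows
        for (line2, pos2, w2) in rows
        if line1 == line2 and pos1 + 1 == pos2
    ]

    # Step 3: Count(Pair) ... Group By Pair.
    return Counter(pairs)

counts = count_pairs_by_join([
    "I want to learn English",
    "I want to learn Spanish",
    "I want to learn French",
])
```

Because the position counter runs across all lines, it is the join on the line number that keeps the last word of one sentence from being paired with the first word of the next.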

YoussefBelloum
Champion

Hi,

try this:

test:
LOAD * Inline [
sentence
sentence 1 : "I want to learn English"
sentence 2 : "I want to learn Spanish"
sentence 3 : "I want to learn French"
];

// For each known pair, flag the sentences that contain it
for each var in 'learn English','learn Spanish','learn French','I want','want to','to learn'

test2:
LOAD sentence,
    if(wildmatch(sentence,'*$(var)*'),'$(var)') as lib,
    if(wildmatch(sentence,'*$(var)*'),1) as num
resident test;

next var

attached app
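
A note on how this approach behaves, sketched in Python for illustration only (the function name `count_known_phrases` is made up, and the substring test merely approximates `wildmatch` with `*` on both sides). The pairs must be listed in advance, and the match is a case-insensitive substring test rather than a word-by-word comparison.

```python
def count_known_phrases(sentences, phrases):
    counts = {}
    for phrase in phrases:
        # Case-insensitive substring test, roughly wildmatch(sentence, '*phrase*').
        # A sentence is counted once, however often the phrase occurs in it.
        counts[phrase] = sum(
            1 for sentence in sentences if phrase.lower() in sentence.lower()
        )
    return counts

sentences = [
    'sentence 1 : "I want to learn English"',
    'sentence 2 : "I want to learn Spanish"',
    'sentence 3 : "I want to learn French"',
]
phrases = ['learn English', 'learn Spanish', 'learn French',
           'I want', 'want to', 'to learn']
counts = count_known_phrases(sentences, phrases)

# Substring matching ignores word boundaries:
# "to learn" is also found inside "into learning".
boundary_case = count_known_phrases(["We went into learning mode"], ["to learn"])
```

So this works well for a fixed list of pairs, while the other replies in this thread derive the pairs from the text itself.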

MarcoWedel

Hi,

one solution might be

QlikCommunity_Thread_283581_Pic1.JPG

// Map every character without distinct upper/lower case forms (non-letters) to a space
mapNonLetterToSpace:
Mapping
LOAD Chr(RecNo()), ' '
AutoGenerate 65535
Where Upper(Chr(RecNo()))=Lower(Chr(RecNo()));

// Map runs of 99 down to 2 spaces to a single space
mapReduceMultispace:
Mapping
LOAD Repeat(' ',100-RecNo()), ' '
AutoGenerate 98;

// Source lines plus a cleaned version with single-space word separators
tabTextLines:
LOAD RowNo() as LineID,
    TextLine,
    Trim(MapSubString('mapReduceMultispace',MapSubString('mapNonLetterToSpace',TextLine))) as TextLineWordSep;
LOAD * INLINE [
    TextLine
    I want to learn English
    I want to learn Spanish
    I want to learn French
];

// All contiguous word tuples of every length, per line and start position
tabWordTuples:
LOAD *,
    Upper(WordTuple) as WORDTuple,
    AutoNumber(WordTuple,'WordTupleID') as WordTupleID,
    AutoNumber(Upper(WordTuple),'WordTupleID') as WORDTupleID,
    AutoNumber(Hash128(WordTuple,LineID,WordTupleStart),'WordTuplePosID') as WordTuplePosID;
// Inner loop: one row per tuple length, starting at WordTupleStart
LOAD LineID,
    WordTupleStart,
    IterNo() as WordTupleLength,
    Left(SubStrRight,Index(SubStrRight&' ',' ',IterNo())-1) as WordTuple
While IterNo() <= SubStringCount(SubStrRight,' ')+1;
// Outer loop: one row per start word, holding the rest of the line
LOAD LineID,
    IterNo() as WordTupleStart,
    Mid(TextLineWordSep,Index(' '&TextLineWordSep,' ',IterNo())) as SubStrRight
Resident tabTextLines
While IterNo() <= SubStringCount(TextLineWordSep,' ')+1;

// Single words are the tuples of length 1
tabWords:
LOAD LineID,
    WordTupleStart as WordNo,
    AutoNumber(Hash128(LineID,WordTupleStart),'WordID') as WordID,
    WordTuple as Word,
    WORDTuple as WORD
Resident tabWordTuples
Where WordTupleLength=1
Order By LineID,WordTupleStart;

// Link each tuple occurrence to the words it consists of
tabWordLink:
LOAD WordTuplePosID,
    AutoNumber(Hash128(LineID,WordTupleStart+IterNo()-1),'WordID') as WordID
Resident tabWordTuples
While IterNo() <= WordTupleLength;

DROP Field LineID From tabWordTuples;

(adapting a more general approach I previously created)
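
To make the idea behind the two nested While loops easier to follow, here is a rough Python sketch of the same steps: normalize the text, then emit every contiguous word tuple for every start position and length. It is an illustration under assumptions, not a translation of the script: the names `normalize` and `word_tuples` are made up, and the ID fields (AutoNumber, Hash128) are left out.

```python
import re
from collections import Counter

def normalize(text):
    # Replace every character whose upper and lower case forms are identical
    # (digits, punctuation, spaces) with a space, then collapse runs of
    # spaces and trim.
    spaced = ''.join(ch if ch.upper() != ch.lower() else ' ' for ch in text)
    return re.sub(r' +', ' ', spaced).strip()

def word_tuples(sentences):
    rows = []
    for line_id, text in enumerate(sentences, start=1):
        words = normalize(text).split()
        for start in range(len(words)):                     # outer loop: start word
            for length in range(1, len(words) - start + 1):  # inner loop: tuple length
                tuple_text = ' '.join(words[start:start + length])
                rows.append((line_id, start + 1, length, tuple_text))
    return rows

rows = word_tuples([
    "I want to learn English",
    "I want to learn Spanish",
    "I want to learn French",
])

# The word pairs asked for in this thread are the tuples of length 2.
pair_counts = Counter(t for (_, _, length, t) in rows if length == 2)
```

In the Qlik data model the same filter would be a selection on WordTupleLength = 2.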



hope this helps


regards


Marco

MarcoWedel

Just to get an impression of what this solution could be used for:

QlikCommunity_Thread_283581_Pic2.JPG

QlikCommunity_Thread_283581_Pic3.JPG

QlikCommunity_Thread_283581_Pic4.JPG

QlikCommunity_Thread_283581_Pic6.JPG

QlikCommunity_Thread_283581_Pic7.JPG


QlikCommunity_Thread_283581_Pic8.JPG


hope this helps


regards


Marco



antoniotiman
Master III

Maybe this:

// Count each pair
LOAD Text1, Count(Text1) as Counter Group By Text1;
// One row per adjacent word pair: IterNo() runs from 1 to the number of spaces
LOAD *, SubField(Text,' ',IterNo()) & ' ' & SubField(Text,' ',IterNo()+1) as Text1
Inline [
Text
"I want to learn English"
"I want to learn Spanish"
"I want to learn French"
]
While IterNo() <= SubStringCount(Text,' ');

Regards,

Antonio

MarcoWedel

Please close your thread if your question is answered:

Qlik Community Tip: Marking Replies as Correct or Helpful

thanks

regards

Marco

Hewitt_Trinh
Contributor

Hi there - I am in the process of getting a license. I would very much appreciate it if you could post the full code here instead of the qvw file.

I'm still in the process of understanding your code in this post: https://community.qlik.com/t5/QlikView-App-Development/How-to-identify-recurrent-terms-in-text-field.... It's quite intriguing that the longest word tuples in the result set are only 3 words long.

Thanks