Skip to main content
Announcements
Qlik Connect 2024! Seize endless possibilities! LEARN MORE
cancel
Showing results for 
Search instead for 
Did you mean: 
paulo_costa_ifp
Contributor II
Contributor II

How to obtain a frequency distribution of words in a column?

I have a table with two columns: Field 1 (contains a free phrase) and Field 2 (contains the classification of the phrase - POSITIVE, NEUTRAL or NEGATIVE). By filtering, on filed 2, POSITIVE, all the positive phrases are listed.

How can I obtain a ranking of the words that appeared on these phrases, based in its frequency of appearence?

7 Replies
prieper
Master II
Master II

Would be nicer to have an example to work with. On first glance you may try to dump all the words from Field 1 into a separate table and do a count of them - probably the SUBFIELD-function may help you (with the two parameter-version).

HTH
Peter

paulo_costa_ifp
Contributor II
Contributor II
Author

Here are some records for you to work with.

Field1|Field2

NEUTRAL|This afternoon, I must finish a planning case which is about Chinese brands how to get powerful. The most important one is lenovo.

POSITIVE|configuring the new Thinkpad I got for my work at the University

NEUTRAL|buy ibm thinkpad x60s battery at harold's

POSITIVE|Heard good reports of people running Linux on ThinkPad X Series. Re Apple better support for Linux on current MBPs may not be far off.

NEUTRAL|Just received the lenovo!

NEUTRAL|Installing RAM into my ThinkPad

POSITIVE|Happy that Bruno has his Lenovo. Now i don't have to worry about the safety of mine!

NEUTRAL|Troubleshooting the ThinkPad

NEGATIVE|my kids have moved on from my wife's thinkpad to the imac in my office...guarding my p'book

NEGATIVE|Feeling 08 will bring me back to the mac in a big way. iPhone, Applescript and Dual booting are just things I can't do on my Thinkpad T60p.

NEUTRAL|Lenovo in strategy shift with consumer PC move: Lenovo entered the global consumer computer market yesterday

POSITIVE|Just loaded up my new ThinkPad with goodies to kick off the new year. Yippie.

POSITIVE|oh YES! say hallo to 1440x900 Fedora 8 Thinkpad T61

NEGATIVE|In case of ThinkPad T43, this pc doesn't have Recovery Disc. If you want to recovery your pc, push button.



paulo_costa_ifp
Contributor II
Contributor II
Author

Here are some records for you to work with:

Field1|Field2

NEUTRAL|This afternoon, I must finish a planning case which is about Chinese brands how to get powerful. The most importent one is lenovo.

POSITIVE|configuring the new Thinkpad I got for my work at the University

NEUTRAL|buy ibm thinkpad x60s battery at harold's

POSITIVE|Heard good reports of people running Linux on ThinkPad X Series. Re Apple better support for Linux on current MBPs may not be far off.

NEUTRAL|Just received the lenovo!

NEUTRAL|Installing RAM into my ThinkPad

POSITIVE|Happy that Bruno has his Lenovo. Now i don't have to worry about the safety of mine!

NEUTRAL|Troubleshooting the ThinkPad

NEGATIVE|my kids have moved on from my wife's thinkpad to the imac in my office...guarding my p'book

NEGATIVE|Feeling 08 will bring me back to the mac in a big way. iPhone, Applescript and Dual booting are just things I can't do on my Thinkpad T60p.

NEUTRAL|Lenovo in strategy shift with consumer PC move: Lenovo entered the global consumer computer market yesterday

POSITIVE|Just loaded up my new ThinkPad with goodies to kick off the new year. Yippie.

POSITIVE|oh YES! say hallo to 1440x900 Fedora 8 Thinkpad T61

NEGATIVE|In case of ThinkPad T43, this pc doesn't have Recovery Disc. If you want to recovery your pc, push button.



johnw
Champion III
Champion III

See attached example. It's not your data, but hopefully the idea is clear and can be easily adapted.

paulo_costa_ifp
Contributor II
Contributor II
Author

Thanks!

Not applicable

in your example word count for example for "A" when 4 is selected in 4 ... but the phrase only has 2!

any idea?

swuehl
MVP
MVP

wiremaze,

I think the inner join will lead to a N*N count for each word that appears N times in one and the same phrase.

I think the inner join part is only needed to remove "blank" Words (i.e. the parts of the sentence that are returned by subfield e.g. at the start of some sentences, where two blanks follow each other.)

I think you could replace the last parts of the script with:

tmpWords:

LOAD ID, mapsubstring('TextMap',(subfield(capitalize(keepchar(upper(Text),'ABCDEFGHIJKLMNOPQRSTUVWXYZ ')),' '))) as Word

RESIDENT Data

;

Words:

//Inner join (Words)

Noconcatenate LOAD ID, Word

RESIDENT tmpWords

WHERE len(trim(Word))

;

drop table tmpWords;

Regards,

Stefan