
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
How to obtain a frequency distribution of words in a column?
I have a table with two columns: Field 1 (contains a free phrase) and Field 2 (contains the classification of the phrase - POSITIVE, NEUTRAL or NEGATIVE). By filtering, on filed 2, POSITIVE, all the positive phrases are listed.
How can I obtain a ranking of the words that appeared on these phrases, based in its frequency of appearence?


- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Would be nicer to have an example to work with. On first glance you may try to dump all the words from Field 1 into a separate table and do a count of them - probably the SUBFIELD-function may help you (with the two parameter-version).
HTH
Peter

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Here are some records for you to work with.
Field1|Field2
NEUTRAL|This afternoon, I must finish a planning case which is about Chinese brands how to get powerful. The most important one is lenovo.
POSITIVE|configuring the new Thinkpad I got for my work at the University
NEUTRAL|buy ibm thinkpad x60s battery at harold's
POSITIVE|Heard good reports of people running Linux on ThinkPad X Series. Re Apple better support for Linux on current MBPs may not be far off.
NEUTRAL|Just received the lenovo!
NEUTRAL|Installing RAM into my ThinkPad
POSITIVE|Happy that Bruno has his Lenovo. Now i don't have to worry about the safety of mine!
NEUTRAL|Troubleshooting the ThinkPad
NEGATIVE|my kids have moved on from my wife's thinkpad to the imac in my office...guarding my p'book
NEGATIVE|Feeling 08 will bring me back to the mac in a big way. iPhone, Applescript and Dual booting are just things I can't do on my Thinkpad T60p.
NEUTRAL|Lenovo in strategy shift with consumer PC move: Lenovo entered the global consumer computer market yesterday
POSITIVE|Just loaded up my new ThinkPad with goodies to kick off the new year. Yippie.
POSITIVE|oh YES! say hallo to 1440x900 Fedora 8 Thinkpad T61
NEGATIVE|In case of ThinkPad T43, this pc doesn't have Recovery Disc. If you want to recovery your pc, push button.

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Here are some records for you to work with:
Field1|Field2
NEUTRAL|This afternoon, I must finish a planning case which is about Chinese brands how to get powerful. The most importent one is lenovo.
POSITIVE|configuring the new Thinkpad I got for my work at the University
NEUTRAL|buy ibm thinkpad x60s battery at harold's
POSITIVE|Heard good reports of people running Linux on ThinkPad X Series. Re Apple better support for Linux on current MBPs may not be far off.
NEUTRAL|Just received the lenovo!
NEUTRAL|Installing RAM into my ThinkPad
POSITIVE|Happy that Bruno has his Lenovo. Now i don't have to worry about the safety of mine!
NEUTRAL|Troubleshooting the ThinkPad
NEGATIVE|my kids have moved on from my wife's thinkpad to the imac in my office...guarding my p'book
NEGATIVE|Feeling 08 will bring me back to the mac in a big way. iPhone, Applescript and Dual booting are just things I can't do on my Thinkpad T60p.
NEUTRAL|Lenovo in strategy shift with consumer PC move: Lenovo entered the global consumer computer market yesterday
POSITIVE|Just loaded up my new ThinkPad with goodies to kick off the new year. Yippie.
POSITIVE|oh YES! say hallo to 1440x900 Fedora 8 Thinkpad T61
NEGATIVE|In case of ThinkPad T43, this pc doesn't have Recovery Disc. If you want to recovery your pc, push button.


- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
See attached example. It's not your data, but hopefully the idea is clear and can be easily adapted.

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Thanks!

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
in your example word count for example for "A" when 4 is selected in 4 ... but the phrase only has 2!
any idea?


- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
wiremaze,
I think the inner join will lead to a N*N count for each word that appears N times in one and the same phrase.
I think the inner join part is only needed to remove "blank" Words (i.e. the parts of the sentence that are returned by subfield e.g. at the start of some sentences, where two blanks follow each other.)
I think you could replace the last parts of the script with:
tmpWords:
LOAD ID, mapsubstring('TextMap',(subfield(capitalize(keepchar(upper(Text),'ABCDEFGHIJKLMNOPQRSTUVWXYZ ')),' '))) as Word
RESIDENT Data
;
Words:
//Inner join (Words)
Noconcatenate LOAD ID, Word
RESIDENT tmpWords
WHERE len(trim(Word))
;
drop table tmpWords;
Regards,
Stefan
