Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
I have a table with two columns: Field 1 (contains a free phrase) and Field 2 (contains the classification of the phrase - POSITIVE, NEUTRAL or NEGATIVE). By filtering, on filed 2, POSITIVE, all the positive phrases are listed.
How can I obtain a ranking of the words that appeared on these phrases, based in its frequency of appearence?
Would be nicer to have an example to work with. On first glance you may try to dump all the words from Field 1 into a separate table and do a count of them - probably the SUBFIELD-function may help you (with the two parameter-version).
HTH
Peter
Here are some records for you to work with.
Field1|Field2
NEUTRAL|This afternoon, I must finish a planning case which is about Chinese brands how to get powerful. The most important one is lenovo.
POSITIVE|configuring the new Thinkpad I got for my work at the University
NEUTRAL|buy ibm thinkpad x60s battery at harold's
POSITIVE|Heard good reports of people running Linux on ThinkPad X Series. Re Apple better support for Linux on current MBPs may not be far off.
NEUTRAL|Just received the lenovo!
NEUTRAL|Installing RAM into my ThinkPad
POSITIVE|Happy that Bruno has his Lenovo. Now i don't have to worry about the safety of mine!
NEUTRAL|Troubleshooting the ThinkPad
NEGATIVE|my kids have moved on from my wife's thinkpad to the imac in my office...guarding my p'book
NEGATIVE|Feeling 08 will bring me back to the mac in a big way. iPhone, Applescript and Dual booting are just things I can't do on my Thinkpad T60p.
NEUTRAL|Lenovo in strategy shift with consumer PC move: Lenovo entered the global consumer computer market yesterday
POSITIVE|Just loaded up my new ThinkPad with goodies to kick off the new year. Yippie.
POSITIVE|oh YES! say hallo to 1440x900 Fedora 8 Thinkpad T61
NEGATIVE|In case of ThinkPad T43, this pc doesn't have Recovery Disc. If you want to recovery your pc, push button.
Here are some records for you to work with:
Field1|Field2
NEUTRAL|This afternoon, I must finish a planning case which is about Chinese brands how to get powerful. The most importent one is lenovo.
POSITIVE|configuring the new Thinkpad I got for my work at the University
NEUTRAL|buy ibm thinkpad x60s battery at harold's
POSITIVE|Heard good reports of people running Linux on ThinkPad X Series. Re Apple better support for Linux on current MBPs may not be far off.
NEUTRAL|Just received the lenovo!
NEUTRAL|Installing RAM into my ThinkPad
POSITIVE|Happy that Bruno has his Lenovo. Now i don't have to worry about the safety of mine!
NEUTRAL|Troubleshooting the ThinkPad
NEGATIVE|my kids have moved on from my wife's thinkpad to the imac in my office...guarding my p'book
NEGATIVE|Feeling 08 will bring me back to the mac in a big way. iPhone, Applescript and Dual booting are just things I can't do on my Thinkpad T60p.
NEUTRAL|Lenovo in strategy shift with consumer PC move: Lenovo entered the global consumer computer market yesterday
POSITIVE|Just loaded up my new ThinkPad with goodies to kick off the new year. Yippie.
POSITIVE|oh YES! say hallo to 1440x900 Fedora 8 Thinkpad T61
NEGATIVE|In case of ThinkPad T43, this pc doesn't have Recovery Disc. If you want to recovery your pc, push button.
See attached example. It's not your data, but hopefully the idea is clear and can be easily adapted.
Thanks!
in your example word count for example for "A" when 4 is selected in 4 ... but the phrase only has 2!
any idea?
wiremaze,
I think the inner join will lead to a N*N count for each word that appears N times in one and the same phrase.
I think the inner join part is only needed to remove "blank" Words (i.e. the parts of the sentence that are returned by subfield e.g. at the start of some sentences, where two blanks follow each other.)
I think you could replace the last parts of the script with:
tmpWords:
LOAD ID, mapsubstring('TextMap',(subfield(capitalize(keepchar(upper(Text),'ABCDEFGHIJKLMNOPQRSTUVWXYZ ')),' '))) as Word
RESIDENT Data
;
Words:
//Inner join (Words)
Noconcatenate LOAD ID, Word
RESIDENT tmpWords
WHERE len(trim(Word))
;
drop table tmpWords;
Regards,
Stefan