Count Occurrence Word From Social Media

Anonymous · ‎2013-03-07

Hi,
I need everyone's help on this matter. I am required to count the occurrences of words from social media such as blogs, Facebook, etc., but I'm not sure whether there is any freeware that can be integrated with Talend to count the occurrences.
I don't think an ETL job can count the occurrences quickly and in real time.
Please advise.

Regards,
Kal

Anonymous · ‎2013-03-07

Hi,
The most important thing is that you first need to extract the information from Facebook or other social media with Talend, and then do the counting. So I think https://community.talend.com/t5/Design-and-Development/FaceBook/td-p/99612 will be useful for you.
Best regards
Sabrina

Anonymous · ‎2013-03-07

Hi,
Thanks for the information. After I extract the information from social media/Facebook, how do I count it?
Rgds,
Kal

Anonymous · ‎2013-03-07

Hi,
There is a component, tFileRowCount, which counts the number of rows in a file.
The workflow may be: Source file-->tFileInputxx-->tFileRowCount-->tFileOutputxx
Best regards
Sabrina

Anonymous · ‎2013-03-07

Hi,
My source is SQL Server. How do I connect it to tFileRowCount? Also, I want to count the occurrences of each word. Is that possible?
Thanks,
Kal

Anonymous · ‎2013-03-07

Yes, you can count each word of a string. Use tNormalize to normalize the data to multiple lines with the separator " ". For example, it turns data like:
"this is an example for tNormalize component"
into:
this
is
an
example
for
tNormalize
component
Then link tNormalize to tAggregateRow to count the occurrences of each word with the 'count' operator.
tMSSQLInput--main--tNormalize--main--tAggregateRow---tLogRow
Shong
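
For readers who want to see the logic outside the Studio, here is a minimal plain-Java sketch of what the tNormalize + tAggregateRow combination does conceptually: split on the separator " ", then count each distinct token. The class and method names (WordCountSketch, countWords) are made up for illustration and are not part of Talend; skipping empty tokens is an assumption of this sketch.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not a Talend class: mimics tNormalize (split on " ")
// followed by tAggregateRow (group by word, 'count' operator).
class WordCountSketch {

    static Map<String, Integer> countWords(String line) {
        // LinkedHashMap keeps the words in first-seen order.
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String word : line.split(" ")) {
            if (word.isEmpty()) {
                continue; // assumption: empty tokens are ignored
            }
            // Add 1 to the existing count, or start at 1 for a new word.
            counts.merge(word, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(countWords("this is an example for this component"));
    }
}
```

Note that the split is case-sensitive and exact, so "I" and "i" would be counted as different words unless the input is lower-cased first.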

Anonymous · ‎2013-03-14

Hi,
I've followed your suggestion and it worked, but I ran into a small issue: a few words are not separated. I noticed it happens on the first word of a sentence after a full stop ".".
For example:
"i like to watch movie. I like eat too"
Expected output:
-------------------
i
like
to
watch
movie
i
like
eat
too
Current output:
-----------------
i
like
to
watch
movie. I \\this is the issue
like
eat
too

Could you figure out the issue?

Anonymous · ‎2013-03-14

Hi
Remove special characters such as "," and "." before normalizing the string, for example (the dot is escaped because replaceAll takes a regular expression):
row1.line.replaceAll("\\.","")
If the string may contain more types of special characters, it is better to define a function in a routine to handle them: define a list of all characters that may exist in the string, then iterate over each character and remove it from the string. Then call the routine, for example in a tMap, to remove all special characters before tNormalize:
tMSSQLInput--main--tMap--main-->tNormalize--main--tAggregateRow---tLogRow

Shong
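
A routine along those lines could look roughly like the sketch below. This is only an illustration under assumptions: the class name StringCleanerSketch, the method removeSpecialChars, and the particular list of characters are hypothetical, and in a real job the class would be created as a user routine in the Studio.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical routine, not shipped with Talend.
class StringCleanerSketch {

    // Assumed list of characters to strip; extend it to match your data.
    private static final List<String> SPECIAL_CHARS =
            Arrays.asList(".", ",", "!", "?", ";", ":");

    static String removeSpecialChars(String input) {
        if (input == null) {
            return null; // pass nulls through unchanged
        }
        String result = input;
        for (String ch : SPECIAL_CHARS) {
            // String.replace treats its argument literally, so "." is a
            // plain dot here and not a regex wildcard.
            result = result.replace(ch, "");
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(removeSpecialChars("i like to watch movie. I like eat too"));
    }
}
```

In a tMap output expression the call would then be something like StringCleanerSketch.removeSpecialChars(row1.line), assuming the input column is named line as in the example above.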

Anonymous · ‎2013-03-14

Hi Shong,
Actually, I did remove special characters, including ".", but it returned this:
Current output:
-----------------
i
like
to
watch
movie I \\this is the issue
like
eat
too
Please refer to my job design.

Anonymous · ‎2013-03-14

Hi
In principle, there should be a space after a punctuation character in English. However, there is no space after "." in your case. To avoid this situation, you can always replace the character with a space, for example:
row1.line.replaceAll("\\."," ")
Then use a tFilterRow to remove the empty lines.
Shong
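
To illustrate why the empty lines appear and why they need filtering, here is a small plain-Java sketch of the same idea: replace each "." with a space, split on " ", and drop the empty tokens (the part tFilterRow would handle in the job). The names PunctuationSplitSketch and toWords are hypothetical and not part of Talend.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper showing the replace -> split -> filter sequence.
class PunctuationSplitSketch {

    static List<String> toWords(String line) {
        // "\\." is a regex for a literal dot; each dot becomes a space.
        String spaced = line.replaceAll("\\.", " ");
        List<String> words = new ArrayList<>();
        for (String token : spaced.split(" ")) {
            // ". " turns into two spaces, which yields an empty token;
            // skip it, as tFilterRow would remove the empty row.
            if (!token.isEmpty()) {
                words.add(token);
            }
        }
        return words;
    }

    public static void main(String[] args) {
        System.out.println(toWords("i like to watch movie.I like eat too"));
        System.out.println(toWords("i like to watch movie. I like eat too"));
    }
}
```

Both inputs give the same nine words, whether or not a space follows the full stop.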

real-time

Talend Data Integration

v5.x