Anonymous
Not applicable

Count Occurrence Word From Social Media

Hi,
I need your help with the following. I need to count the occurrences of words from social media such as blogs, Facebook, etc., but I'm not sure whether there is any freeware that can be integrated with Talend to count the occurrences.
I don't think an ETL job can count the occurrences quickly and in real time.
Please advise.

Regards,
Kal
11 Replies
Anonymous
Not applicable
Author

Hi,
The most important thing is that you first need to extract the information from Facebook or other social media with Talend, and then do the counting. So I think https://community.talend.com/t5/Design-and-Development/FaceBook/td-p/99612 will be useful for you.
Best regards
Sabrina
Anonymous
Not applicable
Author

Hi,
Thanks for the information. After I extract the information from social media/Facebook, how do I count it?
Rgds,
Kal
Anonymous
Not applicable
Author

Hi,
There is a component called tFileRowCount, which counts the number of rows in a file.
The workflow may be: Source file-->tFileInputxx-->tFileRowCount-->tFileOutputxx
Best regards
Sabrina
Anonymous
Not applicable
Author

Hi,
My source is SQL Server. How do I connect it to tFileRowCount? Also, I want to count the occurrences of each word. Is that possible?
Thanks,
Kal
Anonymous
Not applicable
Author

Yes, you can count each word of a string. Use tNormalize to split the data into multiple rows with the separator " ". For example, it turns a string like:
"this is an example for tNormalize component"
into:
this
is
an
example
for
tNormalize
component
Then link tNormalize to tAggregateRow to count the occurrences of each word, grouping by the word column and using the 'count' operator.
tMSSQLInput--main--tNormalize--main--tAggregateRow---tLogRow
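For anyone who wants to see the logic outside of a job, here is a minimal plain-Java sketch of what the tNormalize + tAggregateRow combination is doing conceptually. The class and method names (WordCountSketch, countWords) are made up for illustration and are not part of Talend; the components themselves may treat empty tokens differently depending on their settings.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration class, not a Talend API.
class WordCountSketch {

    // Split the line on single spaces (the tNormalize step with separator " "),
    // then count how often each token occurs (the tAggregateRow 'count' step).
    static Map<String, Integer> countWords(String line) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String word : line.split(" ")) {
            // merge() inserts 1 for a new word or adds 1 to the existing count
            counts.merge(word, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Prints each distinct word with its count, in first-seen order
        System.out.println(countWords("this is an example for tNormalize component"));
    }
}
```

Note that this sketch is case-sensitive and does no cleanup of punctuation, so "I" and "i" would be counted separately.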
Shong
Anonymous
Not applicable
Author

Hi,
I followed your suggestion and it worked, but I ran into a small issue: a few words are not separated. I noticed it happens on the first word of a sentence after a full stop ".".
For example:
"i like to watch movie. I like eat too"
Expected output:
-------------------
i
like
to
watch
movie
i
like
eat
too
Current output:
-----------------
i
like
to
watch
movie. I \\this is the issue
like
eat
too

Could you figure out the issue?
Anonymous
Not applicable
Author

Hi
Remove special characters such as "," and "." before normalizing the string, for example:
row1.line.replaceAll("\\.","")
Note that replaceAll takes a regular expression, so the dot has to be escaped; an unescaped "." would match every character.
If the string may contain more types of special characters, it is better to define a function in a routine to handle them: define a list of all the characters that may exist in the string, then iterate over each character and remove it from the string. Then call the routine to remove all special characters, for example in a tMap before tNormalize:
tMSSQLInput--main--tMap--main-->tNormalize--main--tAggregateRow---tLogRow
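A rough sketch of what such a routine could look like is below. The class name TextCleanerSketch, the method removeSpecialChars and the character list are all assumptions for illustration; adjust the list to whatever shows up in your data. In a real job this would normally be a public class with public static methods under the routines package, so that it can be called from a tMap expression, e.g. TextCleanerSketch.removeSpecialChars(row1.line).

```java
// Hypothetical routine-style helper, not an existing Talend routine.
class TextCleanerSketch {

    // Assumed list of characters to strip; extend it as needed.
    private static final String SPECIAL_CHARS = ".,;:!?\"()";

    // Returns the input with every character from SPECIAL_CHARS removed.
    static String removeSpecialChars(String input) {
        if (input == null) {
            return null; // pass nulls through so the job does not fail on empty columns
        }
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            // keep the character only if it is not in the special list
            if (SPECIAL_CHARS.indexOf(c) < 0) {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(removeSpecialChars("i like to watch movie. I like eat too"));
    }
}
```

Checking characters one by one avoids regular expressions entirely, so there is nothing to escape.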

Shong
Anonymous
Not applicable
Author

Hi Shong,
Actually, I did remove the special characters, including ".", but it returned this:
Current output:
-----------------
i
like
to
watch
movie I \\this is the issue
like
eat
too
Please refer to my job design:
0683p000009MDuK.jpg
Anonymous
Not applicable
Author

Hi
In principle, there should be a space after a punctuation character in English; however, there is no space after "." in your case. To avoid this situation, you can always replace the character with a space, for example:
row1.line.replaceAll("\\."," ")
Then use a tFilterRow to remove the empty rows.
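To show why the empty rows appear and why filtering them is enough, here is a small plain-Java sketch of the same idea. The names SplitSketch and tokens are hypothetical; in the job, the replace would sit in the tMap expression and the empty-row check in tFilterRow.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration class, not a Talend API.
class SplitSketch {

    // Replace every full stop with a space, split on single spaces,
    // and drop the empty tokens that appear where two spaces end up adjacent.
    static List<String> tokens(String line) {
        String spaced = line.replaceAll("\\.", " "); // "\\." is a literal dot in a regex
        List<String> result = new ArrayList<>();
        for (String token : spaced.split(" ")) {
            if (!token.isEmpty()) { // the equivalent of the tFilterRow step
                result.add(token);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Works whether or not a space follows the full stop
        System.out.println(tokens("i like to watch movie.I like eat too"));
        System.out.println(tokens("i like to watch movie. I like eat too"));
    }
}
```

Both inputs give the same nine words, because a ". " sequence simply becomes two spaces and the resulting empty token is discarded.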
Shong