KarthikGs
Creator

read pdf file or word document

hi all,

I need to read the content of a PDF file (text, not images) and extract the text that is in bold. Since there are no PDF-related components, I tried converting the PDF to a Word document before reading it with Talend. I tried reading the Word document with tFileInputFullRow and tFileInputDelimited, but with no luck.

How can I read the data in either format?

Any help is appreciated.

 

Thanks in advance.

2 Replies
Anonymous
Not applicable

Hello,

Here is a custom component written by a Talend community user and shared on the Talend Exchange portal.

https://exchange.talend.com/#marketplaceproductoverview:marketplace=marketplace%252F1&p=marketplace%...

If you want to install a custom component into Studio, the online document "TalendHelpCenter: How to install and update a custom component" will help.

Best regards

Sabrina

KarthikGs
Creator
Author

hi,

Thanks for your reply. I understand that the tpdftoText component can convert a PDF into a text file, but my requirement is to read the PDF or Word (.docx) file and apply transformations while reading it.

Can I read the PDF file as it is and extract the required string from it by applying filters?
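
For the Word (.docx) route mentioned above, one possible approach is a small Java helper that could be called from a Talend routine or a tJava-style component. The sketch below is only an illustration under some assumptions: a .docx file is a ZIP archive whose main text lives in word/document.xml, and bold text is marked directly on a run with a `w:b` element. Bold that is inherited from a paragraph or character style is not detected here. The class name `DocxBoldExtractor` and the method `extractBold` are hypothetical names chosen for this example, not part of Talend. It uses only the Java standard library.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper: collects the text of runs that are explicitly marked bold.
final class DocxBoldExtractor {

    // WordprocessingML namespace used by word/document.xml.
    private static final String W_NS =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    static List<String> extractBold(InputStream docx) throws Exception {
        byte[] xml = readEntry(docx, "word/document.xml");
        if (xml == null) {
            throw new IOException("word/document.xml not found; is this a .docx file?");
        }

        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
        Document doc = factory.newDocumentBuilder().parse(new ByteArrayInputStream(xml));

        List<String> boldTexts = new ArrayList<>();
        NodeList runs = doc.getElementsByTagNameNS(W_NS, "r");
        for (int i = 0; i < runs.getLength(); i++) {
            Element run = (Element) runs.item(i);
            if (!isBold(run)) {
                continue;
            }
            // A run may contain several w:t elements; join them.
            StringBuilder text = new StringBuilder();
            NodeList parts = run.getElementsByTagNameNS(W_NS, "t");
            for (int j = 0; j < parts.getLength(); j++) {
                text.append(parts.item(j).getTextContent());
            }
            if (text.length() > 0) {
                boldTexts.add(text.toString());
            }
        }
        return boldTexts;
    }

    // A run counts as bold when it has a w:b element that is not switched off.
    private static boolean isBold(Element run) {
        NodeList bold = run.getElementsByTagNameNS(W_NS, "b");
        if (bold.getLength() == 0) {
            return false;
        }
        String val = ((Element) bold.item(0)).getAttributeNS(W_NS, "val");
        return !(val.equals("false") || val.equals("0") || val.equals("off"));
    }

    // Reads one named entry from the ZIP stream, or returns null if it is absent.
    private static byte[] readEntry(InputStream in, String name) throws IOException {
        try (ZipInputStream zip = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (!entry.getName().equals(name)) {
                    continue;
                }
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                byte[] buffer = new byte[8192];
                int n;
                while ((n = zip.read(buffer)) != -1) {
                    out.write(buffer, 0, n);
                }
                return out.toByteArray();
            }
        }
        return null;
    }
}
```

A call such as `DocxBoldExtractor.extractBold(new java.io.FileInputStream(path))` returns one string per bold run, which can then be filtered or transformed like any other list of strings. Reading bold text directly from the PDF would need a PDF library instead, since plain text extraction typically drops the font information.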