KarthikGs
Creator

Read a PDF file or Word document

Hi all,

I need to read the content of a PDF file (text, not images) and extract the text that is in bold. Since there are no PDF-related components, I tried converting the PDF to a Word document before reading it with Talend. I tried reading the Word document with tFileInputFullRow and tFileInputDelimited, but had no luck.

How can I read the data in either of these formats?

Any help is appreciated.

Thanks in advance.

2 Replies
Anonymous

Hello,

Here is a custom component written by a Talend community user and shared on the Talend Exchange portal.

https://exchange.talend.com/#marketplaceproductoverview:marketplace=marketplace%252F1&p=marketplace%...

If you want to install a custom component in Studio, the Talend Help Center article "How to install and update a custom component" will help.

Best regards

Sabrina

KarthikGs
Creator
Author

Hi,

Thanks for your reply. I understand that the tpdftoText component can convert a PDF into a text file. However, my requirement is to read the PDF or Word (.docx) file and apply transformations while reading it.

Can I read the PDF file as it is and extract the required strings from it by applying filters?
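
Since the original post mentions converting the PDF to a Word document first, one possible route is to read the .docx directly. A .docx file is a ZIP archive whose main text lives in word/document.xml, and a run of text with direct bold formatting carries a `<w:b/>` element in its run properties. The sketch below is plain Python using only the standard library, not a Talend component, and the names `bold_runs`, `is_bold`, and `filter_matches` are hypothetical helpers made up for illustration. It only detects bold applied directly to a run, so bold inherited from a paragraph or character style would be missed, and a PDF-to-Word converter may or may not preserve bold in this form.

```python
import re
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used inside word/document.xml
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W_NS}
VAL_ATTR = f"{{{W_NS}}}val"


def is_bold(run):
    """Return True if the run has a direct <w:b/> that is not switched off."""
    b = run.find("w:rPr/w:b", NS)
    if b is None:
        return False
    # <w:b/> with no w:val means "on"; false/0/off switch it off explicitly
    return b.get(VAL_ATTR, "true") not in ("false", "0", "off")


def bold_runs(docx_source):
    """Yield the text of each directly bold run in the main document body.

    docx_source may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(docx_source) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    for run in root.iter(f"{{{W_NS}}}r"):
        if is_bold(run):
            text = "".join(t.text or "" for t in run.findall("w:t", NS))
            if text:
                yield text


def filter_matches(texts, pattern):
    """Keep only the strings that contain a match for the regex pattern."""
    return [t for t in texts if re.search(pattern, t)]
```

For example, `filter_matches(bold_runs("input.docx"), r"\d+")` would keep only the bold strings that contain a number, where `input.docx` is a placeholder file name. The same ZIP-plus-XML structure can be read from any language, so the idea carries over to Java code as well.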