Anonymous
Not applicable

Extract table from PDF

I have a requirement to read a table from one or more PDF files. Does Talend provide a component that can read the content of PDF files?
From the posts I found on Google, most say it can be done with a piece of Java code, but the issue is that such code is customized for a particular file and does not work universally for any kind of PDF file.


I attach an example of my PDF, but I have many PDFs that I download from the following site: https://www.cert.ssi.gouv.fr/. Can someone help me extract the table from each PDF so that I can then write it to a file?

 

thanks 

regards

2 Replies
Anonymous
Not applicable
Author

Hello,

Unfortunately, Talend has no component that can extract data from a PDF file.

You could create a custom routine (hand-coded Java) to read it yourself.
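
As a rough illustration of what such a routine might look like, here is a minimal sketch of only the row-splitting step. It assumes the page text has already been extracted by a PDF library (Apache PDFBox's `PDFTextStripper` is one common choice, not something Talend ships as a component) and that table columns are separated by two or more whitespace characters, which will not hold for every PDF. The class name `PdfTableRoutine`, the method `splitRow`, and the sample line are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class PdfTableRoutine {

    // Hypothetical helper: splits one line of already-extracted PDF text into cells.
    // Assumption: columns are separated by runs of two or more whitespace characters,
    // so single spaces inside a cell are preserved.
    public static List<String> splitRow(String line) {
        List<String> cells = new ArrayList<>();
        if (line == null) {
            return cells;
        }
        String trimmed = line.trim();
        if (trimmed.isEmpty()) {
            return cells;
        }
        for (String cell : trimmed.split("\\s{2,}")) {
            cells.add(cell);
        }
        return cells;
    }

    public static void main(String[] args) {
        // Made-up sample line, only to show the call.
        System.out.println(splitRow("Reference   Short title   Date"));
    }
}
```

In a Talend job, a static method like this could be called from a tJavaRow or tMap expression once the raw text is available; how well it works depends entirely on how the PDF lays out its columns.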

Best regards

Sabrina

Anonymous
Not applicable
Author

Hi @xdshi ,

 

I am trying to use a routine in Talend Open Studio to read a PDF and store the data in Excel.

Actually, my PDF has a format like below:

-- page 1

<some text..>

< table>

-- page 2

< table>

<some text..>

 

I am struggling to read this PDF and save the table in Excel for further use.

Thanks in advance.
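
For a layout like the one described above (free text before or after a table, with the table continuing across pages), one possible approach is to filter the extracted text line by line and keep only the lines that look like table rows. The sketch below assumes each page's text is already available as a string (for example from a PDF library such as Apache PDFBox), that a table row is any line that splits into at least `minColumns` cells on runs of two or more spaces, and that a CSV file is an acceptable stand-in for Excel since Excel can open it. The class `PdfTableToCsv` and its methods are hypothetical names, and the heuristic will misfire on PDFs whose paragraphs contain wide gaps.

```java
import java.util.ArrayList;
import java.util.List;

public class PdfTableToCsv {

    // Keeps only lines that look like table rows, across all pages in order.
    // Assumption: a row has at least minColumns cells separated by 2+ whitespace characters.
    public static List<String[]> tableRows(List<String> pages, int minColumns) {
        List<String[]> rows = new ArrayList<>();
        for (String page : pages) {
            for (String line : page.split("\\R")) {
                String trimmed = line.trim();
                if (trimmed.isEmpty()) {
                    continue;
                }
                String[] cells = trimmed.split("\\s{2,}");
                if (cells.length >= minColumns) {
                    rows.add(cells);
                }
            }
        }
        return rows;
    }

    // Renders rows as CSV text, quoting every cell and doubling embedded quotes.
    public static String toCsv(List<String[]> rows) {
        StringBuilder sb = new StringBuilder();
        for (String[] row : rows) {
            for (int i = 0; i < row.length; i++) {
                if (i > 0) {
                    sb.append(',');
                }
                sb.append('"').append(row[i].replace("\"", "\"\"")).append('"');
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Made-up page text mirroring the layout: text then table, table then text.
        List<String> pages = List.of(
            "Some text on page 1\nName  Value\nalpha  1",
            "beta  2\nSome text on page 2");
        System.out.print(toCsv(tableRows(pages, 2)));
    }
}
```

The resulting string could be written to a `.csv` file with `java.nio.file.Files.writeString`, or the rows could be passed on to a Talend Excel output component instead.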