I have a requirement to read a table of data from one or more PDF files. Does Talend provide any component that can read the content of PDF files?
I have gone through various posts found on Google, and most of them say it can be done with a piece of Java code. The issue is that such code is customized for one particular file and does not work universally for any kind of PDF file.
I attach an example of my PDF, but I have many PDFs that I download from the following site: https://www.cert.ssi.gouv.fr/ . Can someone help me extract the table from each PDF so that I can then integrate it into a file?
thanks
regards
Hello,
Unfortunately, Talend has no built-in component for extracting data from a PDF file.
You could create a custom routine (hand-written Java code) to read it yourself.
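As a rough idea of what such a routine might contain, here is a minimal sketch. It assumes the PDF text has already been extracted into plain text lines by a third-party Java library (Apache PDFBox is a common choice, but it is not part of Talend and would have to be added to the job). The class name PdfTableRoutine, the method names, and the rule "columns are separated by two or more spaces" are all assumptions for illustration; real PDFs may need a different heuristic.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical routine class; the name and methods are illustrative only.
final class PdfTableRoutine {

    // Splits one extracted text line into cells.
    // Assumption: columns are separated by runs of two or more whitespace characters.
    static List<String> splitRow(String line) {
        List<String> cells = new ArrayList<>();
        for (String cell : line.trim().split("\\s{2,}")) {
            if (!cell.isEmpty()) {
                cells.add(cell);
            }
        }
        return cells;
    }

    // Keeps only the lines that yield at least minColumns cells,
    // treating everything else as ordinary text around the table.
    static List<List<String>> extractRows(List<String> lines, int minColumns) {
        List<List<String>> rows = new ArrayList<>();
        for (String line : lines) {
            List<String> cells = splitRow(line);
            if (cells.size() >= minColumns) {
                rows.add(cells);
            }
        }
        return rows;
    }
}
```

Because the rule is based on spacing only, it is not tied to one particular file, but it will misread tables whose cells contain wide gaps or whose columns are separated by a single space.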
Best regards
Sabrina
Hi @xdshi ,
I am trying to use a routine in Talend OS to read a PDF and store the data in Excel.
My PDF has a layout like the one below:
-- page 1
<some text..>
<table>
-- page 2
<table>
<some text..>
I am struggling to read this PDF and save the table in Excel for further use.
Thanks in advance.
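For a layout like the one described above, where the table continues from page 1 to page 2 with ordinary text before and after it, one possible approach is sketched below. It assumes each page has already been turned into a list of text lines by a PDF library (not shown here), that every table row has the same known number of columns, and that columns are separated by two or more spaces. The class name PagedTableToCsv and its methods are hypothetical. The output is CSV text, which Excel can open; writing a real .xlsx file would need an additional library or a Talend output component.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; names and the column-detection rule are assumptions.
final class PagedTableToCsv {

    // Walks through all pages in order and keeps only the lines that split
    // into exactly `columns` cells, so text before and after the table is skipped.
    static List<String[]> collectRows(List<List<String>> pages, int columns) {
        List<String[]> rows = new ArrayList<>();
        for (List<String> page : pages) {
            for (String line : page) {
                String[] cells = line.trim().split("\\s{2,}");
                if (cells.length == columns) {
                    rows.add(cells);
                }
            }
        }
        return rows;
    }

    // Renders the rows as CSV, quoting every cell and doubling embedded quotes.
    static String toCsv(List<String[]> rows) {
        StringBuilder sb = new StringBuilder();
        for (String[] row : rows) {
            for (int i = 0; i < row.length; i++) {
                if (i > 0) {
                    sb.append(',');
                }
                sb.append('"').append(row[i].replace("\"", "\"\"")).append('"');
            }
            sb.append('\n');
        }
        return sb.toString();
    }
}
```

A paragraph line that happens to contain wide gaps and splits into the same number of cells would be picked up as a table row by mistake, so the result should be checked against a few sample PDFs before relying on it.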