Hello,
I need help reading the content of a PDF file into a variable so that I can store it in a text field in a database.
What sort of component am I supposed to use?
The process:
- list the files in a folder: ok
- read the file name to find the database row: ok
- read the content of the file to put it in the database: not ok
Does anyone have a solution?
Thanks,
David
Hi,
Thanks for the very useful information.
I have written a method to read the PDF.
Can you please help me add the method as a Routine so that I can run the code from the Talend tool?
When I create a job I am able to view the generated code, but I am not able to edit it to add my method.
Please give me a suggestion.
Good question. At the moment you have to use self-written code in a tJavaFlex, but I do not know how to read a PDF.
I would google for it. Sorry.
The only problem is: a PDF can be created from images, and the structure of the text is oriented toward the layout and does not have a fixed structure like an HTML table. A solution would mainly be an individual solution for a particular PDF file, and every layout change in the file will have an impact on your code.
Has the status changed, or is there still no component to read PDF files?
Okay, even if no component exists, is there any way to extract particular columnar data (no physical table structure is drawn in the PDF, but the data is visually divided into columns) and store it in DB table columns?
Using Java code and the iText library in a routine, I am able to read the PDF file, but as mentioned above, how do I extract columns from the PDF?
Any code or URL reference for this would be helpful.
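For the column question above, one possible approach is to work on the text after it has been extracted, rather than on the PDF itself. The sketch below is only an illustration: `PdfColumnSplitter`, `splitLine` and `extractRows` are hypothetical names, not part of Talend or iText. It assumes the extracted text keeps the visual gaps between columns as runs of two or more spaces, which depends on the library and extraction strategy you use and will not hold for every PDF.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Talend or iText.
class PdfColumnSplitter {

    // Splits one extracted text line into cells wherever two or more
    // whitespace characters occur in a row. Single spaces inside a cell
    // (for example "New York") are kept.
    static List<String> splitLine(String line) {
        String trimmed = line.trim();
        if (trimmed.isEmpty()) {
            return new ArrayList<>();
        }
        return new ArrayList<>(Arrays.asList(trimmed.split("\\s{2,}")));
    }

    // Splits a whole page of extracted text into rows and keeps only the
    // lines that have exactly the expected number of cells. This drops
    // titles, footers and blank lines, but also any data row whose layout
    // differs, so the result should be checked against the source file.
    static List<List<String>> extractRows(String pageText, int expectedColumns) {
        List<List<String>> rows = new ArrayList<>();
        for (String line : pageText.split("\\r?\\n")) {
            List<String> cells = splitLine(line);
            if (cells.size() == expectedColumns) {
                rows.add(cells);
            }
        }
        return rows;
    }
}
```

Each returned row could then be mapped to the DB table columns in the job. As already said in this thread, this stays tied to one layout: if the PDF changes its column spacing or wraps a cell onto two lines, the splitting rule has to change with it.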
Google "Java API for reading PDF files".
This is an unusual requirement (for the reasons already explained above), but if there is text in the PDF that can be retrieved, the best way is to write a Java routine that makes use of an existing Java API. One of Talend's big advantages over other tools is the ease with which you can write your own components or just add code to a tJavaFlex to make use of third-party APIs.
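The routine approach described above might look roughly like the sketch below. It is a minimal illustration under assumptions, not a finished implementation: `PdfFolderReader`, `rowKey` and `readFolder` are hypothetical names, and the actual PDF text extraction is deliberately left as a function you pass in, because the exact call depends on which third-party library and version you add to the job. In Talend Studio a user routine is a class with public static methods in the `routines` package; the package line is shown only as a comment here so the snippet stays self-contained.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Stream;

// Hypothetical routine. Inside Talend Studio this file would start with
// "package routines;" and the class and methods would be declared public.
class PdfFolderReader {

    // Derives the database row key from the file name by removing the
    // extension, e.g. "1001.pdf" becomes "1001".
    static String rowKey(Path file) {
        String name = file.getFileName().toString();
        int dot = name.lastIndexOf('.');
        return dot > 0 ? name.substring(0, dot) : name;
    }

    // Lists the .pdf files directly inside the folder and returns a map of
    // row key to extracted text. The extractor is the place where the call
    // into a PDF library would go (assumption: it returns the text of one file).
    static Map<String, String> readFolder(Path folder, Function<Path, String> extractor) {
        Map<String, String> result = new LinkedHashMap<>();
        try (Stream<Path> files = Files.list(folder)) {
            files.filter(p -> p.getFileName().toString()
                               .toLowerCase(Locale.ROOT).endsWith(".pdf"))
                 .sorted()
                 .forEach(p -> result.put(rowKey(p), extractor.apply(p)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result;
    }
}
```

From a tJava or tJavaFlex you would then call the static method by class name, for example `PdfFolderReader.readFolder(folder, myExtractor)`, and write each key and text pair to the text field of the matching row. Keeping the extraction step separate means only that one function has to change if you switch PDF libraries.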
Hi Talend team,
We have a requirement to read data from one or more PDF files. We would like to know whether Talend provides any component through which we can read the content of PDF files.
I have gone through the different posts found on Google, but mostly I found that it can be done using a piece of Java code. The issue is that such code is customized for a particular file and is not valid universally for any kind of PDF file. So I request you to share something on this, so that I can get a clear picture and decide whether to go ahead with Talend as the ETL tool for my assignment. Any sort of help would be appreciated.
Thanks