Hello,
A widely used format for storing information is PDF. Is there any connector that can be used to read the content of a PDF file in Talend?
Regards,
SAmil
PDFs are a nightmare data source for all ETL tools. Unfortunately, Talend is no exception.
Often, the content of a PDF is stored as a single image rather than as text. This means that to retrieve any information from the "text" of the PDF, you would have to implement OCR routines. That is not a small task, and the chance that some of the data will not be extracted correctly is a big risk of this design.
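Before committing to OCR, it can help to check whether a given file is image-only at all. Here is a rough sketch in plain Java (which could be adapted for a Talend routine or tJava component). The class and method names are made up for illustration, and the check is only a heuristic: it assumes that a PDF with a text layer declares at least one font resource, and it can miss fonts that sit inside compressed object streams.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class PdfTextLayerCheck {

    // Heuristic only (an assumption, not a full PDF parser): a PDF that
    // contains real text normally declares a font resource, so the raw
    // bytes contain the name "/Font". Scanned, image-only PDFs usually do not.
    // PDFs that keep their objects in compressed object streams can hide
    // "/Font" and would be misreported as image-only.
    static boolean looksLikeTextPdf(byte[] pdfBytes) {
        // ISO-8859-1 maps every byte to one char, so binary data is not mangled.
        String raw = new String(pdfBytes, StandardCharsets.ISO_8859_1);
        return raw.startsWith("%PDF-") && raw.contains("/Font");
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: java PdfTextLayerCheck <file.pdf>");
            return;
        }
        byte[] bytes = Files.readAllBytes(Paths.get(args[0]));
        System.out.println(looksLikeTextPdf(bytes)
                ? "probably has a text layer"
                : "probably image-only, OCR would be needed");
    }
}
```

Files that pass this check can usually be handed to a text-extraction library instead of OCR. Files that fail it are the ones where the OCR effort described above would apply.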
If you have thousands of PDFs that must be loaded into the DB, it *might* be worth implementing OCR and integrating it into a Talend job. My advice is to try very hard to get your data in a machine-readable format, and to understand what you're getting into before you agree to parse PDF files.