Anonymous
Not applicable

How to read a PDF file in Talend

Hello,
I need help reading the content of a PDF file into a variable so that I can store it in a text field in a database.
What sort of component am I supposed to use?
The process:
- list the files in a folder: ok
- read the file name to find the database row: ok
- read the content of the file to put it in the database: not ok
Does anyone have a solution?
Thanks,
David
21 Replies
Anonymous
Not applicable
Author

Ciao, thanks for sharing this.

However, it is not clear how I can specify inside the script which PDF file should be read.

Can you clarify?

Thanks

tomwattsusa
Contributor

I loaded the Apache PDFBox jar (pdfbox-app-2.0.25.jar) with a tLibraryLoad component, then used a tJava component to read a PDF file.

 

tJava Code:

/*

File file = new File("/opt/sample.pdf");

PDDocument document = PDDocument.load(file);

PDFTextStripper pdfStripper = new PDFTextStripper();

String text = pdfStripper.getText(document);

System.out.println("Text:" + text);

document.close();

*/

 

 

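// Load the PDF and print its text content (skipped if the file is encrypted).
PDDocument document = PDDocument.load(new File("/opt/pdf.pdf"));
try {
  if (!document.isEncrypted()) {
    PDFTextStripper stripper = new PDFTextStripper();
    String text = stripper.getText(document);
    System.out.println("Text:" + text);
  }
} finally {
  // Always release the document, even if text extraction fails.
  document.close();
}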

 

tJava Advanced Settings:

 

import java.io.File;

import org.apache.pdfbox.pdmodel.PDDocument; 

import org.apache.pdfbox.text.PDFTextStripper; 

import org.apache.pdfbox.text.PDFTextStripperByArea;
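
On the question of how to tell the script which PDF to read: the path in the tJava code above is hard-coded, but when the tJava runs inside a tFileList iteration, the current file path should be available from globalMap under a key such as tFileList_1_CURRENT_FILEPATH. The component name tFileList_1 is an assumption here, so use the name shown in your own job. The sketch below is plain Java that imitates that lookup with an ordinary HashMap standing in for Talend's globalMap; the class and method names are made up for illustration.

```java
import java.io.File;
import java.util.HashMap;
import java.util.Map;

public class CurrentPdfPath {

    // Builds the globalMap key under which a tFileList component publishes
    // the full path of the file it is currently iterating over.
    static String currentFilePathKey(String componentName) {
        return componentName + "_CURRENT_FILEPATH";
    }

    // Looks up the current file path and wraps it in a java.io.File.
    // Fails loudly if the key is missing, e.g. when the code runs
    // outside the tFileList iteration.
    static File currentPdf(Map<String, Object> globalMap, String componentName) {
        Object path = globalMap.get(currentFilePathKey(componentName));
        if (path == null) {
            throw new IllegalStateException(
                "No current file published by " + componentName);
        }
        return new File((String) path);
    }

    public static void main(String[] args) {
        // Stand-in for Talend's globalMap (the generated job provides the real one).
        Map<String, Object> globalMap = new HashMap<>();
        globalMap.put("tFileList_1_CURRENT_FILEPATH", "/opt/sample.pdf");

        // Assumed component name: tFileList_1.
        File pdf = currentPdf(globalMap, "tFileList_1");
        System.out.println("Would read: " + pdf.getPath());
    }
}
```

Inside the tJava component itself, the same idea would reduce to replacing the hard-coded path with something like PDDocument.load(new File((String) globalMap.get("tFileList_1_CURRENT_FILEPATH"))), again assuming the tFileList component is named tFileList_1.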

 
