I was able to read the text of PDFs using the Apache PDFBox library (pdfbox-app-2.0.25.jar).
I used a tLibraryLoad component to load the jar.
Then I used a tJava component to read the file.
tJava Code:
/*
File file = new File("/opt/sample.pdf");
PDDocument document = PDDocument.load(file);
PDFTextStripper pdfStripper = new PDFTextStripper();
String text = pdfStripper.getText(document);
System.out.println("Text:" + text);
document.close();
*/
// try-with-resources closes the document even if text extraction throws
try (PDDocument document = PDDocument.load(new File("/opt/pdf.pdf"))) {
    // Only extract text when the document is not encrypted
    if (!document.isEncrypted()) {
        PDFTextStripper stripper = new PDFTextStripper();
        String text = stripper.getText(document);
        System.out.println("Text:" + text);
    }
}
tJava Advanced Settings:
import java.io.File;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper;
import org.apache.pdfbox.text.PDFTextStripperByArea;
Hello,
Thanks for sharing this solution with the community.
Best regards
Sabrina
I followed these exact steps but was unable to get it working. I get this error:
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/commons/logging/LogFactory
at org.apache.pdfbox.pdmodel.PDDocument.<clinit>(PDDocument.java:98)
at local_project.test_0_1.test.tJava_1Process(test.java:501)
at local_project.test_0_1.test.tLibraryLoad_1Process(test.java:415)
at local_project.test_0_1.test.runJobInTOS(test.java:804)
at local_project.test_0_1.test.main(test.java:642)
Caused by: java.lang.ClassNotFoundException: org.apache.commons.logging.LogFactory
at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:581)
at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:178)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:521)
... 5 more
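The trace above shows that PDDocument failed to initialize because org.apache.commons.logging.LogFactory was not found on the classpath. PDFBox 2.x relies on Apache Commons Logging, so one possible cause is that the loaded jar does not bundle that dependency (this is an assumption; the post does not say which jar was loaded). Below is a minimal sketch for checking which of the required classes are visible at runtime. The class name ClasspathCheck and the helper isOnClasspath are made up for illustration; only the probed class names come from the trace and the imports above.

```java
// Hypothetical diagnostic helper, not part of Talend or PDFBox.
class ClasspathCheck {

    // Returns true if the named class can be located by this class's loader.
    // The class is looked up without running its static initializers.
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Class names taken from the stack trace and the tJava imports
        String[] required = {
            "org.apache.commons.logging.LogFactory",
            "org.apache.pdfbox.pdmodel.PDDocument",
            "org.apache.pdfbox.text.PDFTextStripper"
        };
        for (String name : required) {
            System.out.println(name + " -> " + (isOnClasspath(name) ? "found" : "MISSING"));
        }
    }
}
```

The loop in main could also be pasted into a tJava component placed after tLibraryLoad. If LogFactory is reported as missing, loading a commons-logging jar with an additional tLibraryLoad component may resolve the error.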