<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: [resolved] how to read PDF input file from talend in Talend Studio</title>
    <link>https://community.qlik.com/t5/Talend-Studio/resolved-how-to-read-PDF-input-file-from-talend/m-p/2301717#M73787</link>
    <description>&lt;P&gt;I followed the exact steps but was unable to get it working.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/commons/logging/LogFactory&lt;/P&gt;&lt;P&gt;	at org.apache.pdfbox.pdmodel.PDDocument.&amp;lt;clinit&amp;gt;(PDDocument.java:98)&lt;/P&gt;&lt;P&gt;	at local_project.test_0_1.test.tJava_1Process(test.java:501)&lt;/P&gt;&lt;P&gt;	at local_project.test_0_1.test.tLibraryLoad_1Process(test.java:415)&lt;/P&gt;&lt;P&gt;	at local_project.test_0_1.test.runJobInTOS(test.java:804)&lt;/P&gt;&lt;P&gt;	at local_project.test_0_1.test.main(test.java:642)&lt;/P&gt;&lt;P&gt;Caused by: java.lang.ClassNotFoundException: org.apache.commons.logging.LogFactory&lt;/P&gt;&lt;P&gt;	at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:581)&lt;/P&gt;&lt;P&gt;	at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:178)&lt;/P&gt;&lt;P&gt;	at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:521)&lt;/P&gt;&lt;P&gt;	... 5 more&lt;/P&gt;</description>
    <pubDate>Wed, 08 Jun 2022 12:51:15 GMT</pubDate>
    <dc:creator>SNad1654691194</dc:creator>
    <dc:date>2022-06-08T12:51:15Z</dc:date>
    <item>
      <title>[resolved] how to read PDF input file from talend</title>
      <link>https://community.qlik.com/t5/Talend-Studio/resolved-how-to-read-PDF-input-file-from-talend/m-p/2301713#M73783</link>
      <description>Hi,
&lt;BR /&gt;Is it possible to read a PDF file through Talend? We have to read this file and load its data into a target table.
&lt;BR /&gt;Can you please suggest an approach?
&lt;BR /&gt;Regards
&lt;BR /&gt;Govardhan Turaka</description>
      <pubDate>Wed, 24 Sep 2014 16:04:51 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/resolved-how-to-read-PDF-input-file-from-talend/m-p/2301713#M73783</guid>
      <dc:creator>govardhant85</dc:creator>
      <dc:date>2014-09-24T16:04:51Z</dc:date>
    </item>
    <item>
      <title>Re: [resolved] how to read PDF input file from talend</title>
      <link>https://community.qlik.com/t5/Talend-Studio/resolved-how-to-read-PDF-input-file-from-talend/m-p/2301714#M73784</link>
      <description>Hi Govardhan Turaka, 
&lt;BR /&gt;Please have a look at this related forum thread: 
&lt;A href="https://community.qlik.com/s/feed/0D53p00007vCp9oCAC" target="_blank" rel="nofollow noopener noreferrer"&gt;https://community.talend.com/t5/Design-and-Development/How-to-read-pdf-file-in-talend/td-p/99998&lt;/A&gt; 
&lt;BR /&gt;Best regards 
&lt;BR /&gt;Sabrina</description>
      <pubDate>Thu, 25 Sep 2014 03:35:53 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/resolved-how-to-read-PDF-input-file-from-talend/m-p/2301714#M73784</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2014-09-25T03:35:53Z</dc:date>
    </item>
    <item>
      <title>Re: [resolved] how to read PDF input file from talend</title>
      <link>https://community.qlik.com/t5/Talend-Studio/resolved-how-to-read-PDF-input-file-from-talend/m-p/2301715#M73785</link>
      <description>&lt;P&gt;I was able to read the text of PDFs using the Apache PDFBox library (pdfbox-app-2.0.25.jar).&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I used the tLibraryLoad component to load the jar.&lt;/P&gt;&lt;P&gt;Then I used a tJava component to read the file.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;B&gt;tJava Code:&lt;/B&gt;&lt;/P&gt;&lt;P&gt;/*&lt;/P&gt;&lt;P&gt;File file = new File("/opt/sample.pdf");&lt;/P&gt;&lt;P&gt;PDDocument document = PDDocument.load(file);&lt;/P&gt;&lt;P&gt;PDFTextStripper pdfStripper = new PDFTextStripper();&lt;/P&gt;&lt;P&gt;String text = pdfStripper.getText(document);&lt;/P&gt;&lt;P&gt;System.out.println("Text:" + text);&lt;/P&gt;&lt;P&gt;document.close();&lt;/P&gt;&lt;P&gt;*/&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;// try-with-resources closes the document even if text extraction throws&lt;/P&gt;&lt;P&gt;try (PDDocument document = PDDocument.load(new File("/opt/pdf.pdf"))) {&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;// Only extract text when the PDF is not encrypted&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;if (!document.isEncrypted()) {&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;PDFTextStripper stripper = new PDFTextStripper();&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;String text = stripper.getText(document);&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;System.out.println("Text:" + text);&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;}&lt;/P&gt;&lt;P&gt;}&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;B&gt;tJava Advanced Settings:&lt;/B&gt;&lt;/P&gt;&lt;P&gt;import java.io.File;&lt;/P&gt;&lt;P&gt;import org.apache.pdfbox.pdmodel.PDDocument;&lt;/P&gt;&lt;P&gt;import org.apache.pdfbox.text.PDFTextStripper;&lt;/P&gt;&lt;P&gt;import org.apache.pdfbox.text.PDFTextStripperByArea;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper" image-alt="0695b00000N1DeXAAV.png"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/145495i7D15B0CDF6E3166F/image-size/large?v=v2&amp;amp;px=999" role="button" title="0695b00000N1DeXAAV.png" alt="0695b00000N1DeXAAV.png" /&gt;&lt;/span&gt;&lt;span 
class="lia-inline-image-display-wrapper" image-alt="0695b00000N1Df6AAF.png"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/154318i3C90119C15BD8B4A/image-size/large?v=v2&amp;amp;px=999" role="button" title="0695b00000N1Df6AAF.png" alt="0695b00000N1Df6AAF.png" /&gt;&lt;/span&gt;&lt;span class="lia-inline-image-display-wrapper" image-alt="0695b00000N1DdoAAF.png"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/134431i441522643947F331/image-size/large?v=v2&amp;amp;px=999" role="button" title="0695b00000N1DdoAAF.png" alt="0695b00000N1DdoAAF.png" /&gt;&lt;/span&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 11 Jan 2022 18:56:23 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/resolved-how-to-read-PDF-input-file-from-talend/m-p/2301715#M73785</guid>
      <dc:creator>tomwattsusa</dc:creator>
      <dc:date>2022-01-11T18:56:23Z</dc:date>
    </item>
    <item>
      <title>Re: [resolved] how to read PDF input file from talend</title>
      <link>https://community.qlik.com/t5/Talend-Studio/resolved-how-to-read-PDF-input-file-from-talend/m-p/2301716#M73786</link>
      <description>&lt;P&gt;Hello,&lt;/P&gt;&lt;P&gt;Thanks for sharing this solution with the community.&lt;/P&gt;&lt;P&gt;Best regards&lt;/P&gt;&lt;P&gt;Sabrina&lt;/P&gt;</description>
      <pubDate>Wed, 12 Jan 2022 03:48:13 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/resolved-how-to-read-PDF-input-file-from-talend/m-p/2301716#M73786</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2022-01-12T03:48:13Z</dc:date>
    </item>
    <item>
      <title>Re: [resolved] how to read PDF input file from talend</title>
      <link>https://community.qlik.com/t5/Talend-Studio/resolved-how-to-read-PDF-input-file-from-talend/m-p/2301717#M73787</link>
      <description>&lt;P&gt;I followed the exact steps but was unable to get it working.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/commons/logging/LogFactory&lt;/P&gt;&lt;P&gt;	at org.apache.pdfbox.pdmodel.PDDocument.&amp;lt;clinit&amp;gt;(PDDocument.java:98)&lt;/P&gt;&lt;P&gt;	at local_project.test_0_1.test.tJava_1Process(test.java:501)&lt;/P&gt;&lt;P&gt;	at local_project.test_0_1.test.tLibraryLoad_1Process(test.java:415)&lt;/P&gt;&lt;P&gt;	at local_project.test_0_1.test.runJobInTOS(test.java:804)&lt;/P&gt;&lt;P&gt;	at local_project.test_0_1.test.main(test.java:642)&lt;/P&gt;&lt;P&gt;Caused by: java.lang.ClassNotFoundException: org.apache.commons.logging.LogFactory&lt;/P&gt;&lt;P&gt;	at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:581)&lt;/P&gt;&lt;P&gt;	at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:178)&lt;/P&gt;&lt;P&gt;	at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:521)&lt;/P&gt;&lt;P&gt;	... 5 more&lt;/P&gt;</description>
      <pubDate>Wed, 08 Jun 2022 12:51:15 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/resolved-how-to-read-PDF-input-file-from-talend/m-p/2301717#M73787</guid>
      <dc:creator>SNad1654691194</dc:creator>
      <dc:date>2022-06-08T12:51:15Z</dc:date>
    </item>
  </channel>
</rss>

