<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic How to read pdf file in talend? in Data Quality</title>
    <link>https://community.qlik.com/t5/Data-Quality/How-to-read-pdf-file-in-talend/m-p/2258558#M1292</link>
    <description>&lt;P&gt;Hi all,&amp;nbsp;&lt;/P&gt; 
&lt;P&gt;&amp;nbsp;&lt;/P&gt; 
&lt;P&gt;I saw the other topic on this, but unfortunately the solution does not fit my needs. I have a PDF file that contains employee payslips, with each page covering a different employee.&amp;nbsp;&lt;/P&gt; 
&lt;P&gt;Is there a way to extract the info from each page and load it into a database of all the employees? So far I have only managed to import the PDF file and get one huge line of characters from it, and there's not much I can do with that.&lt;/P&gt; 
&lt;P&gt;&amp;nbsp;&lt;/P&gt; 
&lt;P&gt;An example PDF file is attached, containing the info for 2 employees (2 pages).&lt;/P&gt; 
&lt;P&gt;&amp;nbsp;&lt;/P&gt; 
&lt;P&gt;Thank you very much,&amp;nbsp;&lt;/P&gt; 
&lt;P&gt;Anca&lt;/P&gt;</description>
    <pubDate>Sat, 16 Nov 2024 07:45:47 GMT</pubDate>
    <dc:creator>Anonymous</dc:creator>
    <dc:date>2024-11-16T07:45:47Z</dc:date>
    <item>
      <title>How to read pdf file in talend?</title>
      <link>https://community.qlik.com/t5/Data-Quality/How-to-read-pdf-file-in-talend/m-p/2258558#M1292</link>
      <description>&lt;P&gt;Hi all,&amp;nbsp;&lt;/P&gt; 
&lt;P&gt;&amp;nbsp;&lt;/P&gt; 
&lt;P&gt;I saw the other topic on this, but unfortunately the solution does not fit my needs. I have a PDF file that contains employee payslips, with each page covering a different employee.&amp;nbsp;&lt;/P&gt; 
&lt;P&gt;Is there a way to extract the info from each page and load it into a database of all the employees? So far I have only managed to import the PDF file and get one huge line of characters from it, and there's not much I can do with that.&lt;/P&gt; 
&lt;P&gt;&amp;nbsp;&lt;/P&gt; 
&lt;P&gt;An example PDF file is attached, containing the info for 2 employees (2 pages).&lt;/P&gt; 
&lt;P&gt;&amp;nbsp;&lt;/P&gt; 
&lt;P&gt;Thank you very much,&amp;nbsp;&lt;/P&gt; 
&lt;P&gt;Anca&lt;/P&gt;</description>
      <pubDate>Sat, 16 Nov 2024 07:45:47 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Data-Quality/How-to-read-pdf-file-in-talend/m-p/2258558#M1292</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2024-11-16T07:45:47Z</dc:date>
    </item>
    <item>
      <title>Re: How to read pdf file in talend?</title>
      <link>https://community.qlik.com/t5/Data-Quality/How-to-read-pdf-file-in-talend/m-p/2258559#M1293</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt; 
&lt;P&gt;&amp;nbsp;&lt;/P&gt; 
&lt;P&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp;Please refer to the link below, which uses custom code to process a PDF file.&lt;/P&gt; 
&lt;P&gt;&amp;nbsp;&lt;/P&gt; 
&lt;P&gt;&lt;A href="https://community.qlik.com/s/feed/0D53p00007vCp9oCAC" target="_blank"&gt;https://community.talend.com/t5/Design-and-Development/How-to-read-pdf-file-in-talend/m-p/99998&lt;/A&gt;&lt;/P&gt; 
&lt;P&gt;&amp;nbsp;&lt;/P&gt; 
&lt;P&gt;&amp;nbsp; &amp;nbsp; My recommendation is not to use a PDF file as the source for important data like payslips, because the data will not be in the format the target database expects. It would be better to go to the corresponding source system and create a new data flow from that source to your new target database.&amp;nbsp;&lt;/P&gt; 
&lt;P&gt;&amp;nbsp;&lt;/P&gt; 
&lt;P&gt;&amp;nbsp; &amp;nbsp; If the data is provided by a third party, you will need to agree on an interface with them, in the form of a file or a web request payload, so that the data can be processed easily by Talend.&lt;/P&gt; 
&lt;P&gt;&amp;nbsp;&lt;/P&gt; 
&lt;P&gt;Warm Regards,&lt;/P&gt; 
&lt;P&gt;&amp;nbsp;&lt;/P&gt; 
&lt;P&gt;Nikhil Thampi&lt;/P&gt;</description>
      <pubDate>Wed, 29 Aug 2018 20:29:23 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Data-Quality/How-to-read-pdf-file-in-talend/m-p/2258559#M1293</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2018-08-29T20:29:23Z</dc:date>
    </item>
    <item>
      <title>Re: How to read pdf file in talend?</title>
      <link>https://community.qlik.com/t5/Data-Quality/How-to-read-pdf-file-in-talend/m-p/2258560#M1294</link>
      <description>&lt;P&gt;Thanks for this thread. I had the same question. Thanks for the custom workaround; it helped a lot.&lt;/P&gt;</description>
      <pubDate>Fri, 31 Aug 2018 09:15:06 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Data-Quality/How-to-read-pdf-file-in-talend/m-p/2258560#M1294</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2018-08-31T09:15:06Z</dc:date>
    </item>
    <item>
      <title>Re: How to read pdf file in talend?</title>
      <link>https://community.qlik.com/t5/Data-Quality/How-to-read-pdf-file-in-talend/m-p/2258561#M1295</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;SPAN&gt;Nikhil,&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;I found a way to read the PDF and export it as a txt file using Talend. I now process the txt file in Java (Eclipse), and I am struggling to implement that script in Talend.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;I have written two Java classes: one with the getters/setters, and the main one that reads the PDF and exports it to Excel.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;Do you know which components I should use, and how I should set them up in Talend?&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;Thank you for your support!&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;Anca&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 06 Dec 2018 10:07:07 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Data-Quality/How-to-read-pdf-file-in-talend/m-p/2258561#M1295</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2018-12-06T10:07:07Z</dc:date>
    </item>
    <item>
      <title>Re: How to read pdf file in talend?</title>
      <link>https://community.qlik.com/t5/Data-Quality/How-to-read-pdf-file-in-talend/m-p/2258562#M1296</link>
      <description>&lt;P&gt;&lt;A href="https://community.qlik.com/s/profile/0053p000007LLAMAA4"&gt;@ancamaracu&lt;/A&gt;,&lt;/P&gt; 
&lt;P&gt;&amp;nbsp;&lt;/P&gt; 
&lt;P&gt;&amp;nbsp; &amp;nbsp; The link below explains how to create a Talend user routine, which lets you run custom Java code in a repeatable fashion.&lt;/P&gt; 
&lt;P&gt;&amp;nbsp;&lt;/P&gt; 
&lt;P&gt;&lt;A href="https://help.talend.com/reader/BnnM0hh3643D9Vq15udPtA/ptUkG0B_wbN2IR4iYnXJ3g" target="_blank" rel="nofollow noopener noreferrer"&gt;https://help.talend.com/reader/BnnM0hh3643D9Vq15udPtA/ptUkG0B_wbN2IR4iYnXJ3g&lt;/A&gt;&lt;/P&gt; 
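&lt;P&gt;As a rough illustration only, a routine for this case might look like the sketch below. It assumes the text exported from the PDF separates pages with a form feed character and that each payslip has lines such as "Name: ..."; neither is confirmed in this thread, and the class name, method names and labels are hypothetical.&lt;/P&gt; 

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical routine class; in Talend Studio it would normally sit in the "routines" package.
public class PayslipRoutine {

    // Assumption: the exported text separates pages with a form feed character.
    public static String[] splitPages(String text) {
        return text.split("\f");
    }

    // Returns the trimmed text that follows "label:" on the same line, or null if the label is absent.
    public static String fieldAfter(String page, String label) {
        Pattern p = Pattern.compile("(?m)^[ \\t]*" + Pattern.quote(label) + "[ \\t]*:[ \\t]*(.*)$");
        Matcher m = p.matcher(page);
        return m.find() ? m.group(1).trim() : null;
    }
}
```

&lt;P&gt;A tJavaRow or tJavaFlex component could then call PayslipRoutine.splitPages(...) on the extracted text and PayslipRoutine.fieldAfter(page, "Name") on each page; the labels would need adjusting to match the real payslip layout.&lt;/P&gt; 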
&lt;P&gt;&amp;nbsp;&lt;/P&gt; 
&lt;P&gt;&amp;nbsp; &amp;nbsp;I hope this helps you resolve the query. Could you please post the method you used to convert the PDF file, so that other Talend community members can benefit?&lt;/P&gt; 
&lt;P&gt;&amp;nbsp;&lt;/P&gt; 
&lt;P&gt;Warm Regards,&lt;/P&gt; 
&lt;P&gt;&amp;nbsp;&lt;/P&gt; 
&lt;P&gt;Nikhil Thampi&lt;/P&gt;</description>
      <pubDate>Fri, 07 Dec 2018 06:53:50 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Data-Quality/How-to-read-pdf-file-in-talend/m-p/2258562#M1296</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2018-12-07T06:53:50Z</dc:date>
    </item>
  </channel>
</rss>

