<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Extracting data from PDF and Scanned images through Talend in Talend Studio</title>
    <link>https://community.qlik.com/t5/Talend-Studio/Extracting-data-from-PDF-and-Scanned-images-through-Talend/m-p/2352127#M118565</link>
    <description>&lt;P&gt;You will need to use third-party Java APIs for this. That is a major advantage of Talend: you can use third-party APIs. You will need to be able to write Java to achieve this. Take a look at Apache Tika as a starting point (&lt;A href="https://tika.apache.org/" target="_blank" rel="nofollow noopener noreferrer"&gt;https://tika.apache.org/&lt;/A&gt;).&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;You may find a component on Talend Exchange for the Word data, but I don't think there is a Talend component for getting data from scanned images.&lt;/P&gt;</description>
    <pubDate>Fri, 22 Sep 2017 10:29:30 GMT</pubDate>
    <dc:creator>Anonymous</dc:creator>
    <dc:date>2017-09-22T10:29:30Z</dc:date>
    <item>
      <title>Extracting data from PDF and Scanned images through Talend</title>
      <link>https://community.qlik.com/t5/Talend-Studio/Extracting-data-from-PDF-and-Scanned-images-through-Talend/m-p/2352126#M118564</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I have a requirement to extract data from PDF, Word, and scanned images through Talend. Could anyone please suggest the best component to use for this?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I am using Talend Big Data Platform version 6.3.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Thanks in advance!&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Regards,&lt;/P&gt;
&lt;P&gt;Pragya&lt;/P&gt;</description>
      <pubDate>Sat, 16 Nov 2024 09:15:53 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/Extracting-data-from-PDF-and-Scanned-images-through-Talend/m-p/2352126#M118564</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2024-11-16T09:15:53Z</dc:date>
    </item>
    <item>
      <title>Re: Extracting data from PDF and Scanned images through Talend</title>
      <link>https://community.qlik.com/t5/Talend-Studio/Extracting-data-from-PDF-and-Scanned-images-through-Talend/m-p/2352127#M118565</link>
      <description>&lt;P&gt;You will need to use third-party Java APIs for this. That is a major advantage of Talend: you can use third-party APIs. You will need to be able to write Java to achieve this. Take a look at Apache Tika as a starting point (&lt;A href="https://tika.apache.org/" target="_blank" rel="nofollow noopener noreferrer"&gt;https://tika.apache.org/&lt;/A&gt;).&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;You may find a component on Talend Exchange for the Word data, but I don't think there is a Talend component for getting data from scanned images.&lt;/P&gt;</description>
      <pubDate>Fri, 22 Sep 2017 10:29:30 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/Extracting-data-from-PDF-and-Scanned-images-through-Talend/m-p/2352127#M118565</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2017-09-22T10:29:30Z</dc:date>
    </item>
  </channel>
</rss>

