<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: How to read pdf file in talend in Talend Studio</title>
    <link>https://community.qlik.com/t5/Talend-Studio/How-to-read-pdf-file-in-talend/m-p/2305866#M77482</link>
    <description>&lt;P&gt;Hi, thanks for sharing this.&lt;/P&gt;
&lt;P&gt;But it is not clear how I can specify which PDF file the script should read.&lt;/P&gt;
&lt;P&gt;Can you clarify?&lt;/P&gt;
&lt;P&gt;Thanks&lt;/P&gt;</description>
    <pubDate>Wed, 05 Jun 2019 16:20:40 GMT</pubDate>
    <dc:creator>Anonymous</dc:creator>
    <dc:date>2019-06-05T16:20:40Z</dc:date>
    <item>
      <title>How to read pdf file in talend</title>
      <link>https://community.qlik.com/t5/Talend-Studio/How-to-read-pdf-file-in-talend/m-p/2305846#M77462</link>
      <description>Hello, 
&lt;BR /&gt;I need help reading the content of a PDF file into a variable so that I can store it in a text field in a database. 
&lt;BR /&gt;What sort of component am I supposed to use? 
&lt;BR /&gt;The process: 
&lt;BR /&gt;- list the files in a folder: OK 
&lt;BR /&gt;- read the file name to find the database row: OK 
&lt;BR /&gt;- read the content of the file to put it in the database: not OK 
&lt;BR /&gt;Does anyone have a solution? 
&lt;BR /&gt;Thanks, 
&lt;BR /&gt;David</description>
      <pubDate>Sat, 16 Nov 2024 13:57:18 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/How-to-read-pdf-file-in-talend/m-p/2305846#M77462</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2024-11-16T13:57:18Z</dc:date>
    </item>
    <item>
      <title>Re: How to read pdf file in talend</title>
      <link>https://community.qlik.com/t5/Talend-Studio/How-to-read-pdf-file-in-talend/m-p/2305847#M77463</link>
      <description>Hello David 
&lt;BR /&gt;Unfortunately, there is no component that can be used to extract data from a PDF file. 
&lt;BR /&gt;Best regards 
&lt;BR /&gt; 
&lt;BR /&gt; shong</description>
      <pubDate>Fri, 15 May 2009 08:20:31 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/How-to-read-pdf-file-in-talend/m-p/2305847#M77463</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2009-05-15T08:20:31Z</dc:date>
    </item>
    <item>
      <title>Re: How to read pdf file in talend</title>
      <link>https://community.qlik.com/t5/Talend-Studio/How-to-read-pdf-file-in-talend/m-p/2305848#M77464</link>
      <description>OK, I found a solution: use a tJava after a tFileExist with this code: 
&lt;BR /&gt;// Needs java.io.* imported (Import field in the tJava Advanced settings) 
&lt;BR /&gt;// chaine accumulates the file content, one line at a time 
&lt;BR /&gt;String chaine = new String() ; 
&lt;BR /&gt;InputStream ips=new FileInputStream(((String)globalMap.get("tFileExist_2_FILENAME"))); 
&lt;BR /&gt;InputStreamReader ipsr=new InputStreamReader(ips); 
&lt;BR /&gt;BufferedReader br=new BufferedReader(ipsr); 
&lt;BR /&gt;String ligne; 
&lt;BR /&gt;while ((ligne=br.readLine())!=null){ 
&lt;BR /&gt; chaine+=ligne+"\n"; 
&lt;BR /&gt;} 
&lt;BR /&gt;br.close(); 
&lt;BR /&gt;In the next component, use the chaine variable from the tJava component.</description>
      <pubDate>Fri, 15 May 2009 08:32:15 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/How-to-read-pdf-file-in-talend/m-p/2305848#M77464</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2009-05-15T08:32:15Z</dc:date>
    </item>
    <item>
      <title>Re: How to read pdf file in talend</title>
      <link>https://community.qlik.com/t5/Talend-Studio/How-to-read-pdf-file-in-talend/m-p/2305849#M77465</link>
      <description>In the end I prefer another solution: 
&lt;BR /&gt;create a routine (in Java) with a function readFile, then 
&lt;BR /&gt;in the tMap before the data insertion, call routines.classname.functionname(pdffilenametoread)</description>
      <pubDate>Fri, 15 May 2009 09:11:14 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/How-to-read-pdf-file-in-talend/m-p/2305849#M77465</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2009-05-15T09:11:14Z</dc:date>
    </item>
    <item>
      <title>Re: How to read pdf file in talend</title>
      <link>https://community.qlik.com/t5/Talend-Studio/How-to-read-pdf-file-in-talend/m-p/2305850#M77466</link>
      <description>Hello friend
&lt;BR /&gt;Can you share your job and routine on the forum?
&lt;BR /&gt;Thanks for your support!
&lt;BR /&gt;Best regards
&lt;BR /&gt; 
&lt;BR /&gt; shong</description>
      <pubDate>Fri, 15 May 2009 09:27:57 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/How-to-read-pdf-file-in-talend/m-p/2305850#M77466</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2009-05-15T09:27:57Z</dc:date>
    </item>
    <item>
      <title>Re: How to read pdf file in talend</title>
      <link>https://community.qlik.com/t5/Talend-Studio/How-to-read-pdf-file-in-talend/m-p/2305851#M77467</link>
      <description>I can't share the project because it belongs to my company, sorry about that. 
&lt;BR /&gt;To make this work: 
&lt;BR /&gt;In the Talend Repository, create a new routine: 
&lt;BR /&gt; 
&lt;PRE&gt;// template routine Java&lt;BR /&gt;package routines;&lt;BR /&gt;import java.io.*;&lt;BR /&gt;/*&lt;BR /&gt; * user specification: the function's comment should contain keys as follows: 1. write about the function's comment. but&lt;BR /&gt; * it must be before the "{talendTypes}" key.&lt;BR /&gt; * &lt;BR /&gt; * 2. {talendTypes} 's value must be talend Type, it is required . its value should be one of: String, char | Character,&lt;BR /&gt; * long | Long, int | Integer, boolean | Boolean, byte | Byte, Date, double | Double, float | Float, Object, short |&lt;BR /&gt; * Short&lt;BR /&gt; * &lt;BR /&gt; * 3. {Category} define a category for the Function. it is required. its value is user-defined .&lt;BR /&gt; * &lt;BR /&gt; * 4. {param} 's format is: {param} &amp;lt;type&amp;gt; &amp;lt;name&amp;gt;&lt;BR /&gt; * &lt;BR /&gt; * &amp;lt;type&amp;gt; 's value should be one of: string, int, list, double, object, boolean, long, char, date. &amp;lt;name&amp;gt;'s value is the&lt;BR /&gt; * Function's parameter name. the {param} is optional. so if you the Function without the parameters. the {param} don't&lt;BR /&gt; * added. you can have many parameters for the Function.&lt;BR /&gt; * &lt;BR /&gt; * 5. {example} gives an example for the Function. it is optional.&lt;BR /&gt; */&lt;BR /&gt;public class fichierRef {&lt;BR /&gt;    /**&lt;BR /&gt;     * readFile: reads the PDF file and returns its content as a string&lt;BR /&gt;     * &lt;BR /&gt;     * &lt;BR /&gt;     * {talendTypes} String&lt;BR /&gt;     * &lt;BR /&gt;     * {Category} User Defined&lt;BR /&gt;     * &lt;BR /&gt;     * {param} string() input: the name of the file to read&lt;BR /&gt;     * &lt;BR /&gt;     * {example} readFile("/etc/passwd") # hacking in progress ...&lt;BR /&gt;     */&lt;BR /&gt;    public static String readFile(String fichier) {&lt;BR /&gt;        String chaine = new String() ;&lt;BR /&gt;        try {&lt;BR /&gt;            InputStream ips=new FileInputStream(fichier);&lt;BR /&gt;            InputStreamReader ipsr=new InputStreamReader(ips);&lt;BR /&gt;            BufferedReader br=new BufferedReader(ipsr);&lt;BR /&gt;            String ligne;&lt;BR /&gt;            // read the file line by line and append each line to chaine&lt;BR /&gt;            while ((ligne=br.readLine())!=null){&lt;BR /&gt;                chaine+=ligne+"\n";&lt;BR /&gt;            }&lt;BR /&gt;            br.close(); &lt;BR /&gt;            return chaine ;&lt;BR /&gt;        }catch(Exception e){&lt;BR /&gt;            // any read error yields an empty string&lt;BR /&gt;            return "";&lt;BR /&gt;        }&lt;BR /&gt;    }&lt;BR /&gt;}&lt;/PRE&gt; 
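&lt;BR /&gt;One caveat about the routine above: a PDF is a binary format, so decoding it as text line by line and re-encoding it later may not give back the original bytes. If the goal is only to store the raw file content in a binary column, a byte-oriented variant might look like the sketch below. This is a sketch under that assumption, not a Talend API: the class name PdfBytes and the method readFileBytes are hypothetical, and the method returns the file bytes unchanged rather than extracting readable text from the PDF.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical routine class, named here only for illustration.
public class PdfBytes {

    /**
     * Reads the whole file into a byte array without any text decoding.
     * Returns an empty array if the file cannot be read (for example, if it is missing).
     */
    public static byte[] readFileBytes(String fileName) {
        try {
            // readAllBytes opens the file, reads it fully and closes it again
            return Files.readAllBytes(Paths.get(fileName));
        } catch (IOException e) {
            // same convention as readFile above: swallow the error, return "nothing"
            return new byte[0];
        }
    }
}
```

&lt;BR /&gt;As a Talend routine, the class would also need the package routines; line at the top, as in the routine above, and it could then be called from a tMap in the same way as readFile.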
&lt;BR /&gt;In any tMap where you need it, use an expression like this: 
&lt;BR /&gt;routines.fichierRef.readFile(row3.filename).getBytes()</description>
      <pubDate>Fri, 15 May 2009 12:39:49 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/How-to-read-pdf-file-in-talend/m-p/2305851#M77467</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2009-05-15T12:39:49Z</dc:date>
    </item>
    <item>
      <title>Re: How to read pdf file in talend</title>
      <link>https://community.qlik.com/t5/Talend-Studio/How-to-read-pdf-file-in-talend/m-p/2305852#M77468</link>
      <description>Note that you could use a PDF library (such as iText) to extract some metadata.</description>
      <pubDate>Fri, 15 May 2009 12:49:26 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/How-to-read-pdf-file-in-talend/m-p/2305852#M77468</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2009-05-15T12:49:26Z</dc:date>
    </item>
    <item>
      <title>Re: How to read pdf file in talend</title>
      <link>https://community.qlik.com/t5/Talend-Studio/How-to-read-pdf-file-in-talend/m-p/2305853#M77469</link>
      <description>Hi,
&lt;BR /&gt;This is urgent, please.
&lt;BR /&gt;I am new to Talend.
&lt;BR /&gt;I need help reading a PDF and writing its contents to a txt file. Can someone help me get started?
&lt;BR /&gt; 
&lt;BR /&gt;I also tried adding tFileOutputPDF: I set its folder in the Talend tool under options window---&amp;gt;preferences---&amp;gt;talend---&amp;gt;components---&amp;gt;user component folder, but I am not able to see it in the palette.
&lt;BR /&gt;Please help me with some suggestions.
&lt;BR /&gt;
&lt;BR /&gt;Thanks
&lt;BR /&gt;jones</description>
      <pubDate>Tue, 05 Mar 2013 12:22:21 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/How-to-read-pdf-file-in-talend/m-p/2305853#M77469</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2013-03-05T12:22:21Z</dc:date>
    </item>
    <item>
      <title>Re: How to read pdf file in talend</title>
      <link>https://community.qlik.com/t5/Talend-Studio/How-to-read-pdf-file-in-talend/m-p/2305854#M77470</link>
      <description>Hi Cabajones 
&lt;BR /&gt;tFileOutputPDF is a component that you can download from Talend Exchange (http://www.talendforge.org/exchange/). 
&lt;BR /&gt; 
&lt;BR /&gt;thanks 
&lt;BR /&gt;B. Anil Kumar</description>
      <pubDate>Tue, 05 Mar 2013 12:48:28 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/How-to-read-pdf-file-in-talend/m-p/2305854#M77470</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2013-03-05T12:48:28Z</dc:date>
    </item>
    <item>
      <title>Re: How to read pdf file in talend</title>
      <link>https://community.qlik.com/t5/Talend-Studio/How-to-read-pdf-file-in-talend/m-p/2305855#M77471</link>
      <description>&lt;BLOCKQUOTE&gt; 
 &lt;TABLE border="1"&gt; 
  &lt;TBODY&gt; 
   &lt;TR&gt; 
    &lt;TD&gt;hi,&lt;BR /&gt;&lt;BR /&gt;I also tried adding the tFileOutputPDF after adding this in the talend tool in options window---&amp;gt;preferences---&amp;gt;talend---&amp;gt;components---&amp;gt;user component folder but not able to view in the palette.&lt;BR /&gt;Please help me giving some suggestions&lt;BR /&gt;&lt;BR /&gt;Thank's&lt;BR /&gt;jones&lt;/TD&gt; 
   &lt;/TR&gt; 
  &lt;/TBODY&gt; 
 &lt;/TABLE&gt; 
&lt;/BLOCKQUOTE&gt; 
&lt;BR /&gt;Hi Jones 
&lt;BR /&gt;tFileOutputPDF is used to write data to a PDF file; there is no component that can be used to read data from a PDF file. You need to hand-code the reading in a routine, as arfman did, and call it in a job. 
&lt;BR /&gt;Shong</description>
      <pubDate>Wed, 06 Mar 2013 03:56:14 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/How-to-read-pdf-file-in-talend/m-p/2305855#M77471</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2013-03-06T03:56:14Z</dc:date>
    </item>
    <item>
      <title>Re: How to read pdf file in talend</title>
      <link>https://community.qlik.com/t5/Talend-Studio/How-to-read-pdf-file-in-talend/m-p/2305856#M77472</link>
      <description>Hi,
&lt;BR /&gt;Thanks for the very useful information.
&lt;BR /&gt;I have written a method to read the PDF.
&lt;BR /&gt;Can you please help me add the method as a routine so that I can run the code from the Talend tool?
&lt;BR /&gt;When I create a job I am able to view the code, but I am not able to edit it to add my method.
&lt;BR /&gt;Please give me a suggestion.
&lt;BR /&gt;
&lt;BR /&gt;Thanks
&lt;BR /&gt;caba</description>
      <pubDate>Wed, 06 Mar 2013 13:30:23 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/How-to-read-pdf-file-in-talend/m-p/2305856#M77472</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2013-03-06T13:30:23Z</dc:date>
    </item>
    <item>
      <title>Re: How to read pdf file in talend</title>
      <link>https://community.qlik.com/t5/Talend-Studio/How-to-read-pdf-file-in-talend/m-p/2305857#M77473</link>
      <description>Check out the documentation &lt;A href="https://help.talend.com/search/all?query=Managing+user+routines&amp;amp;content-lang=en" rel="nofollow noopener noreferrer"&gt;https://help.talend.com/search/all?query=Managing+user+routines&amp;amp;content-lang=en&lt;/A&gt;&lt;BR /&gt;and let us know if you need further assistance.</description>
      <pubDate>Wed, 06 Mar 2013 13:36:57 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/How-to-read-pdf-file-in-talend/m-p/2305857#M77473</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2013-03-06T13:36:57Z</dc:date>
    </item>
    <item>
      <title>Re: How to read pdf file in talend</title>
      <link>https://community.qlik.com/t5/Talend-Studio/How-to-read-pdf-file-in-talend/m-p/2305858#M77474</link>
      <description>Hello Cabajones,&lt;BR /&gt;would you be so kind as to share your routine?&lt;BR /&gt;I am sure it would help others too.&lt;BR /&gt;Thanks,</description>
      <pubDate>Wed, 06 Mar 2013 18:20:02 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/How-to-read-pdf-file-in-talend/m-p/2305858#M77474</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2013-03-06T18:20:02Z</dc:date>
    </item>
    <item>
      <title>Re: How to read pdf file in talend</title>
      <link>https://community.qlik.com/t5/Talend-Studio/How-to-read-pdf-file-in-talend/m-p/2305859#M77475</link>
      <description>Is there any change in the status of this: "no component exists to read PDFs"?
&lt;BR /&gt;Given the nature of PDFs, that's what I'd expect, just checking.</description>
      <pubDate>Tue, 18 Feb 2014 15:50:36 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/How-to-read-pdf-file-in-talend/m-p/2305859#M77475</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2014-02-18T15:50:36Z</dc:date>
    </item>
    <item>
      <title>Re: How to read pdf file in talend</title>
      <link>https://community.qlik.com/t5/Talend-Studio/How-to-read-pdf-file-in-talend/m-p/2305860#M77476</link>
      <description>Why should an ETL tool read a PDF file?</description>
      <pubDate>Tue, 18 Feb 2014 20:44:57 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/How-to-read-pdf-file-in-talend/m-p/2305860#M77476</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2014-02-18T20:44:57Z</dc:date>
    </item>
    <item>
      <title>Re: How to read pdf file in talend</title>
      <link>https://community.qlik.com/t5/Talend-Studio/How-to-read-pdf-file-in-talend/m-p/2305861#M77477</link>
      <description>I agree it doesn't make good sense, but my boss told me to ask. Your answer is reassuring.</description>
      <pubDate>Wed, 19 Feb 2014 15:28:10 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/How-to-read-pdf-file-in-talend/m-p/2305861#M77477</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2014-02-19T15:28:10Z</dc:date>
    </item>
    <item>
      <title>Re: How to read pdf file in talend</title>
      <link>https://community.qlik.com/t5/Talend-Studio/How-to-read-pdf-file-in-talend/m-p/2305862#M77478</link>
      <description>Good question. At the moment you have to use self-written code in a tJavaFlex, but I do not know how to read a PDF. 
&lt;BR /&gt;I would google for it. Sorry. 
&lt;BR /&gt;The only problem is: a PDF can be created from images, and the structure of the text is oriented towards the layout and does not have a fixed structure like an HTML table. A solution would mainly be an individual solution for a particular PDF file, and every layout change in the file would have an impact on your code.</description>
      <pubDate>Wed, 19 Feb 2014 16:20:49 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/How-to-read-pdf-file-in-talend/m-p/2305862#M77478</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2014-02-19T16:20:49Z</dc:date>
    </item>
    <item>
      <title>Re: How to read pdf file in talend</title>
      <link>https://community.qlik.com/t5/Talend-Studio/How-to-read-pdf-file-in-talend/m-p/2305863#M77479</link>
      <description>Is there any change in the status of "no component exists to read PDFs"?
&lt;BR /&gt;Even if no component exists, is there any way to extract particular columnar data (no physical table structure is drawn in the PDF, but the data is visually divided into columns) and store it in DB table columns?
&lt;BR /&gt;Using Java code and the iText library in a routine, I am able to read the PDF file, but how do I extract the columns from it?
&lt;BR /&gt;Any code or URL reference for this would be helpful.</description>
      <pubDate>Fri, 05 Feb 2016 11:14:01 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/How-to-read-pdf-file-in-talend/m-p/2305863#M77479</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2016-02-05T11:14:01Z</dc:date>
    </item>
    <item>
      <title>Re: How to read pdf file in talend</title>
      <link>https://community.qlik.com/t5/Talend-Studio/How-to-read-pdf-file-in-talend/m-p/2305864#M77480</link>
      <description>Google "Java API for reading PDF files".
&lt;BR /&gt;This is an unusual requirement (for reasons already explained above), but if there is text in the PDF that can be retrieved, the best way is to write a Java routine that makes use of an existing Java API. One of Talend's massive advantages over other tools is the ease with which you can write your own components or just add code to a tJavaFlex to make use of third-party APIs.</description>
      <pubDate>Fri, 05 Feb 2016 11:46:37 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/How-to-read-pdf-file-in-talend/m-p/2305864#M77480</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2016-02-05T11:46:37Z</dc:date>
    </item>
    <item>
      <title>Re: How to read pdf file in talend</title>
      <link>https://community.qlik.com/t5/Talend-Studio/How-to-read-pdf-file-in-talend/m-p/2305865#M77481</link>
      <description>Hi Talend team, 
&lt;BR /&gt;We have a requirement to read data from one or more PDF files. We would like to know whether Talend provides any component that can read the content of PDF files. 
&lt;BR /&gt;I have gone through various posts found on Google, and most of them say it can be done using a piece of Java code. The issue is that such code is customized for a particular file and does not work universally for any kind of PDF file. So I request you to share something on this, so that I can get a clear picture and decide whether to go ahead with Talend as the ETL tool for my assignment. Any help would be appreciated. 
&lt;BR /&gt;Thanks</description>
      <pubDate>Thu, 05 Jan 2017 13:36:58 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/How-to-read-pdf-file-in-talend/m-p/2305865#M77481</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2017-01-05T13:36:58Z</dc:date>
    </item>
  </channel>
</rss>

