<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: How to read data from a word file in Talend Studio</title>
    <link>https://community.qlik.com/t5/Talend-Studio/How-to-read-data-from-a-word-file/m-p/2311982#M82974</link>
    <description>Hi Gabriel,&lt;BR /&gt;First of all, thank you for your reply.&lt;BR /&gt;I have a requirement to read data from a Microsoft Word file.&lt;BR /&gt;I am well aware that a Word file is unstructured, but I just want to match a pattern in the file and read the data next to it.&lt;BR /&gt;For example:&lt;BR /&gt;Name : kathi&lt;BR /&gt;Place : USA&lt;BR /&gt;with a specified delimiter.&lt;BR /&gt;I want to match the key "Name" and read the value "kathi" in TOS.&lt;BR /&gt;Regards,&lt;BR /&gt;Sandeep.</description>
    <pubDate>Tue, 17 Apr 2012 12:32:41 GMT</pubDate>
    <dc:creator>Anonymous</dc:creator>
    <dc:date>2012-04-17T12:32:41Z</dc:date>
    <item>
      <title>How to read data from a word file</title>
      <link>https://community.qlik.com/t5/Talend-Studio/How-to-read-data-from-a-word-file/m-p/2311980#M82972</link>
      <description>Hi Talend Team,&lt;BR /&gt;I want to read some data from a Word file.&lt;BR /&gt;Is there a component that can read a Word file directly?&lt;BR /&gt;Or is there any other way to do it?&lt;BR /&gt;Regards,&lt;BR /&gt;Sandeep.</description>
      <pubDate>Tue, 17 Apr 2012 11:55:01 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/How-to-read-data-from-a-word-file/m-p/2311980#M82972</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2012-04-17T11:55:01Z</dc:date>
    </item>
    <item>
      <title>Re: How to read data from a word file</title>
      <link>https://community.qlik.com/t5/Talend-Studio/How-to-read-data-from-a-word-file/m-p/2311981#M82973</link>
      <description>Hi, 
&lt;BR /&gt;there is a discussion on LinkedIn about this topic (or was it you who asked the question there?): 
&lt;A href="http://www.linkedin.com/groupItem?view=&amp;amp;gid=812977&amp;amp;type=member&amp;amp;item=107111395&amp;amp;qid=06608beb-085b-4573-92de-e691676590d0&amp;amp;trk=group_most_popular-0-b-cmr&amp;amp;goback=.gmp_812977" rel="nofollow noopener noreferrer"&gt;http://www.linkedin.com/groupItem?view=&amp;amp;gid=812977&amp;amp;type=member&amp;amp;item=107111395&amp;amp;qid=06608beb-085b-4573-92de-e691676590d0&amp;amp;trk=group_most_popular-0-b-cmr&amp;amp;goback=.gmp_812977&lt;/A&gt; 
&lt;BR /&gt;Still, I'd say the problem with a Word document is that it is unstructured: it can contain tables, text, images, links, headers, even other documents. You can read data from an Excel sheet because there, at least, the data sits in tables. So you cannot go directly from a Word document to data; you need a step that extracts the structured information first. In theory, you could write a script that saves your Word document as plain text, but wouldn't you lose information that way? 
&lt;BR /&gt;If you know what is in the Word document, e.g. CSV (comma-separated values), you can use the POI API or Visual Basic to extract the data from Word, usually as delimited values (CSV), and then use Talend to do something useful with the data. 
&lt;BR /&gt;Carpe diem 
&lt;BR /&gt;Gabriel</description>
      <pubDate>Tue, 17 Apr 2012 12:14:13 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/How-to-read-data-from-a-word-file/m-p/2311981#M82973</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2012-04-17T12:14:13Z</dc:date>
    </item>
    <item>
      <title>Re: How to read data from a word file</title>
      <link>https://community.qlik.com/t5/Talend-Studio/How-to-read-data-from-a-word-file/m-p/2311982#M82974</link>
      <description>Hi Gabriel,&lt;BR /&gt;First of all, thank you for your reply.&lt;BR /&gt;I have a requirement to read data from a Microsoft Word file.&lt;BR /&gt;I am well aware that a Word file is unstructured, but I just want to match a pattern in the file and read the data next to it.&lt;BR /&gt;For example:&lt;BR /&gt;Name : kathi&lt;BR /&gt;Place : USA&lt;BR /&gt;with a specified delimiter.&lt;BR /&gt;I want to match the key "Name" and read the value "kathi" in TOS.&lt;BR /&gt;Regards,&lt;BR /&gt;Sandeep.</description>
      <pubDate>Tue, 17 Apr 2012 12:32:41 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/How-to-read-data-from-a-word-file/m-p/2311982#M82974</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2012-04-17T12:32:41Z</dc:date>
    </item>
    <item>
      <title>Re: How to read data from a word file</title>
      <link>https://community.qlik.com/t5/Talend-Studio/How-to-read-data-from-a-word-file/m-p/2311983#M82975</link>
      <description>Hi Sandeep,
&lt;BR /&gt;then I'd create a script using the POI API (or any other Word manipulation API, e.g. Lucene) to extract the document's body as plain text. (I usually deploy all my routines as web services; it is easier and more accessible than trying to build a new Talend component.) Then:
&lt;BR /&gt;- for every document (tFileList)
&lt;BR /&gt;- extract the content as plain text (tSSH, tWebService) into a temporary file
&lt;BR /&gt;- read it row by row (tFileInputFullRow)
&lt;BR /&gt;- check whether the file contains the searched string (tFilterRow)
&lt;BR /&gt;- read the other rows you need (tFileInputRegex)
&lt;BR /&gt;However, there is no out-of-the-box Talend component to extract plain text from a Word document. In theory, you could reuse the WordExtractor from the Lucene project (it uses POI as well).
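&lt;BR /&gt;A minimal sketch of the matching step only, assuming the Word text has already been extracted to plain text with one "Key : value" pair per line. The function name, the pattern, and the sample text are hypothetical, and Python is used purely for illustration:
&lt;PRE&gt;
```python
import re

# Hypothetical pattern: a key, a colon delimiter, then the value,
# with surrounding whitespace trimmed from both parts.
PAIR = re.compile(r"^\s*([^:]+?)\s*:\s*(.*?)\s*$")


def extract_pairs(text):
    """Map lower-cased keys to values for lines shaped like 'Key : value'."""
    pairs = {}
    for line in text.splitlines():
        match = PAIR.match(line)
        if match:  # lines without the delimiter are skipped
            key, value = match.groups()
            pairs[key.lower()] = value
    return pairs


# Stand-in for the plain text extracted from the Word document.
sample = "Some heading\nName : kathi\nPlace : USA\n"
fields = extract_pairs(sample)
print(fields["name"])  # kathi
```
&lt;/PRE&gt;
&lt;BR /&gt;In a Talend job, an expression of this kind would be a starting point for the regex-based read step listed above; the exact syntax that component expects is not covered here.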
&lt;BR /&gt;Gabriel</description>
      <pubDate>Tue, 17 Apr 2012 13:42:47 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/How-to-read-data-from-a-word-file/m-p/2311983#M82975</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2012-04-17T13:42:47Z</dc:date>
    </item>
    <item>
      <title>Re: How to read data from a word file</title>
      <link>https://community.qlik.com/t5/Talend-Studio/How-to-read-data-from-a-word-file/m-p/2311984#M82976</link>
      <description>Hi Gabriel,
&lt;BR /&gt;Thank you once again for your reply.
&lt;BR /&gt;So we can extract the text using a script based on the POI API. Could you please mail or post the procedure for creating a sample job? That would be a great help to me.
&lt;BR /&gt;
&lt;BR /&gt;Regards,
&lt;BR /&gt;Sandeep.</description>
      <pubDate>Wed, 18 Apr 2012 04:10:36 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/How-to-read-data-from-a-word-file/m-p/2311984#M82976</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2012-04-18T04:10:36Z</dc:date>
    </item>
  </channel>
</rss>

