<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Parsing pretty text files in Talend Studio</title>
    <link>https://community.qlik.com/t5/Talend-Studio/Parsing-pretty-text-files/m-p/2288764#M62224</link>
    <description>Hi Zachary, 
&lt;BR /&gt;Here's my 2 cents worth. 
&lt;BR /&gt;I am assuming you want each report to be a single output record (basically a many input -&amp;gt; one output scenario). 
&lt;BR /&gt;I would define the input as space (' ') delimited and output to a delimited file (';'). You will end up with a file with up to 7 (I think I counted correctly) columns. 
&lt;BR /&gt;Since no two lines are the same, I would then use tJavaRow to build the output record. If you search the Talend forum, you can find examples of this. 
&lt;BR /&gt;You will need to check field 1 on some lines (i.e. 'for MACHINE') and field 4 on others (i.e. TRIM). You may also need to concatenate some fields back together to get the output you need. 
&lt;BR /&gt;Since there are several reports in one input file, you will need to generate a simple sequence number for each report, and also each line of each report. 
&lt;BR /&gt;Then you can sort by report/line number (descending) and then use tUniqRow on the report number (checking the 'Only once each duplicated key' option under the Advanced tab). 
&lt;BR /&gt;It's not pretty but neither is the input. 
&lt;BR /&gt;Give it a go. If you have problems, maybe you can copy/paste a file sample instead of an image. I might have time to see if I can get it to work. 
&lt;BR /&gt;Bye for now,</description>
    <pubDate>Wed, 06 May 2009 22:54:53 GMT</pubDate>
    <dc:creator>Anonymous</dc:creator>
    <dc:date>2009-05-06T22:54:53Z</dc:date>
    <item>
      <title>Parsing pretty text files</title>
      <link>https://community.qlik.com/t5/Talend-Studio/Parsing-pretty-text-files/m-p/2288760#M62220</link>
      <description>I am working on a project to parse a text file. The problems I face are the following: 1. The file's format and content change slightly at the whim of the person generating the report (the reason I am shooting down writing custom code to do the parsing). 2. The TXT file is made to be human readable, in other words pretty: everything is lined up, so the delimiter (spaces) varies depending on the length of the data. 3. The file contains three main parts; the first two are in the following format: "Name Data Name Data Name Data", and the third is in this format: Name Name Name 
&lt;BR /&gt; Data Data Data 
&lt;BR /&gt;I started looking into this software because, unlike me, the main users are not code monkeys, so I figured that with the graphical interface, making a small change to the parsing would be pretty simple. What I am looking for from this post is a direction and maybe some ideas: which components would be the best fit for parsing the two formats of data in a way that non-programmers could easily change? Usually this would be no problem, but since the format changes on a whim and non-code-monkeys have to keep up with the changes, it has become a bit more difficult. I look forward to hearing some input. 
&lt;BR /&gt;Thank You, 
&lt;BR /&gt;Zachary Long</description>
      <pubDate>Sat, 16 Nov 2024 13:58:21 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/Parsing-pretty-text-files/m-p/2288760#M62220</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2024-11-16T13:58:21Z</dc:date>
    </item>
    <item>
      <title>Re: Parsing pretty text files</title>
      <link>https://community.qlik.com/t5/Talend-Studio/Parsing-pretty-text-files/m-p/2288761#M62221</link>
      <description>Hello friend
&lt;BR /&gt;Can you show us an example of the file's content?
&lt;BR /&gt;Best regards
&lt;BR /&gt; 
&lt;BR /&gt; shong</description>
      <pubDate>Mon, 04 May 2009 15:25:27 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/Parsing-pretty-text-files/m-p/2288761#M62221</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2009-05-04T15:25:27Z</dc:date>
    </item>
    <item>
      <title>Re: Parsing pretty text files</title>
      <link>https://community.qlik.com/t5/Talend-Studio/Parsing-pretty-text-files/m-p/2288762#M62222</link>
      <description>Here is a piece of one of the many files I need to parse; it shows the three sections. There are many of these per file, separated by "END OF REPORT", which I figure will not be too hard to use to separate reports. Something that I forgot to mention is that there are several reports per input file; the ultimate goal will be to combine all data using the data as a delimiter. 
&lt;BR /&gt;
&lt;BR /&gt;
&lt;BR /&gt;Zachary Long</description>
      <pubDate>Mon, 04 May 2009 15:37:52 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/Parsing-pretty-text-files/m-p/2288762#M62222</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2009-05-04T15:37:52Z</dc:date>
    </item>
    <item>
      <title>Re: Parsing pretty text files</title>
      <link>https://community.qlik.com/t5/Talend-Studio/Parsing-pretty-text-files/m-p/2288763#M62223</link>
      <description>Hello all, I am guessing by the lack of response that everyone is just as stumped as I am?&lt;BR /&gt;Thanks&lt;BR /&gt;Zachary Long</description>
      <pubDate>Wed, 06 May 2009 20:56:50 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/Parsing-pretty-text-files/m-p/2288763#M62223</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2009-05-06T20:56:50Z</dc:date>
    </item>
    <item>
      <title>Re: Parsing pretty text files</title>
      <link>https://community.qlik.com/t5/Talend-Studio/Parsing-pretty-text-files/m-p/2288764#M62224</link>
      <description>Hi Zachary, 
&lt;BR /&gt;Here's my 2 cents worth. 
&lt;BR /&gt;I am assuming you want each report to be a single output record (basically a many input -&amp;gt; one output scenario). 
&lt;BR /&gt;I would define the input as space (' ') delimited and output to a delimited file (';'). You will end up with a file with up to 7 (I think I counted correctly) columns. 
&lt;BR /&gt;Since no two lines are the same, I would then use tJavaRow to build the output record. If you search the Talend forum, you can find examples of this. 
&lt;BR /&gt;You will need to check field 1 on some lines (i.e. 'for MACHINE') and field 4 on others (i.e. TRIM). You may also need to concatenate some fields back together to get the output you need. 
&lt;BR /&gt;Since there are several reports in one input file, you will need to generate a simple sequence number for each report, and also each line of each report. 
&lt;BR /&gt;Then you can sort by report/line number (descending) and then use tUniqRow on the report number (checking the 'Only once each duplicated key' option under the Advanced tab). 
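&lt;BR /&gt;For illustration, here is a minimal plain-Java sketch of the numbering idea above, written outside Talend. It is not Talend-generated code or tJavaRow syntax: the class name ReportFlattener, the method flatten, and the sample lines in main are all hypothetical, and it assumes each report ends with an "END OF REPORT" line as described earlier in the thread. Instead of sorting and de-duplicating, it emits one record at the point where a report ends.

```java
import java.util.StringJoiner;

// Hypothetical helper, not part of Talend: collapses each report in a
// space-aligned text file into one ';'-delimited record.
public class ReportFlattener {

    // Report separator mentioned in the thread.
    static final String END_MARKER = "END OF REPORT";

    // Returns one record per report: reportNo;lineCount;field;field;...
    static String[] flatten(String[] lines) {
        StringJoiner records = new StringJoiner("\n");
        StringJoiner current = new StringJoiner(";");
        int reportNo = 1; // sequence number of the report
        int lineNo = 0;   // sequence number of the line within the report

        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.equals(END_MARKER)) {
                // Close the current report and start the next one.
                records.add(reportNo + ";" + lineNo + ";" + current);
                current = new StringJoiner(";");
                reportNo++;
                lineNo = 0;
                continue;
            }
            if (trimmed.isEmpty()) {
                continue; // skip blank lines
            }
            lineNo++;
            // Any run of spaces counts as one delimiter.
            for (String field : trimmed.split("\\s+")) {
                current.add(field);
            }
        }
        // Keep a trailing report that has no end marker.
        if (lineNo != 0) {
            records.add(reportNo + ";" + lineNo + ";" + current);
        }
        if (records.length() == 0) {
            return new String[0];
        }
        return records.toString().split("\n");
    }

    public static void main(String[] args) {
        // Hypothetical sample input, not taken from the real report files.
        String[] sample = {
            "MACHINE   M1     SPEED  40",
            "TRIM      3",
            "END OF REPORT",
            "MACHINE   M22    SPEED  7",
            "END OF REPORT"
        };
        for (String record : flatten(sample)) {
            System.out.println(record);
        }
    }
}
```

&lt;BR /&gt;With the sample lines above, this prints 1;2;MACHINE;M1;SPEED;40;TRIM;3 and then 2;1;MACHINE;M22;SPEED;7. Fields that belong together (values containing spaces) would still need to be concatenated back by hand. 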
&lt;BR /&gt;It's not pretty but neither is the input. 
&lt;BR /&gt;Give it a go. If you have problems, maybe you can copy/paste a file sample instead of an image. I might have time to see if I can get it to work. 
&lt;BR /&gt;Bye for now,</description>
      <pubDate>Wed, 06 May 2009 22:54:53 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/Parsing-pretty-text-files/m-p/2288764#M62224</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2009-05-06T22:54:53Z</dc:date>
    </item>
    <item>
      <title>Re: Parsing pretty text files</title>
      <link>https://community.qlik.com/t5/Talend-Studio/Parsing-pretty-text-files/m-p/2288765#M62225</link>
      <description>Regex-based Perl parsing would be a great fit for this problem. You can use clever regexes to locate your position in the file, and then parse out the data you need.
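&lt;BR /&gt;As a rough illustration of the regex idea, here is a minimal sketch using java.util.regex that pulls name/data pairs out of one "Name Data Name Data" line regardless of how many spaces pad the columns. The class name PairLineParser, the method parsePairs, and the sample line are hypothetical, and the sketch assumes that neither names nor values contain spaces, which may not hold for the real reports.

```java
import java.util.StringJoiner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: extracts name/data pairs from a space-aligned line.
public class PairLineParser {

    // One name token, a run of whitespace, then one data token.
    static final Pattern PAIR = Pattern.compile("(\\S+)\\s+(\\S+)");

    // Returns the pairs as name=data, joined with ';'.
    static String parsePairs(String line) {
        StringJoiner out = new StringJoiner(";");
        Matcher m = PAIR.matcher(line);
        // Each find() consumes the next name/data pair, left to right.
        while (m.find()) {
            out.add(m.group(1) + "=" + m.group(2));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Hypothetical sample line, not taken from the real report files.
        System.out.println(parsePairs("SPEED   40    TRIM  3   LOAD 12"));
    }
}
```

&lt;BR /&gt;For the sample line this prints SPEED=40;TRIM=3;LOAD=12. A line with an odd number of tokens would leave the last token unmatched, so real input would need a check for that.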
&lt;BR /&gt;If you're stuck with Java, check out this package:
&lt;BR /&gt;
&lt;A href="http://java.sun.com/j2se/1.4.2/docs/api/java/util/regex/package-summary.html" rel="nofollow noopener noreferrer"&gt;http://java.sun.com/j2se/1.4.2/docs/api/java/util/regex/package-summary.html&lt;/A&gt;
&lt;BR /&gt;
&lt;A href="http://java.sun.com/developer/technicalArticles/releases/1.4regex/" rel="nofollow noopener noreferrer"&gt;http://java.sun.com/developer/technicalArticles/releases/1.4regex/&lt;/A&gt;</description>
      <pubDate>Thu, 07 May 2009 00:01:38 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/Parsing-pretty-text-files/m-p/2288765#M62225</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2009-05-07T00:01:38Z</dc:date>
    </item>
  </channel>
</rss>

