<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: processing a semi-structured text file in Talend Studio</title>
    <link>https://community.qlik.com/t5/Talend-Studio/processing-a-semi-structured-text-file/m-p/2358656#M123644</link>
    <description>Hi Jose,&lt;BR /&gt;You could try a trick:&lt;BR /&gt;Read the input file with&lt;BR /&gt;- newline as the field delimiter&lt;BR /&gt;- a large run of white space "                                                                       " as the line delimiter&lt;BR /&gt;- and then use the techniques above.&lt;BR /&gt;Vaibhav</description>
    <pubDate>Tue, 29 Apr 2014 08:41:43 GMT</pubDate>
    <dc:creator>Anonymous</dc:creator>
    <dc:date>2014-04-29T08:41:43Z</dc:date>
    <item>
      <title>processing a semi-structured text file</title>
      <link>https://community.qlik.com/t5/Talend-Studio/processing-a-semi-structured-text-file/m-p/2358647#M123635</link>
      <description>Hi 
&lt;BR /&gt;How can I process a batch of semi-structured TXT files, parse their content, and insert it into a MySQL database? 
&lt;BR /&gt;I need a hand with this. I really have no clue how to start processing this kind of file. Any help, link, or tutorial would be much appreciated. 
&lt;BR /&gt;Thanks in advance 
&lt;BR /&gt;My file is something like this: 
&lt;BR /&gt; 
&lt;BR /&gt;Mon Apr 21 00:00:13 2014 
&lt;BR /&gt; Acct-Status-Type = Interim-Update 
&lt;BR /&gt; NAS-Port-Type = Wireless-802.11 
&lt;BR /&gt; User-Name = "user@name.com" 
&lt;BR /&gt; NAS-Port = 2149596816 
&lt;BR /&gt; Acct-Session-Id = "80203e90" 
&lt;BR /&gt; Event-Timestamp = "Apr 21 2014 00:00:13 UTC" 
&lt;BR /&gt; Acct-Input-Octets = 2745995 
&lt;BR /&gt; Acct-Output-Octets = 17889908 
&lt;BR /&gt; Acct-Input-Gigawords = 0 
&lt;BR /&gt; Acct-Output-Gigawords = 0 
&lt;BR /&gt; Acct-Input-Packets = 19376 
&lt;BR /&gt; Acct-Output-Packets = 20912 
&lt;BR /&gt; Acct-Session-Time = 7022 
&lt;BR /&gt; Timestamp = 1398038413 
&lt;BR /&gt;Mon Apr 21 00:00:14 2014 
&lt;BR /&gt; Acct-Status-Type = stop 
&lt;BR /&gt; NAS-Port-Type = Wireless-802.11 
&lt;BR /&gt; User-Name = "user@name.com" 
&lt;BR /&gt; NAS-Port = 2149596816 
&lt;BR /&gt; Acct-Session-Id = "80267e90" 
&lt;BR /&gt; Event-Timestamp = "Apr 21 2014 00:00:13 UTC" 
&lt;BR /&gt; Acct-Input-Octets = 2746795 
&lt;BR /&gt; Acct-Output-Octets = 17885408 
&lt;BR /&gt; Acct-Input-Gigawords = 0 
&lt;BR /&gt; Acct-Output-Gigawords = 0 
&lt;BR /&gt; Acct-Input-Packets = 19345 
&lt;BR /&gt; Acct-Output-Packets = 23342 
&lt;BR /&gt; Acct-Session-Time = 70 
&lt;BR /&gt; Timestamp = 1345668413</description>
      <pubDate>Sat, 16 Nov 2024 11:40:23 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/processing-a-semi-structured-text-file/m-p/2358647#M123635</guid>
      <dc:creator>_AnonymousUser</dc:creator>
      <dc:date>2024-11-16T11:40:23Z</dc:date>
    </item>
    <item>
      <title>Re: processing a semi-structured text file</title>
      <link>https://community.qlik.com/t5/Talend-Studio/processing-a-semi-structured-text-file/m-p/2358648#M123636</link>
      <description>There are probably a hundred ways to do this. 
&lt;BR /&gt;Your data looks uniform, judging from the two sample 'records' you provided. 
&lt;BR /&gt;You could try: 
&lt;BR /&gt;Read the file using tFileInputDelimited with one field per line. Ignore blank lines and trim strings. 
&lt;BR /&gt;Use tMemorizeRows to memorize the last 15 (I think that's the correct number) rows. 
&lt;BR /&gt;Set up a filter to look for the (final) "Timestamp" record. 
&lt;BR /&gt;Only pass this row forward in your flow, to tMap. 
&lt;BR /&gt;Map your record in tMap. 
&lt;BR /&gt;You can use the memorized rows to refer back to the other 'fields' starting from the Date/Time string through to "Acct-Session-Time". For most of your data, you can split on "=" and trim, to get the data value. 
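&lt;BR /&gt;The steps above can be sketched outside Talend in plain Java. This is only an illustration of the idea, not the code Talend generates: the class MemorizeWindow, its methods, and the pipe-separated output are hypothetical names and choices, and the offsets 13 and 14 assume that every record has exactly 15 lines, as in the sample.

```java
class MemorizeWindow {
    // Hypothetical stand-in for tMemorizeRows: index 0 always holds the newest line.
    static final int WINDOW = 15;

    // One record copied from the sample file in the question.
    static final String[] SAMPLE = {
        "Mon Apr 21 00:00:13 2014",
        " Acct-Status-Type = Interim-Update",
        " NAS-Port-Type = Wireless-802.11",
        " User-Name = \"user@name.com\"",
        " NAS-Port = 2149596816",
        " Acct-Session-Id = \"80203e90\"",
        " Event-Timestamp = \"Apr 21 2014 00:00:13 UTC\"",
        " Acct-Input-Octets = 2745995",
        " Acct-Output-Octets = 17889908",
        " Acct-Input-Gigawords = 0",
        " Acct-Output-Gigawords = 0",
        " Acct-Input-Packets = 19376",
        " Acct-Output-Packets = 20912",
        " Acct-Session-Time = 7022",
        " Timestamp = 1398038413"
    };

    // Returns the trimmed text after the first '=', or the whole trimmed line if there is none.
    static String value(String line) {
        int eq = line.indexOf('=');
        return eq == -1 ? line.trim() : line.substring(eq + 1).trim();
    }

    // Pushes each line through the window and emits one output row per record
    // when the closing "Timestamp" line arrives.
    static String parse(String[] lines) {
        String[] window = new String[WINDOW];
        StringBuilder out = new StringBuilder();
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty()) {
                continue; // ignore blank lines, as in the input component setting
            }
            // Shift older lines back by one slot and store the newest at index 0.
            System.arraycopy(window, 0, window, 1, WINDOW - 1);
            window[0] = line;
            // Only the final "Timestamp" line of a full window is passed forward.
            if (!line.startsWith("Timestamp") || window[WINDOW - 1] == null) {
                continue;
            }
            out.append(window[14]).append('|')        // date/time header line
               .append(value(window[13])).append('|') // Acct-Status-Type
               .append(value(window[0])).append('\n'); // Timestamp
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(parse(SAMPLE));
    }
}
```

&lt;BR /&gt;The "Event-Timestamp" line does not trigger a row because the test checks the start of the line. If records have a varying number of lines, the fixed offsets no longer point at the same fields.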
&lt;BR /&gt;You should then have one row per file 'record'.</description>
      <pubDate>Thu, 24 Apr 2014 19:01:25 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/processing-a-semi-structured-text-file/m-p/2358648#M123636</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2014-04-24T19:01:25Z</dc:date>
    </item>
    <item>
      <title>Re: processing a semi-structured text file</title>
      <link>https://community.qlik.com/t5/Talend-Studio/processing-a-semi-structured-text-file/m-p/2358649#M123637</link>
      <description>I had a quick go at this. 
&lt;BR /&gt; 
&lt;BR /&gt;tFileInputDelimited-&amp;gt;tMemorizeRows-&amp;gt;tFilter-&amp;gt;tMap-&amp;gt;tLogRow 
&lt;BR /&gt;I named the input record "theRecord". 
&lt;BR /&gt; 
&lt;BR /&gt;Set tFileInputDelimited to skip blanks and trim strings 
&lt;BR /&gt;Set tMemorizeRows to 15 rows and check the input column for memorisation. 
&lt;BR /&gt; 
&lt;BR /&gt;Set tFilter advanced to "Timestamp".equals(row2.theRecord.substring(0, 9)) 
&lt;BR /&gt; 
&lt;BR /&gt;I mapped two columns for testing 
&lt;BR /&gt; 
&lt;BR /&gt;AcctStatusType=((String[]) globalMap.get("tMemorizeRows_1_theRecord"))[13].substring(((String[]) globalMap.get("tMemorizeRows_1_theRecord"))[13].indexOf('=') + 2) 
&lt;BR /&gt;Timestamp=row3.theRecord.substring(row3.theRecord.indexOf('=') + 2) 
&lt;BR /&gt; 
&lt;BR /&gt;You'd probably want to add some Exception handling. 
&lt;BR /&gt;Everything is assumed a String; but you could change the datatypes in the tMap or use tConvertType. 
&lt;BR /&gt;Output is: - 
&lt;BR /&gt; 
&lt;BR /&gt;.--------------+----------. 
&lt;BR /&gt;|        tLogRow_1        | 
&lt;BR /&gt;|=-------------+---------=| 
&lt;BR /&gt;|AcctStatusType|Timestamp | 
&lt;BR /&gt;|=-------------+---------=| 
&lt;BR /&gt;|Interim-Update|1398038413| 
&lt;BR /&gt;|stop          |1345668413| 
&lt;BR /&gt;'--------------+----------' 
&lt;BR /&gt; 
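&lt;BR /&gt;For the exception handling and datatype conversion mentioned above, a minimal sketch in plain Java might look like this. The class SafeField and its methods are hypothetical helpers, not part of Talend; in a job they would most likely live in a user routine and be called from the tMap expressions.

```java
class SafeField {
    // Returns the text after the first '=' with surrounding spaces and double quotes removed,
    // or null when the line is null or contains no '=' (for example the date header line).
    static String text(String line) {
        if (line == null) {
            return null;
        }
        int eq = line.indexOf('=');
        if (eq == -1) {
            return null;
        }
        return line.substring(eq + 1).trim().replaceAll("^\"|\"$", "");
    }

    // Parses the value as a long and returns null instead of throwing
    // when the value is missing or not numeric.
    static Long number(String line) {
        String t = text(line);
        if (t == null) {
            return null;
        }
        try {
            return Long.valueOf(t);
        } catch (NumberFormatException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(text("User-Name = \"user@name.com\""));
        System.out.println(number("NAS-Port = 2149596816"));
        System.out.println(number("Mon Apr 21 00:00:13 2014"));
    }
}
```

&lt;BR /&gt;A long is used because a value such as NAS-Port = 2149596816 does not fit in a Java int.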
</description>
      <pubDate>Fri, 25 Apr 2014 07:48:35 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/processing-a-semi-structured-text-file/m-p/2358649#M123637</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2014-04-25T07:48:35Z</dc:date>
    </item>
    <item>
      <title>Re: processing a semi-structured text file</title>
      <link>https://community.qlik.com/t5/Talend-Studio/processing-a-semi-structured-text-file/m-p/2358650#M123638</link>
      <description>Hi Jose,&lt;BR /&gt;I am wondering what your output metadata would be, i.e. the column structure.&lt;BR /&gt;Can you please share some information on it? What would the output column format be?&lt;BR /&gt;Thanks&lt;BR /&gt;Vaibhav</description>
      <pubDate>Fri, 25 Apr 2014 07:55:29 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/processing-a-semi-structured-text-file/m-p/2358650#M123638</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2014-04-25T07:55:29Z</dc:date>
    </item>
    <item>
      <title>Re: processing a semi-structured text file</title>
      <link>https://community.qlik.com/t5/Talend-Studio/processing-a-semi-structured-text-file/m-p/2358651#M123639</link>
      <description>Thanks tal00000 and sanvaibhav.&lt;BR /&gt;&lt;BR /&gt;Right now my problem is processing this kind of file when the fields are not always the same: sometimes a record has 15 fields and sometimes 18.&lt;BR /&gt;The table schema is not a problem because it has a column for every field I need. Some columns will simply be blank when a field is not present.&lt;BR /&gt;&lt;BR /&gt;My first concern is detecting the end of each set of data, because the last line is not always the "Timestamp" field.&lt;BR /&gt;&lt;BR /&gt;Regards</description>
      <pubDate>Mon, 28 Apr 2014 07:51:11 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/processing-a-semi-structured-text-file/m-p/2358651#M123639</guid>
      <dc:creator>_AnonymousUser</dc:creator>
      <dc:date>2014-04-28T07:51:11Z</dc:date>
    </item>
    <item>
      <title>Re: processing a semi-structured text file</title>
      <link>https://community.qlik.com/t5/Talend-Studio/processing-a-semi-structured-text-file/m-p/2358652#M123640</link>
      <description>Is the empty line the end of a data block? Is the first line of each data block defined or standard? You need some business rule or derivable logic that identifies the end or the start of a data block, or some sort of delimiter that distinguishes one data block from the next.&lt;BR /&gt;Is the data within a block in a fixed order?&lt;BR /&gt;Answering these should give you an idea of how to define or detect a data block.&lt;BR /&gt;Thanks&lt;BR /&gt;Vaibhav</description>
      <pubDate>Mon, 28 Apr 2014 08:04:59 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/processing-a-semi-structured-text-file/m-p/2358652#M123640</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2014-04-28T08:04:59Z</dc:date>
    </item>
    <item>
      <title>Re: processing a semi-structured text file</title>
      <link>https://community.qlik.com/t5/Talend-Studio/processing-a-semi-structured-text-file/m-p/2358653#M123641</link>
      <description>I don't think this is a Talend question.&lt;BR /&gt;If your file is as consistent as your examples show, then extracting your data is a simple process.&lt;BR /&gt;If it is inconsistent, you need to describe the possible scenarios so there's at least a fighting chance of understanding how the data may be extracted.</description>
      <pubDate>Mon, 28 Apr 2014 09:01:09 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/processing-a-semi-structured-text-file/m-p/2358653#M123641</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2014-04-28T09:01:09Z</dc:date>
    </item>
    <item>
      <title>Re: processing a semi-structured text file</title>
      <link>https://community.qlik.com/t5/Talend-Studio/processing-a-semi-structured-text-file/m-p/2358654#M123642</link>
      <description>Hi&lt;BR /&gt;&lt;BR /&gt;The only thing we can count on is that every chunk is delimited by an empty line.&lt;BR /&gt;&lt;BR /&gt;Thanks</description>
      <pubDate>Tue, 29 Apr 2014 07:48:16 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/processing-a-semi-structured-text-file/m-p/2358654#M123642</guid>
      <dc:creator>_AnonymousUser</dc:creator>
      <dc:date>2014-04-29T07:48:16Z</dc:date>
    </item>
    <item>
      <title>Re: processing a semi-structured text file</title>
      <link>https://community.qlik.com/t5/Talend-Studio/processing-a-semi-structured-text-file/m-p/2358655#M123643</link>
      <description>Then maybe you should not ignore blank lines in your input file. Memorize the maximum number of 'fields'. Look for the blank line rather than "Timestamp", and then scan back through the memorized rows to see what you've got.</description>
      <pubDate>Tue, 29 Apr 2014 08:24:57 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/processing-a-semi-structured-text-file/m-p/2358655#M123643</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2014-04-29T08:24:57Z</dc:date>
    </item>
    <item>
      <title>Re: processing a semi-structured text file</title>
      <link>https://community.qlik.com/t5/Talend-Studio/processing-a-semi-structured-text-file/m-p/2358656#M123644</link>
      <description>Hi Jose,&lt;BR /&gt;You could try a trick:&lt;BR /&gt;Read the input file with&lt;BR /&gt;- newline as the field delimiter&lt;BR /&gt;- a large run of white space "                                                                       " as the line delimiter&lt;BR /&gt;- and then use the techniques above.&lt;BR /&gt;Vaibhav</description>
      <pubDate>Tue, 29 Apr 2014 08:41:43 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/processing-a-semi-structured-text-file/m-p/2358656#M123644</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2014-04-29T08:41:43Z</dc:date>
    </item>
    <item>
      <title>Re: processing a semi-structured text file</title>
      <link>https://community.qlik.com/t5/Talend-Studio/processing-a-semi-structured-text-file/m-p/2358657#M123645</link>
      <description>I think you can use the tDenormalize or tNormalize component to parse this. Check the link below for more details. 
&lt;BR /&gt;
&lt;A href="https://help.talend.com/search/all?query=tDenormalize&amp;amp;content-lang=en" rel="nofollow noopener noreferrer"&gt;https://help.talend.com/search/all?query=tDenormalize&amp;amp;content-lang=en&lt;/A&gt;</description>
      <pubDate>Tue, 29 Apr 2014 09:25:25 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/processing-a-semi-structured-text-file/m-p/2358657#M123645</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2014-04-29T09:25:25Z</dc:date>
    </item>
  </channel>
</rss>

