Hi
How can I process a bunch of semi-structured TXT files, parse their content, and insert it into a MySQL database?
I could use a hand with this; I really have no clue how to start processing this kind of file. Any help, link, or tutorial would be much appreciated.
Thanks in advance.
My file is something like this:
There are probably a hundred ways to do this...
Your data looks uniform, judging from the two sample 'records' you provided.
You could try: -
Read the file using tFileInputDelimited, with one field per line. Ignore blank lines and trim strings.
Use tMemorizeRows to memorize the last 15 (I think that's the correct number) rows.
Set up a filter to look for the (final) "Timestamp" record.
Only pass this row forward in your flow, to tMap.
Map your record in tMap.
You can use the memorized rows to refer back to the other 'fields', starting from the Date/Time string through to "Acct-Session-Time". For most of your data, you can split on "=" and trim to get the value.
You should then have one row per file 'record'.
You'd probably want to add some Exception handling.
Everything is assumed to be a String, but you could change the datatypes in tMap or use tConvertType.
Output is: -
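Outside Talend, the same flow (memorize recent rows, emit a record when the final "Timestamp" line appears) can be sketched in a few lines of Python. Note this is only an illustration of the technique, not the poster's actual job; the field names come from this thread, but the sample data is hypothetical since the original file isn't reproduced here.

```python
from collections import deque

MEMORIZE = 15  # the post suggests roughly 15 rows; adjust to your record size

def parse_records(lines):
    """Keep a rolling window of recent lines (like tMemorizeRows) and emit
    one dict per record when the terminating "Timestamp" line is seen."""
    window = deque(maxlen=MEMORIZE)
    for raw in lines:
        line = raw.strip()
        if not line:              # ignore blank lines
            continue
        window.append(line)
        if line.startswith("Timestamp"):
            record = {}
            for field in window:  # split each memorized "key = value" line
                if "=" in field:
                    key, _, value = field.partition("=")
                    record[key.strip()] = value.strip()
            yield record
            window.clear()        # start fresh for the next record

# Hypothetical sample, since the original post's file is not shown here:
sample = """
User-Name = jose
Acct-Session-Time = 42
Timestamp = 1234567890
""".splitlines()

print(list(parse_records(sample)))
```

Each yielded dict can then feed a parameterized INSERT with your MySQL driver of choice.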
Hi Jose, I am wondering what your output metadata, i.e. the column structure, would be. Can you please shed some light on it? What would the output column format be? Thanks, Vaibhav
Right now my problem is processing that kind of file while taking into account that the fields are not always the same: sometimes I have 15 fields and sometimes 18. The table schema is not a problem because I have every needed field; some will simply be blank when the corresponding field is not present.
My first concern is detecting the end of each set of data, because it is not always the "Timestamp" field.
Is the empty line the end of a data block? Is the first column of the data block defined or standard? You need some business rule or derivable logic to identify the end of a data block, the start of one, or some sort of delimiter which distinguishes between two data blocks. Is the data in a block an ordered list? Answering these should give you some idea of how to define or detect a data block. Thanks, Vaibhav
I don't think this is a Talend question. If your file is as consistent as your examples showed, then extracting your data is a simple process. If it is inconsistent, you need to describe the possible scenarios so there's at least a fighting chance of understanding how the data may be extracted.
Then, maybe, you need to not ignore blank lines in your input file: memorize the maximum number of 'fields', look for the blank line rather than "Timestamp", and then scan back through the memorized rows to see what you've got.
Hi Jose, can you try a trick: read the input file with the newline as the field delimiter and the blank line (a large run of whitespace) as the record delimiter, and then use the above techniques. Vaibhav
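For what it's worth, that trick (blank line as record delimiter, newline as field delimiter) is easy to sketch in plain Python, and it handles the 15-vs-18-field case naturally because missing keys just come out as NULL. The column names and table below are hypothetical, since the actual schema isn't shown in this thread:

```python
import re

def split_blocks(text):
    """Treat a blank line as the record delimiter and a newline as the
    field delimiter; each block becomes a dict of "key = value" fields."""
    blocks = []
    for chunk in re.split(r"\n\s*\n", text):
        fields = {}
        for line in chunk.splitlines():
            if "=" in line:
                key, _, value = line.partition("=")
                fields[key.strip()] = value.strip()
        if fields:
            blocks.append(fields)
    return blocks

# Hypothetical column list; records missing a field yield None (NULL in MySQL).
COLUMNS = ["User-Name", "Acct-Session-Time", "Timestamp"]

def to_row(record):
    return tuple(record.get(col) for col in COLUMNS)

# With a MySQL driver (e.g. mysql-connector-python, not shown here) you
# would then execute something like:
# cursor.executemany(
#     "INSERT INTO sessions (user_name, session_time, ts) VALUES (%s, %s, %s)",
#     [to_row(r) for r in split_blocks(text)],
# )
```

Because each record is a dict, a block with 18 fields and a block with 15 fields both map onto the same fixed column list without any special casing.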