
Read log file and get specific information
Hi,
I have a text file in the format below (unstructured, but with one pipe-separated table in the middle of the file).
=============Text File======================
Starting the session...
Session started.
Active session: [1] sfdsdf$sdfsdf@ftp-sdfsdf.dsfsf.com
/Shared/ETL_Testing/Incoming
C:\ddd\ssss\DEV\DATA\IN
<Downloading_files>
File_1.csv | 650 B | 0.8 KB/s | binary | 100%
File_2.xlsx | 887 B | 0.7 KB/s | binary | 100%
File_3.csv | 888 B | 0.7 KB/s | binary | 100%
File_4.csv | 44 B | 0.5 KB/s | binary | 100%
<Downloading_files>
Session 'sdsd$sdsd@ftp-sdsd.sdsd.com' closed.
No session.
================END============================
From the file above, I just want to read the table below and store it somewhere; for now, it can be displayed in a tLogRow.
File_1.csv | 650 B | 0.8 KB/s | binary | 100%
File_2.xlsx | 887 B | 0.7 KB/s | binary | 100%
File_3.csv | 888 B | 0.7 KB/s | binary | 100%
File_4.csv | 44 B | 0.5 KB/s | binary | 100%
Thanks,
Sachin
Accepted Solutions

In which case, you'd still use the tFileInputDelimited component, specify a Field Separator of " | " (space-pipe-space), add the five fields to the schema, tick the "Check each row structure against schema" option in Advanced settings, and your "Main" output will just be the lines you want.
To avoid seeing warnings in the output for lines that don't match, just direct the "Reject" output from the tFileInputDelimited component somewhere.
If you specifically want to parse only what's between the tags you've mentioned, there's a bit more work involved, but this simple approach should be sufficient in most cases.
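For anyone wanting to see what that filtering amounts to, here is a plain-Java sketch (not the code Talend generates) of the same idea: split each line on the literal " | " separator, send rows with exactly five fields to "main" and everything else to "reject". The five-column count and the class and method names are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class SchemaFilter {
    // Assumed separator and column count, matching the five-field table in the question.
    static final String SEPARATOR = " | ";
    static final int EXPECTED_FIELDS = 5;

    /** Rows with exactly EXPECTED_FIELDS fields go to main; all other lines go to reject. */
    static void split(List<String> lines, List<String[]> main, List<String> reject) {
        // Pattern.quote makes the separator literal, since String.split takes a regex.
        String quoted = Pattern.quote(SEPARATOR);
        for (String line : lines) {
            // Limit -1 keeps trailing empty fields so they still count towards the total.
            String[] fields = line.split(quoted, -1);
            if (fields.length == EXPECTED_FIELDS) {
                main.add(fields);
            } else {
                reject.add(line);
            }
        }
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "Session started.",
            "<Downloading_files>",
            "File_1.csv | 650 B | 0.8 KB/s | binary | 100%",
            "File_2.xlsx | 887 B | 0.7 KB/s | binary | 100%",
            "<Downloading_files>",
            "No session.");
        List<String[]> main = new ArrayList<>();
        List<String> reject = new ArrayList<>();
        split(lines, main, reject);
        for (String[] row : main) {
            System.out.println(String.join(";", row));
        }
        System.out.println("rejected: " + reject.size());
    }
}
```

Note that a non-table line which happens to contain exactly four " | " separators would also pass this check, which is why the tag-based approach can be worth the extra work.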

So you're almost there. You just have to replace the tLogRow with a tFileOutputDelimited component using "|" as the field separator.
No?
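In plain Java, a "|"-separated output boils down to joining each row's fields with "|" and writing one row per line. The sketch below (hypothetical class `PipeWriter`) only approximates tFileOutputDelimited; it does not quote or escape fields, so a value that itself contains "|" would corrupt the output.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class PipeWriter {
    /** Joins each row's fields with "|" (no quoting or escaping). */
    static List<String> toLines(List<String[]> rows) {
        List<String> out = new ArrayList<>();
        for (String[] row : rows) {
            out.add(String.join("|", row));
        }
        return out;
    }

    /** Writes one joined row per line, overwriting the target file. */
    static void write(Path target, List<String[]> rows) throws IOException {
        Files.write(target, toLines(rows), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        List<String[]> rows = new ArrayList<>();
        rows.add(new String[] {"File_1.csv", "650 B", "0.8 KB/s", "binary", "100%"});
        Path tmp = Files.createTempFile("downloads", ".txt");
        write(tmp, rows);
        System.out.println(Files.readAllLines(tmp, StandardCharsets.UTF_8));
    }
}
```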

I think @SachinD is asking about how to read in a file with the contents that they've provided, and extract the section they're interested in as individual rows in a flow.
If that's the case, and if the file always has the same number of lines before and after the actual data lines, then the answer would be to use a tFileInputDelimited, specifying (for the example provided) a header of 6 lines, a footer of 3, and a field separator of " | " (space-pipe-space), or just "|" with all columns trimmed, depending on whether the expected data should keep leading/trailing spaces.
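A minimal plain-Java sketch of the fixed header/footer approach, assuming the "=====" banner lines in the post are annotations rather than part of the real file (which is what makes the counts 6 and 3). It uses the "|"-and-trim variant; the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class HeaderFooterSkip {
    /** Drops `header` lines from the top and `footer` lines from the bottom. */
    static List<String> body(List<String> lines, int header, int footer) {
        int end = lines.size() - footer;
        if (header >= end) {
            return Collections.emptyList(); // file shorter than header + footer
        }
        return lines.subList(header, end);
    }

    /** Splits on a plain "|" (escaped, because split takes a regex) and trims each field. */
    static String[] parse(String line) {
        String[] fields = line.split("\\|", -1);
        for (int i = 0; i < fields.length; i++) {
            fields[i] = fields[i].trim();
        }
        return fields;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "Starting the session...",
            "Session started.",
            "Active session: [1] sfdsdf$sdfsdf@ftp-sdfsdf.dsfsf.com",
            "/Shared/ETL_Testing/Incoming",
            "C:\\ddd\\ssss\\DEV\\DATA\\IN",
            "<Downloading_files>",
            "File_1.csv | 650 B | 0.8 KB/s | binary | 100%",
            "File_2.xlsx | 887 B | 0.7 KB/s | binary | 100%",
            "File_3.csv | 888 B | 0.7 KB/s | binary | 100%",
            "File_4.csv | 44 B | 0.5 KB/s | binary | 100%",
            "<Downloading_files>",
            "Session 'sdsd$sdsd@ftp-sdsd.sdsd.com' closed.",
            "No session.");
        for (String line : body(lines, 6, 3)) {
            System.out.println(String.join(";", parse(line)));
        }
    }
}
```

This only works while the header and footer sizes stay fixed; if they vary from run to run, the counts have to come from somewhere else.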

Thanks @TRF and @ciw1973 for taking the time to look into my issue.
Let me explain my current scenario once again.
My unstructured log file looks like the one below. It can have any number of rows at the top (header) and bottom (footer), and that number can vary from run to run.
I am interested only in the pipe-delimited table that comes between the tags below, and want to ignore all other rows.
<Downloading_files> <Downloading_files>
I have control over these tags: I intentionally print them in the log file so that we can identify where the table starts and ends. I need this table to be stored somewhere.
FILE
=============Text File======================
Starting the session...
Session started.
Active session: [1] sfdsdf$sdfsdf@ftp-sdfsdf.dsfsf.com
/Shared/ETL_Testing/Incoming
C:\ddd\ssss\DEV\DATA\IN
<Downloading_files>
File_1.csv | 650 B | 0.8 KB/s | binary | 100%
File_2.xlsx | 887 B | 0.7 KB/s | binary | 100%
File_3.csv | 888 B | 0.7 KB/s | binary | 100%
File_4.csv | 44 B | 0.5 KB/s | binary | 100%
<Downloading_files>
Session 'sdsd$sdsd@ftp-sdsd.sdsd.com' closed.
No session.
================END============================
I apologize if I have confused you.
Thanks,
Sachin D.
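Since the tags are under your control, extracting only what sits between them can be as small as a flag that the first `<Downloading_files>` line switches on and the second one switches off. Below is a plain-Java sketch of that logic; the class and method names are hypothetical, and wiring it into a job (for example reading each line with a tFileInputFullRow and filtering in a tJavaFlex) is an assumption, not something confirmed in this thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class TagExtractor {
    // Marker the poster prints before and after the table.
    static final String TAG = "<Downloading_files>";

    /**
     * Returns the non-blank lines between the first two TAG lines.
     * Opening and closing markers are identical, so the first TAG turns
     * collection on and the second ends it. If the closing TAG is missing,
     * everything after the opening TAG is returned.
     */
    static List<String> between(List<String> lines) {
        List<String> out = new ArrayList<>();
        boolean inside = false;
        for (String line : lines) {
            String t = line.trim();
            if (t.equals(TAG)) {
                if (inside) {
                    break; // closing tag: end of the table
                }
                inside = true; // opening tag: collect from the next line on
                continue;
            }
            if (inside && !t.isEmpty()) {
                out.add(line);
            }
        }
        return out;
    }

    /** Splits on "|" (escaped for the regex) and trims each field. */
    static String[] parse(String line) {
        String[] fields = line.split("\\|", -1);
        for (int i = 0; i < fields.length; i++) {
            fields[i] = fields[i].trim();
        }
        return fields;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "Starting the session...",
            "Session started.",
            "<Downloading_files>",
            "File_1.csv | 650 B | 0.8 KB/s | binary | 100%",
            "File_2.xlsx | 887 B | 0.7 KB/s | binary | 100%",
            "<Downloading_files>",
            "No session.");
        for (String line : between(lines)) {
            System.out.println(String.join(";", parse(line)));
        }
    }
}
```

Because the start and end are detected from the tags rather than from line counts, the number of header and footer rows can change between runs without affecting the result.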

