
Extract specific data from text file
I have a text file as follows:
----------------------------------------
ID: 00070
Date: 2022-06-17T09:34:50
Item is now available
Export : 0bf08b33 (2022-06-17T09:35:07) is here -> File D:\PATH\TO\FILE\LOCATION\
----------------------------------------
I'm attempting to export just the ID and the
D:\PATH\TO\FILE\LOCATION\ elements into a table/columns.
Using this workflow, I can extract the relevant rows (beginning 'ID....' & ' Export....')
How would I extract the specific data required from these rows?
Thanks in advance
Conkers
Accepted Solutions

I think the best would be to use the following component: https://help.talend.com/r/en-US/8.0/ms-delimited/ms-delimited-scenario

An easy first change would be to use the tFileInputDelimited component. Set the "Field separator" to ": " (remember the trailing space) and set up two columns. The first, "RowType", will hold your row identifier (ID, Date, Item is now available and Export). The second, "Value", will hold your data. You may want a 3rd and a 4th column to hold the minutes and seconds from the date row, but I guess that these are not necessary.
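To see what that separator does to each line, here is a rough plain-Java sketch outside Talend. The class and method names (SeparatorSketch, splitRow) are made up for illustration, and using null for a missing Value is an assumption; the component itself may fill a missing column differently.

```java
import java.util.List;

final class SeparatorSketch {
    // Split a line on the first ": " into {RowType, Value}.
    // Value is null when the line contains no separator (an assumption of this sketch).
    static String[] splitRow(String line) {
        String[] parts = line.split(": ", 2);
        String value = parts.length > 1 ? parts[1] : null;
        return new String[] { parts[0], value };
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
            "ID: 00070",
            "Date: 2022-06-17T09:34:50",
            "Item is now available",
            "Export : 0bf08b33 (2022-06-17T09:35:07) is here -> File D:\\PATH\\TO\\FILE\\LOCATION\\");
        for (String line : lines) {
            String[] row = splitRow(line);
            // Print the two columns separated by a pipe
            System.out.println(row[0] + "|" + row[1]);
        }
    }
}
```

Note that the Export line's RowType comes out as "Export " with a trailing space, because the source line has a space before the colon.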
Once you have this, you can link to a tMap and have a filter on your output table which holds something like this.....
row1.RowType.equals("ID") || row1.RowType.equals("Export ")
This on its own will output this....
ID|00070
Export |0bf08b33 (2022-06-17T09:35:07) is here -> File D:\PATH\TO\FILE\LOCATION\
The pipes above ("|") simply separate the columns. So you have two rows with two columns. Your first row holds your ID value already sorted, the second row has your Export value which will need some further processing.
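If it helps, the filter can be tried on its own in plain Java. This is only a sketch: FilterSketch and keep are hypothetical names, and the test is written constant-first ("ID".equals(...)) so that a null RowType would not throw, which is a small variation on the expression above rather than the expression itself.

```java
import java.util.List;
import java.util.stream.Collectors;

final class FilterSketch {
    // Same test as the tMap filter, written constant-first so a null RowType cannot throw.
    static boolean keep(String rowType) {
        return "ID".equals(rowType) || "Export ".equals(rowType);
    }

    public static void main(String[] args) {
        // "Export" without the trailing space is included to show that it is rejected
        List<String> rowTypes = List.of("ID", "Date", "Item is now available", "Export ", "Export");
        List<String> kept = rowTypes.stream()
                .filter(FilterSketch::keep)
                .collect(Collectors.toList());
        System.out.println(kept);
    }
}
```

The trailing space in "Export " matters: the comparison is exact, so "Export" alone would not match.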
To do that, further edit the tMap's output table's "Value" column. Replace the row1.Value expression with this....
row1.Value!=null && row1.Value.indexOf("File ")>-1 ? row1.Value.substring(row1.Value.indexOf("File ")+5) : row1.Value
If you run the job again, you will get this.....
ID|00070
Export |D:\PATH\TO\FILE\LOCATION\
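For anyone wanting to check the Value expression outside the job, the same logic can be written as a small Java method. PathSketch, extractPath and MARKER are illustrative names only; the behaviour is meant to mirror the ternary above (null stays null, values without "File " pass through unchanged).

```java
final class PathSketch {
    // Text that precedes the path in the Export row
    static final String MARKER = "File ";

    // Return everything after the first "File ", or the value unchanged if the marker is absent.
    static String extractPath(String value) {
        if (value == null) {
            return null;
        }
        int idx = value.indexOf(MARKER);
        return idx > -1 ? value.substring(idx + MARKER.length()) : value;
    }

    public static void main(String[] args) {
        // Export row: only the path after "File " is kept
        System.out.println(extractPath(
            "0bf08b33 (2022-06-17T09:35:07) is here -> File D:\\PATH\\TO\\FILE\\LOCATION\\"));
        // ID row: no marker, so the value is returned as-is
        System.out.println(extractPath("00070"));
    }
}
```

MARKER.length() is the same 5 that appears as +5 in the tMap expression.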
I've built a demo job to show you what it looks like. It uses all of the settings I have described.


Good call @Balazs Gunics. I completely forgot to consider the tFileOutputMSDelimited. Thanks for stepping in 👍

Thank you both, I will give tFileOutputMSDelimited a go! (Sorry for not acknowledging sooner, couldn't log in to respond)
