topic Re: Extract web data into Talend in Talend Data Catalog

Extract web data into Talend

MaksymU — Fri, 15 Nov 2024 23:50:27 GMT

Hello All,

I'm very new to Talend, and my task is to extract data from the link below and use it as input data (metadata?) in Talend jobs.

https://www.accessdata.fda.gov/cder/ndctext.zip

The challenge is that the data comes as delimited .txt files packed inside a zip archive. The link is permanent, and the data is updated daily.

Thus, I have three main questions:

Can Talend pull the data from the website regularly?

Can a website be configured as a data source in Talend's Metadata?

Can Talend extract data in the format I describe?

Thank you in advance! Please feel free to ask me any questions to clarify my request.

Re: Extract web data into Talend

Anonymous — Thu, 19 Aug 2021 03:52:11 GMT

1 If the requirement can be met by a Talend Job, schedule that Job to run regularly.

2 No. Instead, download the file with the tFileFetch or tHttpRequest component.

3 Unzip the archive with tFileUnarchive, then read the extracted files with tFileInputDelimited.
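
For anyone who wants to see what steps 2 and 3 amount to outside of Talend, here is a rough Python sketch of the same fetch, unzip, and read-delimited flow. It is not what the Talend components do internally, only an illustration of the logic. The function names `download` and `read_delimited_members` are made up for this example, and the tab delimiter and latin-1 encoding are assumptions that should be checked against the real files.

```python
import csv
import io
import urllib.request
import zipfile

# Link taken from the original question.
NDC_URL = "https://www.accessdata.fda.gov/cder/ndctext.zip"


def download(url: str, timeout: float = 60.0) -> bytes:
    """Fetch the archive into memory (roughly the role of tFileFetch)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def read_delimited_members(zip_bytes: bytes,
                           delimiter: str = "\t",
                           encoding: str = "latin-1") -> dict:
    """Unzip in memory and parse every .txt member as delimited text.

    Returns {member_name: [row_dict, ...]}, with each row keyed by the
    header line. Delimiter and encoding are assumptions, not confirmed
    properties of the FDA files.
    """
    tables = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            if not name.lower().endswith(".txt"):
                continue  # skip anything that is not a delimited text file
            with archive.open(name) as raw:
                text = io.TextIOWrapper(raw, encoding=encoding, newline="")
                # QUOTE_NONE treats quote characters as ordinary data
                reader = csv.DictReader(text, delimiter=delimiter,
                                        quoting=csv.QUOTE_NONE)
                tables[name] = list(reader)
    return tables


# Example use (performs a real download, so it is left commented out):
# tables = read_delimited_members(download(NDC_URL))
# for name, rows in tables.items():
#     print(name, len(rows), "rows")
```

Running something like this daily would still need an external scheduler, which corresponds to point 1.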

Regards

Shong

Re: Extract web data into Talend

MaksymU — Thu, 19 Aug 2021 08:17:53 GMT

Thank you, Shong!