Web Scraping

Hi everyone,
I have the URL of a web page that contains some links. For each link, I have to scrape all of its content.
I want to do this with TOS (Talend Open Studio). It's the first time I have done something like this.
Do I need to use a script, for example in Python, in combination with a Talend job? Or can I do everything with specific Talend components (so without scripts)? Which components should I use?
Thanks all
1 Reply

Hello 
Take a look at the tHttpRequest component. It sends an HTTP request to the server and returns the page content for the URL. You can then use a regular expression or the tExtractXMLField component to extract all the links from the response, and finally iterate over the links one by one. For example:
tHttpRequest--main--tExtractXMLField--main--tFlowToIterate--iterate--tHttpRequest--main--tLogRow
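For comparison, the same flow can be sketched as a standalone Python script, which is the alternative the question raises. This is a minimal sketch and not a Talend job: it assumes the pages are plain HTML reachable without login, and the names `LinkExtractor`, `fetch`, `extract_links` and `scrape` are made up for illustration. It uses only the Python standard library.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class LinkExtractor(HTMLParser):
    """Collects the target of every <a href="..."> (the link-extraction step)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the URL of the page.
                absolute = urljoin(self.base_url, value)
                # Keep only web links; skip mailto:, javascript:, etc.
                if absolute.startswith(("http://", "https://")):
                    self.links.append(absolute)


def fetch(url):
    """Download one page and return its body as text (the HTTP request step)."""
    with urlopen(url, timeout=10) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def extract_links(html, base_url):
    """Return the absolute http(s) links found in an HTML string."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    return parser.links


def scrape(start_url, fetch_page=fetch):
    """Fetch start_url, then fetch each linked page once (the iterate step)."""
    pages = {}
    for link in extract_links(fetch_page(start_url), start_url):
        if link not in pages:  # skip links that appear more than once
            pages[link] = fetch_page(link)
    return pages
```

Calling `scrape()` with the start URL returns a dictionary that maps each linked URL to the text of that page. The `fetch_page` parameter is there only so the download step can be swapped out, for example to add error handling or to test without network access.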
Best regards
Shong