hami1
Creator

How to scrape website content and check for the existence of a URL

Hi Guys,

I have a scenario where I need to check whether certain URLs (PPC ads) are present within a website, using Talend. I used the tHttpRequest component to fetch the contents of the website and was able to write the HTML to a flat file. Now I need to check whether the corresponding URLs appear in that flat file.

I am using Talend Open Studio 6.3. How can I achieve this?

Thanks,

skh.

Accepted Solutions
hami1
Creator
Author

I used the tHttpRequest component to fetch the HTML of the website, then used Java code to check whether the required URLs exist in it.

Thanks,

Hameed.
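
The Java code used in this solution was not posted. As a rough illustration only, a check of this kind might look like the sketch below, assuming the HTML fetched by tHttpRequest is available as a String. The class name UrlPresenceCheck and its methods are hypothetical; in Talend they could be adapted into a custom routine or called from a tJava component.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not the original poster's code.
class UrlPresenceCheck {

    // Matches href="..." or href='...' (case-insensitive) and captures the value.
    private static final Pattern HREF =
            Pattern.compile("href\\s*=\\s*([\"'])(.*?)\\1", Pattern.CASE_INSENSITIVE);

    // Collects every quoted href value found in the HTML, in document order.
    static Set<String> extractLinks(String html) {
        Set<String> links = new LinkedHashSet<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            links.add(m.group(2).trim());
        }
        return links;
    }

    // Returns true if the given URL appears as an exact href value in the HTML.
    static boolean containsUrl(String html, String url) {
        return extractLinks(html).contains(url.trim());
    }
}
```

This compares URLs as exact strings, so a trailing slash or a different query string counts as a different URL. A regular expression is also only an approximation of HTML parsing: unquoted href values and links built by JavaScript would be missed.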

3 Replies
lojdr
Creator II

Hello,

HTML is a specific version of XML, so you can use the XML components in Talend, e.g. tXMLMap or tExtractXMLField, to filter out the required information.

Regards
Lojdr
hami1
Creator
Author

Hi,

In my scenario, though, I am scraping the entire website's code, which is in HTML format, and loading it into the flat file.

I think HTML differs from XML, but I will try the XML components and let you know.

Thanks,

skh