hami1
Creator

How to scrape website content and check for URL existence

Hi Guys,

I have a scenario where I need to check, using Talend, whether certain URLs (PPC ads) are present on a website. I used the tHttpRequest component to fetch the website's content and was able to write the HTML to a flat file. Now I need to check whether the corresponding URLs are present in that flat file.

I am using Talend Open Studio version 6.3. How can I achieve this?

 

Thanks,

skh.

1 Solution

Accepted Solutions
hami1
Creator
Author

 

I used the tHttpRequest component to fetch the website's HTML, then used Java code to check whether the required URLs exist in it.
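
The Java code itself was not posted, so the following is only a minimal sketch of what such a check might look like, assuming the fetched HTML is available as a String and the URLs to look for are known in advance. The class name `UrlPresenceCheck`, the method `checkUrls`, and the example.com URLs are hypothetical and used for illustration only. It does a plain case-insensitive substring match, which may or may not be strict enough for a given use case.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper: names are illustrative and not taken from the original post.
class UrlPresenceCheck {

    // For each target URL, records whether it occurs anywhere in the HTML text.
    // Assumption: a case-insensitive substring match is good enough.
    static Map<String, Boolean> checkUrls(String html, List<String> targetUrls) {
        String haystack = html.toLowerCase(Locale.ROOT);
        Map<String, Boolean> result = new LinkedHashMap<>();
        for (String url : targetUrls) {
            result.put(url, haystack.contains(url.toLowerCase(Locale.ROOT)));
        }
        return result;
    }

    public static void main(String[] args) {
        // Stand-in for the HTML written out by tHttpRequest (made-up sample content).
        String html = "<html><body><a href=\"https://example.com/ad/123\">Ad</a></body></html>";
        List<String> targets = Arrays.asList(
                "https://example.com/ad/123",
                "https://example.com/ad/999");

        for (Map.Entry<String, Boolean> e : checkUrls(html, targets).entrySet()) {
            System.out.println(e.getKey() + " -> " + (e.getValue() ? "found" : "missing"));
        }
    }
}
```

In a Talend job, logic like this could presumably live in a custom routine or a tJava/tJavaRow component, with the HTML read from the flat file instead of a hard-coded string. The sketch sticks to Java 8 APIs, since older Talend Open Studio versions typically run on Java 8.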

Thanks,

Hameed.

 


3 Replies
lojdr
Creator II

Hello,

HTML is a specific version of XML, so use the XML components in Talend to filter the required information, e.g. tXMLMap or tExtractXMLField.

Regards
Lojdr
hami1
Creator
Author

Hi,

But in my scenario I am scraping the entire website code, which is in HTML format, and loading it into a flat file.

I think HTML differs from XML, but I will check the XML components and let you know.

 

Thanks,

skh

hami1
Creator
Author

 

I used the tHttpRequest component to fetch the website's HTML, then used Java code to check whether the required URLs exist in it.

Thanks,

Hameed.