
Web Scraping (Newbie)

Hi
There is a web site that I use regularly that presents a table based on search criteria. I know how to structure the URI to return the page with the table of data on it. The web site, however, requires that I log in first.
To automate this, I am trying to use the tFileFetch component. I have set the protocol to "http", entered the URI (which I know works because I've tested it in a browser), set the destination directory and filename, and unselected the POST Method and Die on error check boxes. I have then checked the Need authentication box and entered my username and password (and double-checked that I entered them correctly).
The saved output is a 48-byte file containing "<h1>Incorrect access</h1> You are not logged in."
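That message suggests the site uses a login form plus a session cookie rather than HTTP authentication. Assuming the Need authentication option in tFileFetch sends HTTP credentials (an HTTP Basic `Authorization` header; this is an assumption, not confirmed from the component's documentation), the sketch below shows what such a request carries. A site with a form-based login never reads this header; it only checks for its own session cookie, which this request does not have. The function names are hypothetical.

```python
import base64
import urllib.request


def basic_auth_header(username: str, password: str) -> str:
    """Build the value of an HTTP Basic 'Authorization' header (RFC 7617)."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return "Basic " + token


def request_with_basic_auth(url: str, username: str, password: str) -> urllib.request.Request:
    """Hypothetical helper: a GET request carrying Basic credentials and no cookies."""
    req = urllib.request.Request(url)
    req.add_header("Authorization", basic_auth_header(username, password))
    return req
```

For example, `basic_auth_header("Aladdin", "open sesame")` gives the RFC's sample value `"Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ=="`. The request built this way has an `Authorization` header but no `Cookie` header, so a cookie-based site treats it as anonymous.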
I have tried this in 4.1.1 and now in 4.2 and I get the same result. In 4.2 I tried adding a tHttpRequest component to access the web site's login form first and then running tFileFetch (that failed too).
I'm stuck! I watched the Web Scraping webinar this afternoon and it all looked so easy 😞
My normal manual sequence is: go to the web site's home page, click the "Log In" link, log in, go to the search page, and run the search, which gives me my table. Any ideas on how to automate this with TOS (Talend Open Studio) would be gratefully received.
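The manual sequence above comes down to three steps: log in once, keep the session cookie the site returns, and send that same cookie with the search request. That may also explain why the tHttpRequest-then-tFileFetch attempt failed: if each component makes its own connection, the cookie from the login step is not passed on (an assumption about how those components behave). Below is a minimal sketch using only Python's standard library, assuming a plain HTML login form. The URLs and the `username`/`password` form field names are hypothetical placeholders; the real ones come from the login page's `<form action="...">` and `<input name="...">` attributes.

```python
import urllib.parse
import urllib.request
from http.cookiejar import CookieJar

# Hypothetical placeholders: replace with the site's real login and search URLs.
LOGIN_URL = "https://example.com/login"
SEARCH_URL = "https://example.com/search?q=widgets"


def make_session() -> urllib.request.OpenerDirector:
    """Build an opener that stores cookies and sends them back on later requests."""
    jar = CookieJar()
    return urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))


def log_in(opener: urllib.request.OpenerDirector, login_url: str, form: dict) -> None:
    """POST the login form; any session cookie in the response is kept in the jar.

    Passing data= makes urllib send a POST with a form-urlencoded body.
    Redirects after login are followed automatically, and cookies set on
    the redirect response are stored too.
    """
    body = urllib.parse.urlencode(form).encode("ascii")
    with opener.open(login_url, data=body) as resp:
        resp.read()


def fetch(opener: urllib.request.OpenerDirector, url: str, dest_path: str) -> int:
    """GET a page with the same opener (so the same cookies) and save it to disk.

    Returns the number of bytes written.
    """
    with opener.open(url) as resp:
        data = resp.read()
    with open(dest_path, "wb") as f:
        f.write(data)
    return len(data)


# Usage (hypothetical URLs and field names):
# session = make_session()
# log_in(session, LOGIN_URL, {"username": "me", "password": "secret"})
# fetch(session, SEARCH_URL, "results.html")
```

The key design choice is that the login POST and the search GET go through one opener, so they share one cookie jar. If the site also sets a cookie or a hidden token on the login page itself, a GET of that page with the same opener before `log_in` (adding the token to the form dict) would likely be needed. Reproducing this shared-cookie behaviour is what a Talend job would have to do as well.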
TIA
Stephen

Reply

This may be a bit of a late response, but I have a tutorial on this here.