<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Web Scraping (Newbie) in Talend Studio</title>
    <link>https://community.qlik.com/t5/Talend-Studio/Web-Scraping-Newbie/m-p/2276410#M52504</link>
    <description>Hi 
&lt;BR /&gt;There is a web site that I use regularly that presents a table based on search criteria. I know how to structure the URI to return the page with the table of data on it. The web site, however, requires that I log in first. 
&lt;BR /&gt;To automate this I am trying to use the tFileFetch component. I have set the protocol to "http", put in the URI (which I know works, as I've tested it in a browser), set the Destination directory and filename, and cleared the POST Method and Die on error check boxes. I have then checked the Need authentication box and entered my username/pwd combination (I have confirmed that I entered them correctly). 
&lt;BR /&gt;The saved output from this is a file with "&amp;lt;h1&amp;gt;Incorrect access&amp;lt;/h1&amp;gt; You are not logged in." - a total of 48 bytes. 
&lt;BR /&gt;I have tried this in 4.1.1 and now in 4.2, and I get the same results. In 4.2 I also tried adding the tHttpRequest component to access the web site's login form first and then running the tFileFetch (major fail). 
&lt;BR /&gt;I'm stuck! I watched the Web Scraping webinar this afternoon and it all looked so easy &lt;span class="lia-unicode-emoji" title=":disappointed_face:"&gt;😞&lt;/span&gt; 
&lt;BR /&gt;My normal sequence is to go to the web site's home page, click on the "Log In" link, log in, go to the search page, and run the search; then I get my table. Any ideas on how to automate this with TOS would be gratefully received. 
&lt;BR /&gt;TIA 
&lt;BR /&gt;Stephen</description>
    <pubDate>Sat, 16 Nov 2024 13:10:38 GMT</pubDate>
    <dc:creator>Anonymous</dc:creator>
    <dc:date>2024-11-16T13:10:38Z</dc:date>
    <item>
      <title>Web Scraping (Newbie)</title>
      <link>https://community.qlik.com/t5/Talend-Studio/Web-Scraping-Newbie/m-p/2276410#M52504</link>
      <description>Hi 
&lt;BR /&gt;There is a web site that I use regularly that presents a table based on search criteria. I know how to structure the URI to return the page with the table of data on it. The web site, however, requires that I log in first. 
&lt;BR /&gt;To automate this I am trying to use the tFileFetch component. I have set the protocol to "http", put in the URI (which I know works, as I've tested it in a browser), set the Destination directory and filename, and cleared the POST Method and Die on error check boxes. I have then checked the Need authentication box and entered my username/pwd combination (I have confirmed that I entered them correctly). 
&lt;BR /&gt;The saved output from this is a file with "&amp;lt;h1&amp;gt;Incorrect access&amp;lt;/h1&amp;gt; You are not logged in." - a total of 48 bytes. 
&lt;BR /&gt;I have tried this in 4.1.1 and now in 4.2, and I get the same results. In 4.2 I also tried adding the tHttpRequest component to access the web site's login form first and then running the tFileFetch (major fail). 
&lt;BR /&gt;I'm stuck! I watched the Web Scraping webinar this afternoon and it all looked so easy &lt;span class="lia-unicode-emoji" title=":disappointed_face:"&gt;😞&lt;/span&gt; 
&lt;BR /&gt;My normal sequence is to go to the web site's home page, click on the "Log In" link, log in, go to the search page, and run the search; then I get my table. Any ideas on how to automate this with TOS would be gratefully received. 
&lt;BR /&gt;TIA 
&lt;BR /&gt;Stephen</description>
      <pubDate>Sat, 16 Nov 2024 13:10:38 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/Web-Scraping-Newbie/m-p/2276410#M52504</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2024-11-16T13:10:38Z</dc:date>
    </item>
    <item>
      <title>Re: Web Scraping (Newbie)</title>
      <link>https://community.qlik.com/t5/Talend-Studio/Web-Scraping-Newbie/m-p/2276411#M52505</link>
      <description>This may be a rather late response, but I have a tutorial on this &lt;A href="http://www.rilhia.com/node/39" target="_blank" rel="nofollow noopener noreferrer"&gt;here&lt;/A&gt;.</description>
      <pubDate>Fri, 27 Mar 2015 01:16:37 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/Web-Scraping-Newbie/m-p/2276411#M52505</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2015-03-27T01:16:37Z</dc:date>
    </item>
  </channel>
</rss>

