<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: How to extract data from a website? in Talend Studio</title>
    <link>https://community.qlik.com/t5/Talend-Studio/How-to-extract-data-from-a-website/m-p/2255566#M38231</link>
    <description>I found another helpful thing for this: 
&lt;BR /&gt; 
&lt;A href="http://www.iopus.com/imacros/firefox/?ref=fxmoz" rel="nofollow noopener noreferrer"&gt;http://www.iopus.com/imacros/firefox/?ref=fxmoz&lt;/A&gt; 
&lt;BR /&gt;It is an amazing tool for automating the web, and even data extraction works fine. 
&lt;BR /&gt;One could combine its output (e.g. an Excel file) with Talend to load it into another database.</description>
    <pubDate>Thu, 14 May 2009 10:51:51 GMT</pubDate>
    <dc:creator>Anonymous</dc:creator>
    <dc:date>2009-05-14T10:51:51Z</dc:date>
    <item>
      <title>How to extract data from a website?</title>
      <link>https://community.qlik.com/t5/Talend-Studio/How-to-extract-data-from-a-website/m-p/2255558#M38223</link>
      <description>&lt;P&gt;Hi,&lt;BR /&gt;I've got two websites. One website supports SOAP, imports and so on.&lt;BR /&gt;The other website holds about 7000 HTML documents in an identical format, with the information in tables.&lt;BR /&gt;Now, with the relaunch, I have to move the content of the 7000 files into a database / CMS / SOAP.&lt;BR /&gt;I saw that Talend is able to connect over HTTP.&lt;BR /&gt;Can I also extract data from HTML tables?&lt;BR /&gt;Thank you.&lt;BR /&gt;Bye, Chris&lt;/P&gt;</description>
      <pubDate>Sat, 16 Nov 2024 14:24:41 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/How-to-extract-data-from-a-website/m-p/2255558#M38223</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2024-11-16T14:24:41Z</dc:date>
    </item>
    <item>
      <title>Re: How to extract data from a website?</title>
      <link>https://community.qlik.com/t5/Talend-Studio/How-to-extract-data-from-a-website/m-p/2255559#M38224</link>
      <description>I think that there isn't any way to extract data from an HTML table, but if you only have a table you may use a regular expression.</description>
      <pubDate>Tue, 01 Apr 2008 11:48:46 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/How-to-extract-data-from-a-website/m-p/2255559#M38224</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2008-04-01T11:48:46Z</dc:date>
    </item>
    <item>
      <title>Re: How to extract data from a website?</title>
      <link>https://community.qlik.com/t5/Talend-Studio/How-to-extract-data-from-a-website/m-p/2255560#M38225</link>
      <description>Hello Chris,
&lt;BR /&gt;as Olivier wrote, there is no special component. I had the same problem and ended up with a tJavaRow containing many regexes. But that depends on your HTML structure. I've experimented a little with html2xml converters. If you search on Google you should find different tools (including open source ones). In the end I couldn't use them because my input was very far from well-formed.
&lt;BR /&gt;If you find a solution, please give us some feedback. 
&lt;BR /&gt;Bye
&lt;BR /&gt;Volker</description>
      <pubDate>Tue, 01 Apr 2008 21:58:34 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/How-to-extract-data-from-a-website/m-p/2255560#M38225</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2008-04-01T21:58:34Z</dc:date>
    </item>
    <item>
      <title>Re: How to extract data from a website?</title>
      <link>https://community.qlik.com/t5/Talend-Studio/How-to-extract-data-from-a-website/m-p/2255561#M38226</link>
      <description>I have written an open-source function for converting bad HTML to well-formed XML (&lt;A href="http://sourceforge.net/projects/light-html2xml" rel="nofollow noopener noreferrer"&gt;http://sourceforge.net/projects/light-html2xml&lt;/A&gt;) and I would appreciate the chance to test it with your input.&lt;BR /&gt;It is a single-pass automaton and it does not need any specific objects. It is not yet written in Java, only in C# and PHP5 (I will soon rewrite it in Java, especially if you're interested).</description>
      <pubDate>Thu, 03 Apr 2008 09:00:47 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/How-to-extract-data-from-a-website/m-p/2255561#M38226</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2008-04-03T09:00:47Z</dc:date>
    </item>
    <item>
      <title>Re: How to extract data from a website?</title>
      <link>https://community.qlik.com/t5/Talend-Studio/How-to-extract-data-from-a-website/m-p/2255562#M38227</link>
      <description>Yes, I think it would be a really good idea to write it in Java. Then I will create a specific Talend component to perform this action.</description>
      <pubDate>Thu, 03 Apr 2008 09:27:21 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/How-to-extract-data-from-a-website/m-p/2255562#M38227</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2008-04-03T09:27:21Z</dc:date>
    </item>
    <item>
      <title>Re: How to extract data from a website?</title>
      <link>https://community.qlik.com/t5/Talend-Studio/How-to-extract-data-from-a-website/m-p/2255563#M38228</link>
      <description>Hi,
&lt;BR /&gt;For internal stats, we use some Talend jobs that rely on 
&lt;A href="http://cpan.uwinnipeg.ca/module/HTML::TokeParser" rel="nofollow noopener noreferrer"&gt;http://cpan.uwinnipeg.ca/module/HTML::TokeParser&lt;/A&gt; in tPerl/tPerlRow. We may push a new component onto the stack if you need it.
&lt;BR /&gt;Hope this helps</description>
      <pubDate>Thu, 03 Apr 2008 13:25:19 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/How-to-extract-data-from-a-website/m-p/2255563#M38228</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2008-04-03T13:25:19Z</dc:date>
    </item>
    <item>
      <title>Re: How to extract data from a website?</title>
      <link>https://community.qlik.com/t5/Talend-Studio/How-to-extract-data-from-a-website/m-p/2255564#M38229</link>
      <description>The Java version of the html2xml function I have written is now downloadable at &lt;A href="http://sourceforge.net/projects/light-html2xml" rel="nofollow noopener noreferrer"&gt;http://sourceforge.net/projects/light-html2xml&lt;/A&gt;&lt;BR /&gt;Please send me your comments and remarks about it so that I can fix bugs.</description>
      <pubDate>Fri, 04 Apr 2008 11:20:28 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/How-to-extract-data-from-a-website/m-p/2255564#M38229</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2008-04-04T11:20:28Z</dc:date>
    </item>
    <item>
      <title>Re: How to extract data from a website?</title>
      <link>https://community.qlik.com/t5/Talend-Studio/How-to-extract-data-from-a-website/m-p/2255565#M38230</link>
      <description>Yes, you can extract all the data from the 7000 pages. I'm also working on this.</description>
      <pubDate>Thu, 14 May 2009 10:02:48 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/How-to-extract-data-from-a-website/m-p/2255565#M38230</guid>
      <dc:creator>_AnonymousUser</dc:creator>
      <dc:date>2009-05-14T10:02:48Z</dc:date>
    </item>
    <item>
      <title>Re: How to extract data from a website?</title>
      <link>https://community.qlik.com/t5/Talend-Studio/How-to-extract-data-from-a-website/m-p/2255566#M38231</link>
      <description>I found another helpful thing for this: 
&lt;BR /&gt; 
&lt;A href="http://www.iopus.com/imacros/firefox/?ref=fxmoz" rel="nofollow noopener noreferrer"&gt;http://www.iopus.com/imacros/firefox/?ref=fxmoz&lt;/A&gt; 
&lt;BR /&gt;It is an amazing tool for automating the web, and even data extraction works fine. 
&lt;BR /&gt;One could combine its output (e.g. an Excel file) with Talend to load it into another database.</description>
      <pubDate>Thu, 14 May 2009 10:51:51 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/How-to-extract-data-from-a-website/m-p/2255566#M38231</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2009-05-14T10:51:51Z</dc:date>
    </item>
    <item>
      <title>Re: How to extract data from a website?</title>
      <link>https://community.qlik.com/t5/Talend-Studio/How-to-extract-data-from-a-website/m-p/2255567#M38232</link>
      <description>Use the vder software to extract data from Amazon.com and output it in XML format. View a screenshot: 
&lt;A href="http://binhgiang.sourceforge.net/xmlalbum/slides/vietspider%20xml%20list%20detail%201.html" rel="nofollow noopener noreferrer"&gt;http://binhgiang.sourceforge.net/xmlalbum/slides/vietspider%20xml%20list%20detail%201.html&lt;/A&gt;
&lt;BR /&gt;and download from: 
&lt;A href="http://binhgiang.sourceforge.net/site/download.jsp" rel="nofollow noopener noreferrer"&gt;http://binhgiang.sourceforge.net/site/download.jsp&lt;/A&gt;</description>
      <pubDate>Fri, 24 Jul 2009 05:19:43 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/How-to-extract-data-from-a-website/m-p/2255567#M38232</guid>
      <dc:creator>_AnonymousUser</dc:creator>
      <dc:date>2009-07-24T05:19:43Z</dc:date>
    </item>
    <item>
      <title>Re: How to extract data from a website?</title>
      <link>https://community.qlik.com/t5/Talend-Studio/How-to-extract-data-from-a-website/m-p/2255568#M38233</link>
      <description>I would suggest Automation Anywhere. Great tool for web data extraction and automating any task. Free Trial available for download at: 
&lt;BR /&gt; 
&lt;A href="http://www.automationanywhere.com/download/freeTrial.htm" rel="nofollow noopener noreferrer"&gt;http://www.automationanywhere.com/download/freeTrial.htm&lt;/A&gt; 
&lt;BR /&gt;Just try it out!</description>
      <pubDate>Tue, 15 Dec 2009 04:44:36 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/How-to-extract-data-from-a-website/m-p/2255568#M38233</guid>
      <dc:creator>_AnonymousUser</dc:creator>
      <dc:date>2009-12-15T04:44:36Z</dc:date>
    </item>
    <item>
      <title>Re: How to extract data from a website?</title>
      <link>https://community.qlik.com/t5/Talend-Studio/How-to-extract-data-from-a-website/m-p/2255569#M38234</link>
      <description>You can also try tHTTPTableInput. This component has been designed for extracting data directly from HTML Pages. 
&lt;BR /&gt; 
&lt;A href="http://www.talendforge.org/exchange/tos/extension_view.php?eid=72" rel="nofollow noopener noreferrer"&gt;http://www.talendforge.org/exchange/tos/extension_view.php?eid=72&lt;/A&gt; 
&lt;BR /&gt;Regards 
&lt;BR /&gt;Martin</description>
      <pubDate>Mon, 04 Jan 2010 15:11:15 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/How-to-extract-data-from-a-website/m-p/2255569#M38234</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2010-01-04T15:11:15Z</dc:date>
    </item>
    <item>
      <title>Re: How to extract data from a website?</title>
      <link>https://community.qlik.com/t5/Talend-Studio/How-to-extract-data-from-a-website/m-p/2255570#M38235</link>
      <description>Have you ever wondered whether you could have the full contents of your desired website in a single Excel document?
&lt;BR /&gt;If so, I have the solution for you at a fairly cheap price.
&lt;BR /&gt;I can extract most of a website's data and compile it in a single MS Excel 2003 file within just a few days.
&lt;BR /&gt;It can be any website, from a simple site to complex sites like b2b portals or whatever you can come up with.
&lt;BR /&gt;Contact me with your website and requirements.
&lt;BR /&gt;Regards,
&lt;BR /&gt;Janib Soomro
&lt;BR /&gt;janib4all@hotmail.com</description>
      <pubDate>Sun, 01 Aug 2010 07:42:37 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/How-to-extract-data-from-a-website/m-p/2255570#M38235</guid>
      <dc:creator>_AnonymousUser</dc:creator>
      <dc:date>2010-08-01T07:42:37Z</dc:date>
    </item>
    <item>
      <title>Re: How to extract data from a website?</title>
      <link>https://community.qlik.com/t5/Talend-Studio/How-to-extract-data-from-a-website/m-p/2255571#M38236</link>
      <description>I can make it for you. site.downloader@gmail.com</description>
      <pubDate>Tue, 07 Dec 2010 21:12:06 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/How-to-extract-data-from-a-website/m-p/2255571#M38236</guid>
      <dc:creator>_AnonymousUser</dc:creator>
      <dc:date>2010-12-07T21:12:06Z</dc:date>
    </item>
    <item>
      <title>Re: How to extract data from a website?</title>
      <link>https://community.qlik.com/t5/Talend-Studio/How-to-extract-data-from-a-website/m-p/2255572#M38237</link>
      <description>Talend, I am having trouble getting HTML table data into Excel using Talend v4.2.2. I saw there is a component, thttptable, for a previous version.&lt;BR /&gt;Can you help with this?</description>
      <pubDate>Fri, 02 Sep 2011 19:53:50 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/How-to-extract-data-from-a-website/m-p/2255572#M38237</guid>
      <dc:creator>_AnonymousUser</dc:creator>
      <dc:date>2011-09-02T19:53:50Z</dc:date>
    </item>
    <item>
      <title>Re: How to extract data from a website?</title>
      <link>https://community.qlik.com/t5/Talend-Studio/How-to-extract-data-from-a-website/m-p/2255573#M38238</link>
      <description>Hello Honed, 
&lt;BR /&gt;I'm having the same problem. When I try to extract data from the HTML page that comes with the component, everything works fine, but that page is very simple: it has no divs or blockquotes and is structured using only tables. When I try a page that uses more HTML tags, such as blockquotes, tHTTPTableInput does not seem to recognize the tables, and it throws 
&lt;BR /&gt;"Exception in component tHTTPTableInput_1 java.lang.ArrayIndexOutOfBoundsException:" 
&lt;BR /&gt;Does anyone here have the same problem or know how to solve it? 
&lt;BR /&gt; 
&lt;BR /&gt;Thanks</description>
      <pubDate>Wed, 30 Nov 2011 12:28:29 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/How-to-extract-data-from-a-website/m-p/2255573#M38238</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2011-11-30T12:28:29Z</dc:date>
    </item>
    <item>
      <title>Re: How to extract data from a website?</title>
      <link>https://community.qlik.com/t5/Talend-Studio/How-to-extract-data-from-a-website/m-p/2255574#M38239</link>
      <description>Hello,&lt;BR /&gt;Have you tried the DataCrops web extraction tool?&lt;BR /&gt;DataCrops lets you extract data from any website and delivers it in a proper structure. This business data helps you generate leads, and you can easily analyse it to make important decisions for your business!</description>
      <pubDate>Fri, 25 Jul 2014 09:29:11 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/How-to-extract-data-from-a-website/m-p/2255574#M38239</guid>
      <dc:creator>_AnonymousUser</dc:creator>
      <dc:date>2014-07-25T09:29:11Z</dc:date>
    </item>
    <item>
      <title>Re: How to extract data from a website?</title>
      <link>https://community.qlik.com/t5/Talend-Studio/How-to-extract-data-from-a-website/m-p/2255575#M38240</link>
      <description>Try this: download the free trial version.</description>
      <pubDate>Fri, 25 Jul 2014 10:08:17 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/How-to-extract-data-from-a-website/m-p/2255575#M38240</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2014-07-25T10:08:17Z</dc:date>
    </item>
    <item>
      <title>Re: How to extract data from a website?</title>
      <link>https://community.qlik.com/t5/Talend-Studio/How-to-extract-data-from-a-website/m-p/2255576#M38241</link>
      <description>You can use Talend for this. It needs a little Java coding, but it is more than possible. I have written a simple tutorial 
&lt;A href="http://www.rilhia.com/node/39" target="_blank" rel="nofollow noopener noreferrer"&gt;here&lt;/A&gt;. It comes with all of the source code in Talend v5.5.1 format.</description>
      <pubDate>Fri, 27 Mar 2015 01:23:26 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/How-to-extract-data-from-a-website/m-p/2255576#M38241</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2015-03-27T01:23:26Z</dc:date>
    </item>
    <item>
      <title>Re: How to extract data from a website?</title>
      <link>https://community.qlik.com/t5/Talend-Studio/How-to-extract-data-from-a-website/m-p/2255577#M38242</link>
      <description>Recently I had some problems extracting data, but I found data extractor software from webcontentextractor.com that helped me a lot. The software came with excellent support and saved me a lot of time and effort.</description>
      <pubDate>Tue, 28 Apr 2015 16:38:20 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/How-to-extract-data-from-a-website/m-p/2255577#M38242</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2015-04-28T16:38:20Z</dc:date>
    </item>
  </channel>
</rss>

