Anonymous
Not applicable

How to extract data from a website?

Hi,
I've got two websites. One website supports SOAP, imports and so on.
The other website holds about 7000 HTML documents, all in an identical format, with information in tables.
Now, with the relaunch, I have to move the content from the 7000 files into a database / CMS / SOAP.
I saw that Talend is able to connect via HTTP.
Can I also extract data from HTML tables?
Thank you.
Bye, Chris

20 Replies
_AnonymousUser
Specialist III

I would suggest Automation Anywhere. Great tool for web data extraction and automating any task. Free Trial available for download at:
http://www.automationanywhere.com/download/freeTrial.htm
Just try it out!
Anonymous
Not applicable
Author

You can also try tHTTPTableInput. This component has been designed for extracting data directly from HTML Pages.
http://www.talendforge.org/exchange/tos/extension_view.php?eid=72
Regards
Martin
_AnonymousUser
Specialist III

Have you ever wondered whether you could get the full contents of your desired website into a single Excel document?
If so, I have the solution for you at a fairly low price.
I can extract most of the website data and compile it into a single MS Excel 2003 file within just a few days.
It can be any website, from a simple site to complex sites like b2b portals or whatever you can come up with.
Contact me with your website and requirements.
Regards,
Janib Soomro
janib4all@hotmail.com
_AnonymousUser
Specialist III

I can make it for you. site.downloader@gmail.com
_AnonymousUser
Specialist III

Talend, I am having trouble getting HTML table data into Excel using Talend v4.2.2. I saw there is a component, thttptable, for a previous version.
Can you help in this regard?
Anonymous
Not applicable
Author

Hello Honed,
I'm having the same problem. When I try to extract data from the HTML page that comes with the component, everything works fine, but that page is very simple: it has no divs or blockquotes and is structured using only tables. When I use a page with more HTML tags, such as blockquotes, tHTTPTableInput does not seem to recognize the tables, and it throws:
"Exception in component tHTTPTableInput_1 java.lang.ArrayIndexOutOfBoundsException:"
Does anyone here have the same problem or know how to solve this?

Thanks
_AnonymousUser
Specialist III

Hello,
Did you try DataCrops web extraction software tool? 
The DataCrops tool lets you extract data from any website and delivers it in a structured form. This business data helps you generate leads, and you can easily analyse it to make better decisions for your business.
Anonymous
Not applicable
Author

Try this; a free trial version is available for download.
Anonymous
Not applicable
Author

You can use Talend for this. It needs a little Java coding, but it is more than possible. I have written a simple tutorial here. It comes with all of the source code in Talend v5.5.1 format.
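To give an idea of what "a little Java coding" could look like, here is a minimal sketch that pulls cell text out of HTML table rows using only the Java standard library. It is not the code from the tutorial mentioned above and not how tHTTPTableInput works internally; the class name HtmlTableSketch and the method extractRows are hypothetical names made up for this example. It assumes simple, non-nested tables, does not decode HTML entities, and uses regular expressions, which are fragile on messy markup, so a real HTML parser would be the safer choice for 7000 production pages.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of Talend): extracts cell text from simple HTML tables.
class HtmlTableSketch {

    // Matches one <tr>...</tr> block; DOTALL lets '.' span line breaks.
    private static final Pattern ROW = Pattern.compile(
            "<tr\\b[^>]*>(.*?)</tr>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Matches one <td> or <th> cell inside a row.
    private static final Pattern CELL = Pattern.compile(
            "<t[dh]\\b[^>]*>(.*?)</t[dh]>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Returns one list of cell strings per table row found in the HTML.
    // Assumption: tables are not nested inside other tables.
    static List<List<String>> extractRows(String html) {
        List<List<String>> rows = new ArrayList<>();
        Matcher rowMatcher = ROW.matcher(html);
        while (rowMatcher.find()) {
            List<String> cells = new ArrayList<>();
            Matcher cellMatcher = CELL.matcher(rowMatcher.group(1));
            while (cellMatcher.find()) {
                // Strip tags nested inside the cell and collapse whitespace.
                String text = cellMatcher.group(1)
                        .replaceAll("<[^>]+>", " ")
                        .replaceAll("\\s+", " ")
                        .trim();
                cells.add(text);
            }
            // Skip rows that contain no cells at all.
            if (!cells.isEmpty()) {
                rows.add(cells);
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        // Sample markup with the table wrapped in div and blockquote tags.
        String html = "<div><blockquote><table>"
                + "<tr><th>Name</th><th>Price</th></tr>"
                + "<tr><td><b>Widget</b></td><td>9.99</td></tr>"
                + "</table></blockquote></div>";
        for (List<String> row : extractRows(html)) {
            System.out.println(String.join(" | ", row));
        }
    }
}
```

Because each row comes back as a variable-length list, the calling code can check row.size() before reading a cell by index. In a Talend job, something like this could live in a custom routine and be called from a tJavaRow or tJavaFlex step, but that wiring is left out here.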
Anonymous
Not applicable
Author

Recently I had some problems extracting data, but I found data extractor software from webcontentextractor.com that helped me a lot. It came with excellent support and saved me a lot of time and effort.