Anonymous
Not applicable

[resolved] tHttpRequest - how to retrieve html content from URL

Hello,
I'm trying to get the HTML content from URLs that I have in a CSV file, using the tHttpRequest component.
My CSV file looks like this:
column1
http://www.example.com/item-url-1.html

http://www.example.com/item-url-2.html
http://www.example.com/item-url-3.html
Attached are 2 screenshots showing the job and the settings of the component in Talend.
The problem is that the result contains the same URLs rather than the HTML behind them.
Can anyone tell me what I am missing?

Job done:
0683p000009MD0L.png
Basic settings in Talend:
0683p000009MD2u.png
Thanks,
Lucian
1 Solution

Accepted Solutions
Anonymous
Not applicable
Author

Hi Shong,
the append option box solved the problem. I can't stress enough how grateful I am to you!
I now get the desired HTML content for each row, but the rows are not matched to the "sku" column from my input file because, as you said, the schema is read-only.
Is there a way to add the "sku" column from the input CSV as a key column in the output for each extracted URL?
Thank you,
Lucian


14 Replies
Anonymous
Not applicable
Author

Hi 
You need to iterate over each URL read from the source file, and set the URI field of tHttpRequest with a dynamic variable. For example:
tFileInputDelimited-main(row1)--tFlowToIterate-iterate-tHttpRequest--main-tLogRow
On tHttpRequest, set the URI field as:
(String)globalMap.get("row1.url")
//url is the column name on tFileInputDelimited.
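The hand-off described above can be pictured in plain Java. This is only a sketch, not Talend-generated code: `globalMap` here is an ordinary `HashMap` standing in for Talend's global map, and the class and method names (`IterateUrlsSketch`, `publishRow`, `uriExpression`) are made up for illustration. It assumes tFlowToIterate is using its default key/value option, so each column value is stored under the key `rowName.columnName`.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the tFlowToIterate -> tHttpRequest hand-off.
final class IterateUrlsSketch {

    // Mimics Talend's globalMap: String keys, Object values.
    static final Map<String, Object> globalMap = new HashMap<>();

    // Assumed behavior of tFlowToIterate with the default key/value option:
    // the column value is stored under "<rowName>.<columnName>".
    static void publishRow(String rowName, String columnName, String value) {
        globalMap.put(rowName + "." + columnName, value);
    }

    // The expression typed into the URI field of tHttpRequest.
    static String uriExpression() {
        return (String) globalMap.get("row1.url");
    }

    public static void main(String[] args) {
        List<String> urls = List.of(
                "http://www.example.com/item-url-1.html",
                "http://www.example.com/item-url-2.html");
        for (String url : urls) {
            publishRow("row1", "url", url); // one iteration per input row
            System.out.println("request URI: " + uriExpression());
        }
    }
}
```

Each pass through the loop overwrites the same key, which is why the URI expression always resolves to the URL of the row currently being processed.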
BR
Shong
Anonymous
Not applicable
Author

Hi Shong,
thank you for your help.
Here is what I have done:
Pic1 tHttpRequest:
0683p000009MD2z.png
Pic2 Error:
0683p000009MCeC.png
Pic3 tFlowToIterate:
0683p000009MD34.png
I get a 404 Not Found Error.
I have 2 columns in my CSV file:
"sku"
and
"description_long", which contains the URLs.
In the tFlowToIterate component I declared the variable "Dnl_Descr_Url" for the "description_long" column.
Then in tHttpRequest I set the URI like this:
"http://localhost/url/test-html-url.csv"+((String)globalMap.get("row35.description_long"))
and also like:
"http://localhost/url/test-html-url.csv"+((String)globalMap.get("row35.Dnl_Descr_Url"))
But I still get "404 Not Found".
I'm a newbie in this territory. I would appreciate your help very much.
Thank you,
Lucian
Anonymous
Not applicable
Author

Hi
Set the URI to the global variable:
((String)globalMap.get("row35.Dnl_Descr_Url"))
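To see how the two URI expressions tried in this thread differ, here is a small Java sketch. The class and method names are hypothetical, and the sample URL is borrowed from the first post as an assumption about what a row contains. It only compares the resulting strings; since the poster reports the same error with both forms, it does not by itself explain the 404.

```java
import java.net.URI;

// Hypothetical comparison of the two URI expressions tried in the thread.
final class UriConcatSketch {

    // Prefix used in the failing attempt: the location of the CSV file itself.
    static final String CSV_LOCATION = "http://localhost/url/test-html-url.csv";

    // First attempt: CSV location concatenated with the URL from the row.
    static String concatenated(String urlFromRow) {
        return CSV_LOCATION + urlFromRow;
    }

    // Suggested form: the URL from the row, with nothing prepended.
    static String variableOnly(String urlFromRow) {
        return urlFromRow;
    }

    // Host that a request to the given URI would be sent to.
    static String hostOf(String uri) {
        return URI.create(uri).getHost();
    }

    public static void main(String[] args) {
        String urlFromRow = "http://www.example.com/item-url-1.html"; // assumed sample value
        System.out.println(concatenated(urlFromRow) + " -> host " + hostOf(concatenated(urlFromRow)));
        System.out.println(variableOnly(urlFromRow) + " -> host " + hostOf(variableOnly(urlFromRow)));
    }
}
```

With the prefix in place, the request goes to `localhost` and the real URL ends up as part of the path, so the target page is never addressed.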
Anonymous
Not applicable
Author

Hello,
I tried:
"http://localhost/url/test-html-url.csv"+((String)globalMap.get("row35.Dnl_Descr_Url"))
and
((String)globalMap.get("row35.Dnl_Descr_Url"))
and I still get the same error.
If I check the schema in tHttpRequest, I see only the ResponseContent column. Shouldn't the "description_long" column be there as well?
Thank you,
Lucian
Anonymous
Not applicable
Author

Hi Shong,
can you please check whether I'm doing everything as you described? I still get that "404 Not Found" error.
Thank you, 
Lucian
Anonymous
Not applicable
Author

Hi 
You asked: "If I check the schema in tHttpRequest I see only the ResponseContent column. Shouldn't be there also the "description_long" column?"

This component has only one column, and it is read-only.
Can you open the URL in a browser normally? If you still have a problem, please show us real example data from your CSV file and upload a full screenshot of the tFlowToIterate component. From your screenshot, I can't see whether the 'the default key/value...' box is checked.
Best regards
Shong
Anonymous
Not applicable
Author

Hello Shong,
thank you very much for your answer. Attached you will find my CSV file with the URLs. The URLs work fine when opened in a browser.
The default key/value checkbox wasn't checked in my first test. I have now run two more tests, with and without the default key/value box checked, and I still get the same error.
Here are the screenshots:
test-html-url.rar.rar
1.0 tFlowToIterate - default key/value not checked
0683p000009MCXK.png
1.1 Error with tFlowToIterate - default key/value not checked
0683p000009MCz5.png
2.0 tFlowToIterate - default key/value checked
0683p000009MCOY.png
2.1 Error with tFlowToIterate - default key/value checked
0683p000009MD06.png
Best Regards,
Lucian
Anonymous
Not applicable
Author

Hi  
I tested your example URL and it works fine; the job uses the GET method on tHttpRequest to send the request. See my screenshots.
0683p000009MCz6.png 0683p000009MD39.png 0683p000009MD3E.png
BR
Shong
Anonymous
Not applicable
Author

Hi Shong,
many thanks for your help. 
The job is running perfectly now, but the output contains only the last row. In my example we had 2 rows, and only the last one is written to the output. To make sure, I tested a new CSV file, reading the first 90 rows from it with the tSampleRow component, and I get only the last one in the output, in this case row 91 because the first row is the header.
Here is my screenshot:
0683p000009MD3J.png
90 rows executed but only 1 written.
Many Thanks,
Lucian
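
The accepted reply at the top of the thread reports that the append option fixed this symptom. The output component is not named in this part of the thread, so the following is only a plain-Java analogy, not Talend code: it assumes the output file is reopened once per iteration, and the class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;

// Hypothetical analogy: a file that is reopened once per iterated row.
final class AppendVsOverwriteSketch {

    // Writes one line per iteration, reopening the file each time,
    // then returns the lines that ended up in the file.
    static List<String> writePerIteration(List<String> rows, boolean append) {
        try {
            Path out = Files.createTempFile("iterate-output", ".txt");
            try {
                for (String row : rows) {
                    byte[] line = (row + System.lineSeparator()).getBytes(StandardCharsets.UTF_8);
                    // Without append, every iteration truncates what was written before.
                    StandardOpenOption mode = append
                            ? StandardOpenOption.APPEND
                            : StandardOpenOption.TRUNCATE_EXISTING;
                    Files.write(out, line, mode);
                }
                return Files.readAllLines(out, StandardCharsets.UTF_8);
            } finally {
                Files.deleteIfExists(out);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        List<String> rows = List.of("html-1", "html-2", "html-3");
        System.out.println("overwrite: " + writePerIteration(rows, false));
        System.out.println("append:    " + writePerIteration(rows, true));
    }
}
```

In overwrite mode only the last row survives, which matches the "90 rows executed but only 1 written" symptom; in append mode every row is kept.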