[resolved] tHttpRequest - how to retrieve html content from URL
Hello,
I'm trying to get the HTML content from URLs that I have in a CSV file with the tHttpRequest component.
My CSV file looks like this:
column1
http://www.example.com/item-url-1.html
http://www.example.com/item-url-2.html
http://www.example.com/item-url-3.html
Attached you will find 2 screenshots with the job and the settings of the component in Talend.
The problem is that the result contains the same URLs and not the HTML from the URLs.
Can anyone tell me what I am missing?
Job done:
Basic settings in Talend:
Thanks,
Lucian
Hi Shong,
the append option box solved the problem. I can't stress enough how grateful I am to you!
I now get the desired HTML content for each row, but the rows are not assigned to the "sku" column from my input file because, like you said, the schema is read-only.
Is there a way I can add the "sku" column from the input CSV as a key column in the output for each extracted URL?
Thank you,
Lucian
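One possible approach, sketched here under assumptions rather than confirmed in this thread: when tFlowToIterate uses its default keys, every column of the current row (including "sku") is stored in globalMap, so a downstream component such as a tMap could read it back with (String)globalMap.get("row35.sku") and pair it with ResponseContent. The class and method names below are hypothetical, and the plain HashMap only stands in for Talend's globalMap.

```java
import java.util.HashMap;
import java.util.Map;

public class SkuKeySketch {
    // Builds one output row: the sku of the row being iterated plus the HTML
    // returned for that row's URL. "row35" is the flow name used in this thread.
    public static String[] outputRow(Map<String, Object> globalMap, String responseContent) {
        String sku = (String) globalMap.get("row35.sku");
        return new String[] { sku, responseContent };
    }

    public static void main(String[] args) {
        // Stand-in for Talend's globalMap, filled as tFlowToIterate would with default keys.
        Map<String, Object> globalMap = new HashMap<>();
        globalMap.put("row35.sku", "SKU-1"); // hypothetical sku value
        globalMap.put("row35.description_long", "http://www.example.com/item-url-1.html");

        String[] row = outputRow(globalMap, "<html>...</html>");
        System.out.println(row[0] + ";" + row[1]);
    }
}
```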
Hi
You need to iterate over each URL read from the source file, and set the URI field of tHttpRequest with a dynamic variable. For example:
tFileInputDelimited-main(row1)--tFlowToIterate-iterate-tHttpRequest--main-tLogRow
on tHttpRequest, set the URI field as:
(String)globalMap.get("row1.url")
//url is the column name on tFileInputDelimited.
BR
Shong
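The iterate pattern described above can be sketched in plain Java. This is only an illustration, assuming tFlowToIterate publishes each column under the key "<rowName>.<columnName>" when its default keys are used; the HashMap stands in for Talend's globalMap, and the class and method names are hypothetical. No HTTP request is sent here, the sketch only shows which URI each iteration would use.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class IterateSketch {
    // Stand-in for Talend's globalMap shared by all components of a job.
    public static final Map<String, Object> globalMap = new HashMap<>();

    // Mimics tFlowToIterate with default keys: one entry per column of the current row.
    public static void publishRow(String rowName, Map<String, String> row) {
        for (Map.Entry<String, String> column : row.entrySet()) {
            globalMap.put(rowName + "." + column.getKey(), column.getValue());
        }
    }

    // The expression typed into the URI field of tHttpRequest.
    public static String uriExpression() {
        return (String) globalMap.get("row1.url");
    }

    public static void main(String[] args) {
        List<String> urls = List.of(
                "http://www.example.com/item-url-1.html",
                "http://www.example.com/item-url-2.html",
                "http://www.example.com/item-url-3.html");
        for (String url : urls) {
            publishRow("row1", Map.of("url", url)); // one iteration per input row
            System.out.println("tHttpRequest would call: " + uriExpression());
        }
    }
}
```

Because the map entry is overwritten on every iteration, the expression always resolves to the URL of the row currently being processed.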
Hi Shong,
thank you for your help.
Here is what I did:
Pic1 tHttpRequest:
Pic2 Error:
Pic3-tFlowToIterate
I get a 404 Not Found Error.
I have 2 columns in my csv file:
"sku"
and
"description_long", which contains the URLs.
In the tFlowToIterate component I declared the variable "Dnl_Descr_Url" for the "description_long" column.
Then in tHttpRequest I set the URI like:
"http://localhost/url/test-html-url.csv"+((String)globalMap.get("row35.description_long"))
and also like:
"http://localhost/url/test-html-url.csv"+((String)globalMap.get("row35.Dnl_Descr_Url"))
But I still get "404 Not Found"
I'm a newbie in this territory. I'd appreciate your help very much.
Thank you,
Lucian
Hello,
I tried:
"http://localhost/url/test-html-url.csv"+((String)globalMap.get("row35.Dnl_Descr_Url"))
and
((String)globalMap.get("row35.Dnl_Descr_Url"))
and I still get the same error.
If I check the schema in tHttpRequest I see only the ResponseContent column. Shouldn't the "description_long" column be there as well?
Thank you,
Lucian
If I check the schema in tHttpRequest I see only the ResponseContent column. Shouldn't the "description_long" column be there as well?
This component has only one column which is read-only.
Can you open the URL in a browser normally? If you still have the problem, can you please show us a real example of the data in your CSV file and upload a full screenshot of the tFlowToIterate component? From your screenshot, I can't see if the 'the default key/value...' box is checked or not.
Best regards
Shong
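A minimal sketch of how the URI expressions tried above evaluate in plain Java. It assumes that tFlowToIterate with the default key/value box checked stores the column under "row35.description_long", and that no entry named "row35.Dnl_Descr_Url" exists in either mode (with custom keys, the key would presumably be just "Dnl_Descr_Url"). The HashMap stands in for Talend's globalMap and the class name is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

public class UriConcatSketch {
    // Prefixing the CSV location to the looked-up value, as in the failing expressions.
    public static String concatenatedUri(Map<String, Object> globalMap, String key) {
        return "http://localhost/url/test-html-url.csv" + ((String) globalMap.get(key));
    }

    // Using only the value stored for the current row.
    public static String plainUri(Map<String, Object> globalMap, String key) {
        return (String) globalMap.get(key);
    }

    public static void main(String[] args) {
        Map<String, Object> globalMap = new HashMap<>();
        globalMap.put("row35.description_long", "http://www.example.com/item-url-1.html");

        // Missing key: the lookup returns null and Java appends the text "null".
        System.out.println(concatenatedUri(globalMap, "row35.Dnl_Descr_Url"));
        // Existing key: still not a valid address, two URLs are glued together.
        System.out.println(concatenatedUri(globalMap, "row35.description_long"));
        // Only the column value: the URL from the CSV row itself.
        System.out.println(plainUri(globalMap, "row35.description_long"));
    }
}
```

Either concatenated form points at a path on localhost that does not exist, which would be consistent with the "404 Not Found" response; the URLs in the CSV are already absolute, so no prefix is needed.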
Hello Shong,
thank you very much for your answer. Attached you will find my CSV file with the URLs. The URLs work fine when they are opened in a browser.
The default key/value checkbox wasn't checked in my first test. I have now run 2 more tests, with and without the default key/value box checked, and I still get the same error.
Here are the Screenshots:
test-html-url.rar.rar
1.0 tFlowToIterate - default key/value not checked
1.1 Error with tFlowToIterate - default key/value not checked
2.0 tFlowToIterate - default key/value checked
2.1 Error with tFlowToIterate - default key/value checked
Best Regards,
Lucian
Hi Shong,
many thanks for your help.
The job is running perfectly now, but only the last row is saved in the output. In my example we had 2 rows and only the last one is written. To be sure of it, I tested a new CSV file, reading the first 90 rows from it with the tSampleRow component, and I get only the last one in the output (row 91 in this case, because the first row is the header).
here is my screenshot:
90 rows executed but only 1 written.
Many Thanks,
Lucian
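The symptom above (90 rows executed, 1 written) matches what happens when the output file is reopened and truncated on every iteration, which is why the append option mentioned elsewhere in this thread resolves it. A minimal sketch in plain Java, assuming each iteration writes its row to the same file; the class and method names are hypothetical and a temporary file replaces the real output.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;

public class AppendSketch {
    // Writes each row in its own "iteration" and returns the lines left in the file.
    public static List<String> writeRows(List<String> rows, boolean append) {
        try {
            Path out = Files.createTempFile("http-output", ".txt");
            for (String row : rows) {
                if (append) {
                    // Keeps the rows written by earlier iterations.
                    Files.writeString(out, row + "\n",
                            StandardOpenOption.CREATE, StandardOpenOption.APPEND);
                } else {
                    // Default options truncate the file, so earlier rows are lost.
                    Files.writeString(out, row + "\n");
                }
            }
            List<String> lines = Files.readAllLines(out);
            Files.delete(out);
            return lines;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        List<String> rows = List.of("row 1", "row 2", "row 3");
        System.out.println("without append: " + writeRows(rows, false));
        System.out.println("with append:    " + writeRows(rows, true));
    }
}
```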