hvanderborg
Contributor III

[resolved] Getting the destination URL

Hi,
I'm trying to get the destination URL of a URL like this: http://objects.icecat.biz/objects/mmo-26204092-2358398.html
The URL redirects to a PDF file. I did manage to get the file with tFileFetch (with redirects allowed), but I only need the destination URL (in this case simply the URL of the PDF file), not the file itself. Any ideas?
Thanks,
Henry
Anonymous

Hi 
There is currently no component that gets the real URL behind a redirect URL. However, you can code it yourself in a tJava component; refer to the following pages:
http://www.programminglogic.com/how-to-find-the-real-url-behind-a-redirect-in-java/
http://stackoverflow.com/questions/2659000/java-how-to-find-the-redirected-url-of-a-url
Best regards
Shong
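A minimal standalone Java sketch of the approach those pages describe: turn off automatic redirects and read the Location header of the 3xx response. So that it runs without internet access, it starts a throwaway local server (com.sun.net.httpserver, bundled with the JDK) that answers /doc.html with a 302. The class name, the paths and the demo server are assumptions for illustration; they are not part of Talend or the icecat site.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URI;

class RedirectResolver {

    // Returns the Location header of the response, or null if the server
    // did not send one (for example when the URL is not a redirect).
    static String resolveRedirect(String url) throws IOException {
        HttpURLConnection con = (HttpURLConnection) URI.create(url).toURL().openConnection();
        try {
            con.setInstanceFollowRedirects(false); // keep the 3xx response instead of following it
            con.connect();
            return con.getHeaderField("Location");
        } finally {
            con.disconnect(); // release the connection even if the lookup failed
        }
    }

    // Throwaway local server so the sketch runs offline (hypothetical paths):
    // /doc.html answers 302 with a Location pointing at /files/doc.pdf.
    static HttpServer startDemoServer() throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/doc.html", exchange -> {
            int port = exchange.getLocalAddress().getPort();
            exchange.getResponseHeaders().set("Location", "http://127.0.0.1:" + port + "/files/doc.pdf");
            exchange.sendResponseHeaders(302, -1); // -1: no response body
            exchange.close();
        });
        server.start();
        return server;
    }

    public static void main(String[] args) throws IOException {
        HttpServer server = startDemoServer();
        try {
            int port = server.getAddress().getPort();
            // Prints the /files/doc.pdf URL on the demo server's port.
            System.out.println(resolveRedirect("http://127.0.0.1:" + port + "/doc.html"));
        } finally {
            server.stop(0);
        }
    }
}
```

In a Talend job only the body of resolveRedirect matters; the demo server exists purely so the example can run on its own.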
hvanderborg
Contributor III
Author

Hi Shong,
Thank you for helping me out, I managed to get it working! Only for one URL so far, though. Now I'm trying to get it to work with an input flow (the rows come from tExtractXMLField). How should I make it work for all rows? Should I be using tJavaRow instead of tJava?
My code now is:
String url = row15.url;
java.net.HttpURLConnection con = (java.net.HttpURLConnection) new java.net.URL(url).openConnection();
con.setInstanceFollowRedirects(false);
con.connect();
String realURL = con.getHeaderField("Location");
System.out.println(realURL);

The part that doesn't work yet is String url = row15.url; if I replace row15.url with "http://myurl.com", the code works for that URL.
Thanks,
Henry
Anonymous

Yes. To access the input data flow, replace tJava with tJavaRow and change your code to:
String url = input_row.url;
java.net.HttpURLConnection con = (java.net.HttpURLConnection) new java.net.URL(url).openConnection();
con.setInstanceFollowRedirects(false);
con.connect();
String realURL = con.getHeaderField("Location");
System.out.println(realURL);

Shong
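Conceptually, a tJavaRow body is executed once for every incoming row, with input_row and output_row bound to that row, whereas a tJava body runs only once. The sketch below models that loop in plain Java; UrlRow, runPerRow and the stub resolver are hypothetical stand-ins, not the code Talend actually generates.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical stand-in for the row class Talend generates from a schema.
class UrlRow {
    String url;

    UrlRow(String url) {
        this.url = url;
    }
}

class PerRowDemo {

    // Models the per-row loop around a tJavaRow body: the body sees one
    // input_row and fills one output_row, once for every incoming row.
    static List<UrlRow> runPerRow(List<UrlRow> rows, UnaryOperator<String> resolver) {
        List<UrlRow> out = new ArrayList<>();
        for (UrlRow input_row : rows) {
            UrlRow output_row = new UrlRow(null);
            // What you type into tJavaRow corresponds to this line:
            output_row.url = resolver.apply(input_row.url);
            out.add(output_row);
        }
        return out;
    }

    public static void main(String[] args) {
        List<UrlRow> rows = List.of(new UrlRow("http://example.com/a.html"),
                                    new UrlRow("http://example.com/b.html"));
        // Offline stub instead of a real redirect lookup.
        for (UrlRow row : runPerRow(rows, u -> u.replace(".html", ".pdf"))) {
            System.out.println(row.url); // http://example.com/a.pdf, then http://example.com/b.pdf
        }
    }
}
```

This is also why row15.url is not reachable from tJava: there is no current row when its code runs.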
hvanderborg
Contributor III
Author

Great, thanks Shong, that works! I've run into two small challenges, though:
1) IF input_row.url ends with .html, THEN it needs to run the code to get the destination URL, ELSE output_row.url = input_row.url. Any idea how I should include such a statement in tJavaRow? FYI, my current code is:
String url = input_row.url;
System.out.println(url);
java.net.HttpURLConnection con = (java.net.HttpURLConnection) new java.net.URL(url).openConnection();
con.setInstanceFollowRedirects(false);
con.connect();
String realURL = con.getHeaderField("Location");
System.out.println(realURL);
output_row.url = realURL;

2) After a number of connections I get a timeout. I suspect the server on the other side does not accept more than x connections. Could I be missing something in the code above to properly close each connection before fetching the next destination URL?
hvanderborg
Contributor III
Author

Regarding 1): I don't know whether it's best practice, but I seem to have solved it by including an if statement like this:
String url = input_row.url;
System.out.println(url);
if (StringHandling.INDEX(url,".html")>0){
    java.net.HttpURLConnection con = (java.net.HttpURLConnection) new java.net.URL(url).openConnection();
    con.setInstanceFollowRedirects(false);
    con.connect();
    String realURL = con.getHeaderField("Location");
    System.out.println(realURL);
    output_row.url = realURL;
}
else {
    // not an .html link: pass the URL through unchanged
    output_row.url = input_row.url;
}
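The branch can also be pulled out into a small method so it is easy to check. In this sketch the else branch copies the input URL to the output (output_row.url = input_row.url, as described in point 1), and the test uses String.endsWith. As an extra assumption not present in the code above, a missing Location header falls back to the original URL instead of writing null. destinationUrl and the resolver parameter are hypothetical names; the resolver stands in for the HttpURLConnection lookup.

```java
import java.util.function.UnaryOperator;

class HtmlBranchDemo {

    // Same decision as the tJavaRow code: only .html URLs are looked up,
    // everything else is copied through unchanged.
    // `resolver` is a hypothetical stand-in for the HttpURLConnection lookup.
    static String destinationUrl(String url, UnaryOperator<String> resolver) {
        if (url != null && url.endsWith(".html")) {
            String location = resolver.apply(url);
            // Assumption: keep the original URL when no Location header came back.
            return location != null ? location : url;
        }
        // else branch: output_row.url = input_row.url
        return url;
    }

    public static void main(String[] args) {
        UnaryOperator<String> stub = u -> u.replace(".html", ".pdf");
        System.out.println(destinationUrl("http://example.com/a.html", stub)); // http://example.com/a.pdf
        System.out.println(destinationUrl("http://example.com/b.pdf", stub));  // http://example.com/b.pdf
    }
}
```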

Still open: the connection timeout (after x connections the server won't accept any more).
hvanderborg
Contributor III
Author

Regarding 2): it seems adding con.disconnect(); solves the connection issue. I don't know whether it is best practice to disconnect each time, but it works:
//Code generated according to input schema and output schema
output_row.fk_product_id = input_row.fk_product_id;
output_row.fk_supplier_id = input_row.fk_supplier_id;
output_row.id_by_datasupplier = input_row.id_by_datasupplier;
//
String url = input_row.url;
System.out.println(url);
if (StringHandling.INDEX(url,".html")>0){
    java.net.HttpURLConnection con = (java.net.HttpURLConnection) new java.net.URL(url).openConnection();
    con.setInstanceFollowRedirects(false);
    con.connect();
    String realURL = con.getHeaderField("Location");
    System.out.println(realURL);
    output_row.url = realURL;
    con.disconnect(); // close this row's connection before the next row
}
else {
    // not an .html link: pass the URL through unchanged
    output_row.url = input_row.url;
}

Shong, or anyone else: I'd love to hear any comments on this. Otherwise I'll mark it resolved later today.
Anonymous

Hi 
1) This method is OK. You can also use the String.endsWith() method to check whether the URL ends with .html:
String url = input_row.url;
System.out.println(url);
if (url.endsWith(".html")){
    ...
}
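The two checks are not equivalent. Assuming StringHandling.INDEX behaves like String.indexOf (0-based position of the first match, -1 if absent), INDEX(url,".html")>0 is true whenever ".html" appears anywhere after the first character, while endsWith only matches at the very end. The sample URLs, apart from the icecat one from the question, are made up.

```java
class HtmlCheckDemo {

    // Assumption: StringHandling.INDEX(s, t) behaves like s.indexOf(t),
    // returning the 0-based position of the first match or -1 if absent.
    static boolean indexCheck(String url) {
        return url.indexOf(".html") > 0;
    }

    // The check suggested in the reply: ".html" must be the very end of the URL.
    static boolean endsWithCheck(String url) {
        return url.endsWith(".html");
    }

    public static void main(String[] args) {
        String[] samples = {
            "http://objects.icecat.biz/objects/mmo-26204092-2358398.html", // both true
            "http://example.com/page.html?lang=en",                        // index true, endsWith false
            "http://example.com/file.pdf"                                  // both false
        };
        for (String s : samples) {
            System.out.println(indexCheck(s) + " " + endsWithCheck(s) + " " + s);
        }
    }
}
```

Both checks are case-sensitive, so a URL ending in .HTML matches neither.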

2) You need to close the connection at the end, for example:
if (StringHandling.INDEX(url,".html")>0){
    java.net.HttpURLConnection con = (java.net.HttpURLConnection) new java.net.URL(url).openConnection();
    con.setInstanceFollowRedirects(false);
    con.connect();
    String realURL = con.getHeaderField("Location");
    System.out.println(realURL);
    con.disconnect(); // close the connection once the header has been read
    output_row.url = realURL;
}
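A slightly more defensive variant puts disconnect() in a finally block, so the connection is released even when connect() or the header read throws, and sets explicit timeouts so that a server which stops accepting connections makes the row fail quickly instead of hanging (URLConnection reports 0, meaning no timeout, unless one is set). Per the HttpURLConnection Javadoc, disconnect() may close the underlying socket if a persistent connection is otherwise idle. The timeout values and the SafeRedirectLookup name are assumptions to tune, not recommended settings.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

class SafeRedirectLookup {

    // Hypothetical limits; tune them to the server you are calling.
    static final int CONNECT_TIMEOUT_MS = 5_000;
    static final int READ_TIMEOUT_MS = 10_000;

    // Returns the Location header, or null if the response has none.
    static String location(String url) throws IOException {
        HttpURLConnection con = (HttpURLConnection) URI.create(url).toURL().openConnection();
        try {
            con.setInstanceFollowRedirects(false);
            con.setConnectTimeout(CONNECT_TIMEOUT_MS); // give up if the server won't accept the connection
            con.setReadTimeout(READ_TIMEOUT_MS);       // give up if the response never arrives
            con.connect();
            return con.getHeaderField("Location");
        } finally {
            // Runs even when connect() throws, so a failed row does not
            // leave its connection behind.
            con.disconnect();
        }
    }
}
```

In a job, such a class would typically live in a Talend routine, or the method body can be inlined into tJavaRow.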

Shong
hvanderborg
Contributor III
Author

Many thanks. I'm now using endsWith() and kept disconnect() (without a capital C). I'll mark it resolved now.