Gourav_King_of_DataLand
Contributor II

Iterative Data extraction (Pagination and Polling) from REST API using tRestClient

Hi All,

I need your expert help.

 

The requirement is to pull all chat data from a REST API (a one-time full data dump) and then pull chats on a daily basis. The output is spread across 180K pages, with each page giving the URL of the next and previous pages (except the first page, which has only 'next_url', and the last page, which has only 'prev_url').

 

So far I have been able to use the API URL to extract information from the first page:

 

tRestClient -> tJavaRow -> tJsonExtract -> tOracleOut

 

How do I modify the job to

1) Pull all data for one time data dump, 180k pages

2) Pull data on a daily basis for the current day, or extract data until the timestamp reaches the current day.

 

Example output from API

 

Page 1 gives

{
    "chats": [
        all chat-related attributes that need to be imported
    ],
    "count": 179451,
    "next_url": "next_url_here"
}


Page 2 gives

{
    "chats": [
        all chat-related attributes that need to be imported
    ],
    "count": 179451,
    "prev_url": "previous_url_here",
    "next_url": "next_url_here"
}

Page 3 gives ......next page 

 

 

79 Replies
Anonymous
Not applicable

I suspect you just need to get access to the "next" link in this case. That should handle ALL of the logic apart from the course_id bit.

Parikhharshal
Creator III

@rhall: Are you talking about this link? What do you mean by 8 posts? 

 

https://community.talend.com/t5/Design-and-Development/Logging-Integration-with-Sentry-io/m-p/134832...

 

But is this how I can access Link information too?

Anonymous
Not applicable

Add this code to your tJavaFlex (Main Code) and let me know what it does....

System.out.println(((java.util.Map<String,java.util.List<String>>)globalMap.get("tRESTClient_1_HEADERS")).get("Link"));

My assumption is that your tRESTClient component is named tRESTClient_1. If not, change the code above to match your component name.

Anonymous
Not applicable

This code will do it....

//Split the String up into different links
String[] str_array = link_header.split(",");

//Create an empty "link" variable
String link = null;

for(int i = 0; i<str_array.length;i++){

	String linkString = str_array[i];
	
	//Find only the Next link
	if(linkString.indexOf("rel=\"next\"")>-1){
		//Strip the opening < char
		linkString = linkString.substring(linkString.indexOf('<')+1);
		//Strip the characters at the end
		linkString = linkString.substring(0, linkString.lastIndexOf('>'));
		link = linkString;

	}
}

System.out.println(link);

You need to assign your link header value to the "link_header" variable and then the code above will extract the next link. If there is no next link, the "link" variable will be null.
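To make that assignment step concrete, here is a minimal, self-contained sketch of how the header could be read and passed to the extraction logic. It is only an illustration under assumptions: a plain HashMap stands in for Talend's globalMap, the example.com URLs are made up, and the helper names readLinkHeader and extractNextLink are hypothetical. It also assumes the header is stored under the exact key "Link"; check what your job actually prints before relying on that.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class LinkHeaderSketch {

    // Returns the URL marked rel="next" in a Link header value, or null if there is none.
    static String extractNextLink(String linkHeader) {
        if (linkHeader == null) {
            return null;
        }
        for (String part : linkHeader.split(",")) {
            if (part.contains("rel=\"next\"")) {
                int start = part.indexOf('<');
                int end = part.lastIndexOf('>');
                if (start >= 0 && end > start) {
                    return part.substring(start + 1, end);
                }
            }
        }
        return null;
    }

    // Reads the "Link" header from the headers map stored under headersKey.
    // Multiple header values are joined with commas; returns null if absent.
    @SuppressWarnings("unchecked")
    static String readLinkHeader(Map<String, Object> globalMap, String headersKey) {
        Map<String, List<String>> headers =
                (Map<String, List<String>>) globalMap.get(headersKey);
        if (headers == null) {
            return null;
        }
        List<String> values = headers.get("Link");
        if (values == null || values.isEmpty()) {
            return null;
        }
        return String.join(",", values);
    }

    public static void main(String[] args) {
        // Stand-in for Talend's globalMap (assumption: same key as in the thread).
        Map<String, Object> globalMap = new HashMap<>();
        Map<String, List<String>> headers = new HashMap<>();
        headers.put("Link", Arrays.asList(
                "<https://example.com/chats?page=1>; rel=\"prev\", "
                + "<https://example.com/chats?page=3>; rel=\"next\""));
        globalMap.put("tRESTClient_1_HEADERS", headers);

        // Assign the header value to link_header, then extract the next link.
        String link_header = readLinkHeader(globalMap, "tRESTClient_1_HEADERS");
        globalMap.put("link_header", link_header);
        System.out.println(extractNextLink(link_header));
    }
}
```

Inside a real job, the body of main would go in your tJavaFlex or tJavaRow, using the job's own globalMap rather than the stand-in created here.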

 

 

Parikhharshal
Creator III

@rhall: Does this mean that I always get the entire next URL from the Link header like this and keep feeding it to my logic?

 

So initially I set my initial URL, and after tExtractJSONFields I put the next URL into the globalMap and run the loop?

Anonymous
Not applicable

Yes, I think that will work for you
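As a rough sketch of that loop in plain Java (not Talend components), the idea is: start from the initial URL, process the page, look up the next URL, and stop when there is none. Everything here is an assumption for illustration: the visitAll helper is hypothetical, the "page1"/"page2"/"page3" strings stand in for real URLs, and a HashMap stands in for the API call that returns each page's next link. The maxPages cap is just a safety guard against an endless loop.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

class PaginationLoopSketch {

    // Follows next links starting at firstUrl.
    // nextUrlOf returns the next URL for a page, or null on the last page.
    static List<String> visitAll(String firstUrl,
                                 Function<String, String> nextUrlOf,
                                 int maxPages) {
        List<String> visited = new ArrayList<>();
        String url = firstUrl;
        while (url != null && visited.size() < maxPages) {
            visited.add(url);            // in the job: call the API and load this page
            url = nextUrlOf.apply(url);  // in the job: the next URL kept in globalMap
        }
        return visited;
    }

    public static void main(String[] args) {
        // Fake API: maps each page to its next page; page3 has no next link.
        Map<String, String> fakeApi = new HashMap<>();
        fakeApi.put("page1", "page2");
        fakeApi.put("page2", "page3");

        List<String> visited = visitAll("page1", fakeApi::get, 200_000);
        System.out.println(visited); // prints [page1, page2, page3]
    }
}
```

In a job, the same shape would typically be a loop component whose condition checks that the next URL stored in globalMap is not null.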

Parikhharshal
Creator III

@rhall: You are a legend, mate, and I must say you are great! Thanks a lot for your help, really appreciated! You have just saved me.

Anonymous
Not applicable

No problem. Just remember to pay it back on the forum when you see a question that you can answer when you get a bit more experience 🙂

Parikhharshal
Creator III

@rhall: Just to confirm, to assign my link header value to the link_header variable, is this how I have to do it?

 

((java.util.Map<String,java.util.List<String>>)globalMap.get("tRESTClient_1_HEADERS")).get("Link");

globalMap.put("link_header",""trestClient_1_HEADERS")

 

Not sure if it is right.