Gourav_King_of_DataLand
Contributor II

Iterative Data extraction (Pagination and Polling) from REST API using tRestClient

Hi All,

Need your expert help.

 

The requirement is to pull all chat data from a REST API (a one-time full data dump) and then pull chats on a daily basis. The output is spread across 180K pages, with each page giving the URL of the next and previous pages (except the first page, which has only 'next_url', and the last page, which has only 'prev_url').

 

So far I have been able to use the API/URL to extract information from the first page:

 

tRestClient ->tJavaRow->tJsonExtract->tOracleOut

 

How do I modify the job to

1) Pull all data for one time data dump, 180k pages

2) Pull data on daily basis for current day or extract data until the timestamp is current day.

 

Example output from API

 

Page 1 gives

{
    "chats": [
        all chat-related attributes that need to be imported
    ],
    "count": 179451,
    "next_url": "next_url_here"
}

Page 2 gives

{
    "chats": [
        all chat-related attributes that need to be imported
    ],
    "count": 179451,
    "prev_url": "previous_url_here",
    "next_url": "next_url_here"
}

Page 3 gives ......next page 
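
For the one-time dump, the idea of following 'next_url' until a page no longer returns one can be sketched in plain Java as below. This is only a minimal illustration under assumptions: the class name NextUrlPager, the fetchPage function (a stand-in for the actual HTTP call) and the maxPages safety cap are hypothetical, and the regular expression is a shortcut where a real job would more likely use a JSON parser. How the loop maps onto Talend components is not shown here.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NextUrlPager {

    // Matches "next_url": "<value>" in a response body (illustrative shortcut;
    // a JSON parser is the safer choice for real payloads).
    private static final Pattern NEXT_URL =
            Pattern.compile("\"next_url\"\\s*:\\s*\"([^\"]*)\"");

    /** Returns the next_url value from a page body, or null if the page has none. */
    public static String extractNextUrl(String body) {
        Matcher m = NEXT_URL.matcher(body);
        return m.find() ? m.group(1) : null;
    }

    /**
     * Follows next_url links starting at firstUrl and returns every page body.
     * fetchPage is a hypothetical stand-in for the HTTP request;
     * maxPages is a safety cap so a bad link cannot loop forever.
     */
    public static List<String> collectPages(String firstUrl,
                                            Function<String, String> fetchPage,
                                            int maxPages) {
        List<String> bodies = new ArrayList<>();
        String url = firstUrl;
        while (url != null && bodies.size() < maxPages) {
            String body = fetchPage.apply(url);
            bodies.add(body);
            url = extractNextUrl(body); // null on the last page ends the loop
        }
        return bodies;
    }

    public static void main(String[] args) {
        // Fake three-page API keyed by URL, mirroring the example output above.
        Map<String, String> fakeApi = new HashMap<>();
        fakeApi.put("p1", "{\"chats\": [], \"count\": 3, \"next_url\": \"p2\"}");
        fakeApi.put("p2", "{\"chats\": [], \"count\": 3, \"prev_url\": \"p1\", \"next_url\": \"p3\"}");
        fakeApi.put("p3", "{\"chats\": [], \"count\": 3, \"prev_url\": \"p2\"}");

        List<String> pages = collectPages("p1", fakeApi::get, 1000);
        System.out.println(pages.size() + " pages fetched");
    }
}
```

The loop stops on the last page because that page carries only 'prev_url', so extractNextUrl returns null.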

 

 

79 Replies
Parikhharshal
Creator III

@rhall: Sure will do. Still in a learning phase but I am getting a fantastic learning experience from you and other experts here.

Anonymous
Not applicable

Before the code I sent you last time, add this.....

link_header = ((java.util.Map<String,java.util.List<String>>)globalMap.get("tRESTClient_1_HEADERS")).get("Link");
Parikhharshal
Creator III

@rhall: Below code is giving me the error:

 

0683p000009M0kS.png

 

String link_header = ((java.util.Map<String,java.util.List<String>>)globalMap.get("tRESTClient_1_HEADERS")).get("Link");

//Split the String up into different links
String[] str_array = link_header.split(",");

//Create an empty "link" variable
String link = null;

for(int i = 0; i<str_array.length;i++){
    String linkString = str_array[i];
    //Find only the Next link
    if(linkString.indexOf("rel=\"next\"")>-1){
        //Strip the opening < char
        linkString = linkString.substring(linkString.indexOf('<')+1);
        //Strip the characters at the end
        linkString = linkString.substring(0, linkString.lastIndexOf('>'));
        link = linkString;
    }
}

System.out.println(link);

Anonymous
Not applicable

Sorry, I didn't think about this (and can't test it here) so made a mistake. Your System.out converted a List<String> into a String to display it in the terminal. 

 

Your code will need to be amended to use something like this....

java.util.List<String> strList = ((java.util.Map<String,java.util.List<String>>)globalMap.get("tRESTClient_1_HEADERS")).get("Link");

You will then need to iterate over that List to find the values and use some String manipulation similar to what I showed you before.

 

A guide on iterating over lists can be found here: https://stackoverflow.com/questions/18410035/ways-to-iterate-over-a-list-in-java

 

I've already provided Java String manipulation examples, and you should be able to extrapolate from those.

 

It is a good idea to learn Java when using Talend. Everything you need to do from here is relatively simple Java. 

Parikhharshal
Creator III

@rhall: If I do not want to write Java code, can I store the value of the Link header in a variable and then build the flow in Talend (for operations like split, substring, etc.)?

 

As you mentioned earlier using this 

java.util.List<String> strList = ((java.util.Map<String,java.util.List<String>>)globalMap.get("tRESTClient_2_HEADERS")).get("Link");

 

I got value in strList now. But if I want to use this value for later flow is it possible? How can I do that?

Anonymous
Not applicable

You will need to learn and use Java to become an expert with Talend. While Talend removes a lot of the coding, you still need the understanding to debug effectively and to do some really cool things with it. Java opens A LOT of doors with Talend.

 

In response to extracting the values using Talend components without Java, I don't think you can. You can convert the List to a String using a tConvertType and you can probably extract the values using the tNormalize, but when it comes to removing excess characters there is no real alternative to using Java. Doing all of this with components can cause a job to get quite big very quickly when all you need is something like below....

 

java.util.Iterator<String> it = strList.iterator();

while(it.hasNext()){
    String linkString = it.next();
    //Find only the Next link
    if(linkString.indexOf("rel=\"next\"")>-1){
        //Strip the opening < char
        linkString = linkString.substring(linkString.indexOf('<')+1);
        //Strip the characters at the end
        linkString = linkString.substring(0, linkString.lastIndexOf('>'));
        row1.link = linkString;
    }
}
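
On the earlier question about using the value later in the flow: besides assigning it to an output column such as row1.link, one option is to put it into globalMap under a key of your own and read it back in a later component. The sketch below imitates that with an ordinary HashMap so it can run outside a job; the class name, the key "next_link" and the helper methods are hypothetical, and it assumes globalMap behaves as a Map<String, Object>, as the casts used in this thread suggest.

```java
import java.util.HashMap;
import java.util.Map;

public class GlobalMapLinkDemo {

    // Hypothetical key name; any key not already used by a component would do.
    static final String NEXT_LINK_KEY = "next_link";

    /** Stores the extracted link so a later step can pick it up. */
    static void storeLink(Map<String, Object> globalMap, String link) {
        globalMap.put(NEXT_LINK_KEY, link);
    }

    /** Reads the link back; returns null if nothing was stored. */
    static String readLink(Map<String, Object> globalMap) {
        return (String) globalMap.get(NEXT_LINK_KEY);
    }

    public static void main(String[] args) {
        // Stand-in for the job's globalMap (assumption: Map<String, Object>).
        Map<String, Object> globalMap = new HashMap<>();

        storeLink(globalMap, "https://example.invalid/api?page=2");
        System.out.println(readLink(globalMap));
    }
}
```

Inside a job, the two helper bodies would simply be the globalMap.put(...) and ((String) globalMap.get(...)) expressions written where they are needed.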

 

Parikhharshal
Creator III

@rhall: Couple of things: I tried below code:

 

java.util.List<String> strList = ((java.util.Map<String,java.util.List<String>>)globalMap.get("tRESTClient_2_HEADERS")).get("Link");

//System.out.println(strList);

//Create an empty "link" variable
String link = null;

java.util.Iterator<String> it = strList.iterator();

while(it.hasNext()){
    String linkString = it.next();
    //Find only the Next link
    if(linkString.indexOf("rel=\"next\"")>-1){
         //Strip the opening < char
         linkString = linkString.substring(linkString.indexOf('<')+1);
         //Strip the characters at the end
         linkString = linkString.substring(0, linkString.lastIndexOf('>'));
         link = linkString;

System.out.println(linkString);
}

 

And it printed entire Link header as it is. I think here it should return only next URL. Is it correct?

 

Apart from this, I thought it was worth trying tConvertType to gain some exposure, so I built a job like this:

 

0683p000009M0qB.png

 

For some reason it does not return anything. Am I doing anything wrong here?

 

This is what is mentioned in tConvertType schema.

 

0683p000009M0t9.png

 

And this is what is mentioned in tConvertType.

 

0683p000009M0ov.png

 

Not sure what is missing though.

Anonymous
Not applicable

Your System.out needs to be inside the IF condition and not outside of it. Otherwise it prints every link in the List without it having been processed.

Parikhharshal
Creator III

@rhall: I did that and it is cutting the text in print:

 

https://abc.test.instructure.com/api/v1/courses/160/analytics/student_summaries?page=1&per_page=10>; rel="current",<https://abc.test.instructure.com/api/v1/courses/160/analytics/student_summaries?page=2&per_page=10>; rel="next",<https://abc.test.instructure.com/api/v1/courses/160/analytics/student_summaries?page=1&per_page=10>; rel="first",<https://abc.test.instructure.com/api/v1/courses/160/analytics/student_summaries?page=14&per_page=10

 

I think it is cutting off >; rel="last" at the end and also the starting "<" before the first URL for rel="current".

 

    if(linkString.indexOf("rel=\"next\"")>-1){
         //Strip the opening < char
         linkString = linkString.substring(linkString.indexOf('<')+1);
         //Strip the characters at the end
         linkString = linkString.substring(0, linkString.lastIndexOf('>'));
         link = linkString;

     System.out.println(linkString);
}

Anonymous
Not applicable

You are looking for the next url not the current one. The current one is the url that has been used to get the header. The url that is returned is the next url that needs to be used.
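
The printed text above starts right after the first "<" and stops just before the last ">", which suggests the List holds a single String containing all the links, so the rel="next" check matches the whole string and the substring calls trim only its two ends. A minimal sketch that first splits each header value on commas and then picks the rel="next" part might look like this. The class and method names are hypothetical, and splitting on "," assumes the URLs themselves contain no commas.

```java
import java.util.Arrays;
import java.util.List;

public class LinkHeaderParser {

    /**
     * Returns the URL marked rel="next" from the Link header values,
     * or null if there is none. Each value may hold several
     * comma-separated links, so every value is split before matching.
     */
    public static String findNextLink(List<String> headerValues) {
        if (headerValues == null) {
            return null;
        }
        for (String headerValue : headerValues) {
            // Assumption: URLs contain no commas, so "," separates links.
            for (String part : headerValue.split(",")) {
                if (part.contains("rel=\"next\"")) {
                    int start = part.indexOf('<');
                    int end = part.indexOf('>', start + 1);
                    if (start >= 0 && end > start) {
                        // Keep only the text between < and >
                        return part.substring(start + 1, end);
                    }
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Sample header modelled on the output posted in this thread.
        String base = "https://abc.test.instructure.com/api/v1/courses/160/analytics/student_summaries";
        String header = "<" + base + "?page=1&per_page=10>; rel=\"current\","
                + "<" + base + "?page=2&per_page=10>; rel=\"next\","
                + "<" + base + "?page=1&per_page=10>; rel=\"first\","
                + "<" + base + "?page=14&per_page=10>; rel=\"last\"";

        System.out.println(findNextLink(Arrays.asList(header)));
    }
}
```

Because the method returns null when no rel="next" entry exists, the same check can serve as the stop condition when looping over pages.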