Solved: Iterative Data extraction (Pagination and Polling)... - Page 7 - Qlik Community

Gourav_King_of_DataLand · 2018-08-22

Hi All,

I need your expert help.

The requirement is to pull all chat data from a REST API (a one-time full data dump) and then pull chats on a daily basis. The output is spread across 180K pages, with each page giving the URL of the next and previous pages (except the first page, which has only 'next_url', and the last page, which has only 'prev_url').

So far I have been able to use the API/URL to extract information from the first page:

tRestClient ->tJavaRow->tJsonExtract->tOracleOut

How do I modify the job to:

1) Pull all data for the one-time data dump (180K pages)

2) Pull data on a daily basis for the current day, or extract data until the timestamp is the current day?
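For 2), one way to decide which chats belong to the daily pull is to compare each chat's timestamp with the current calendar day in a chosen time zone. The sketch below assumes the API returns ISO-8601 timestamps with an offset (for example 2018-08-22T10:15:30+01:00); the real field name and format are not shown above, so both are assumptions. If the API also happened to return chats newest-first (again an assumption), the job could stop paging at the first chat that is not from today.

```java
import java.time.LocalDate;
import java.time.OffsetDateTime;
import java.time.ZoneId;

public class DailyCutoff {

    // Returns true if the ISO-8601 timestamp (with offset) falls on the given
    // calendar day in the given time zone. The timestamp format is an assumption.
    public static boolean isOnDay(String isoTimestamp, LocalDate day, ZoneId zone) {
        OffsetDateTime ts = OffsetDateTime.parse(isoTimestamp);
        LocalDate localDay = ts.atZoneSameInstant(zone).toLocalDate();
        return localDay.equals(day);
    }

    public static void main(String[] args) {
        ZoneId zone = ZoneId.of("UTC");
        // A daily job would use LocalDate.now(zone); a fixed date keeps the demo repeatable.
        LocalDate today = LocalDate.of(2018, 8, 22);
        System.out.println(isOnDay("2018-08-22T10:15:30+01:00", today, zone)); // true
        System.out.println(isOnDay("2018-08-22T00:30:00+01:00", today, zone)); // false (still 21 Aug in UTC)
    }
}
```

Fixing the zone explicitly matters: the same instant can fall on different days depending on which zone "current day" is measured in.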

Example output from the API:

Page 1 gives:

{
  "chats": [
    all chat-related attributes that need to be imported
  ],
  "count": 179451,
  "next_url": "next_url_here"
}

Page 2 gives:

{
  "chats": [
    all chat-related attributes that need to be imported
  ],
  "count": 179451,
  "prev_url": "previous_url_here",
  "next_url": "next_url_here"
}

Page 3 gives the next page, and so on.
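The full dump in 1) boils down to: request a page, process its chats, read next_url, and stop when a page has no next_url (the last page). A minimal sketch of that logic outside Talend: the fetch function is a hypothetical stand-in for the tRESTClient call, and next_url is pulled out with a simple regex only to keep the example dependency-free. In the job itself a proper JSON extraction component or JSON library would be the safer choice.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NextUrlPager {

    // Matches "next_url": "..." in a raw JSON body. Assumes the URL contains no
    // escaped quotes; a real JSON parser is more robust.
    private static final Pattern NEXT_URL =
            Pattern.compile("\"next_url\"\\s*:\\s*\"([^\"]*)\"");

    // Returns the next_url value, or null if the page has none (last page).
    public static String extractNextUrl(String json) {
        Matcher m = NEXT_URL.matcher(json);
        return m.find() ? m.group(1) : null;
    }

    // Follows next_url from the first page until a page without one is reached.
    // fetch is a hypothetical stand-in for the HTTP request.
    public static List<String> fetchAllPages(String firstUrl, Function<String, String> fetch) {
        List<String> bodies = new ArrayList<>();
        String url = firstUrl;
        while (url != null) {
            String body = fetch.apply(url);
            bodies.add(body); // in the job, this is where chats would be extracted and written out
            url = extractNextUrl(body);
        }
        return bodies;
    }

    public static void main(String[] args) {
        // Three fake pages shaped like the examples above; the URLs are made up.
        Map<String, String> fakeApi = new HashMap<>();
        fakeApi.put("p1", "{\"chats\": [], \"count\": 3, \"next_url\": \"p2\"}");
        fakeApi.put("p2", "{\"chats\": [], \"count\": 3, \"prev_url\": \"p1\", \"next_url\": \"p3\"}");
        fakeApi.put("p3", "{\"chats\": [], \"count\": 3, \"prev_url\": \"p2\"}");
        System.out.println(fetchAllPages("p1", fakeApi::get).size()); // 3
    }
}
```

With around 180K pages it may be worth adding a maximum-page guard and logging the current URL, so a failed run can be resumed from the last page reached instead of page 1.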

Parikhharshal · 2018-11-05

@rhall: Sure, will do. I'm still in the learning phase, but I'm getting a fantastic learning experience from you and the other experts here.

Anonymous · 2018-11-05

Before the code I sent you last time, add this:

link_header = ((java.util.Map<String,java.util.List<String>>)globalMap.get("tRESTClient_1_HEADERS")).get("Link");

Parikhharshal · 2018-11-05

@rhall: The code below is giving me an error:

String link_header = ((java.util.Map<String,java.util.List<String>>)globalMap.get("tRESTClient_1_HEADERS")).get("Link");
// Split the String up into different links
String[] str_array = link_header.split(",");
// Create an empty "link" variable
String link = null;
for (int i = 0; i < str_array.length; i++) {
    String linkString = str_array[i];
    // Find only the Next link
    if (linkString.indexOf("rel=\"next\"") > -1) {
        // Strip the opening < char
        linkString = linkString.substring(linkString.indexOf('<') + 1);
        // Strip the characters at the end
        linkString = linkString.substring(0, linkString.lastIndexOf('>'));
        link = linkString;
    }
    System.out.println(link);
}

Anonymous · 2018-11-05

Sorry, I didn't think this through (and can't test it here), so I made a mistake. The "Link" entry in the headers map is a List<String>, not a String; your System.out only made it look like a String because it converted the List to a String to display it in the terminal.

Your code will need to be amended to use something like this:

java.util.List<String> strList = ((java.util.Map<String,java.util.List<String>>)globalMap.get("tRESTClient_1_HEADERS")).get("Link");

You will then need to iterate over that List to find the values and use some String manipulation similar to what I showed you before.

A guide on iterating over lists can be found here: https://stackoverflow.com/questions/18410035/ways-to-iterate-over-a-list-in-java
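To make the type issue concrete: the headers map stores each header name against a List<String>, so get("Link") hands back a List, not a String. The sketch below uses a plain HashMap as a stand-in for the map returned by globalMap.get("tRESTClient_1_HEADERS"), with made-up header contents, and walks the List with an enhanced for loop, one of the options covered in the guide linked above.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class HeaderListDemo {

    // Returns every value of the named header, or an empty list if the header is absent.
    public static List<String> headerValues(Map<String, List<String>> headers, String name) {
        List<String> values = headers.get(name);
        return values != null ? values : Collections.emptyList();
    }

    public static void main(String[] args) {
        // Stand-in for the tRESTClient headers map; the contents are made up.
        Map<String, List<String>> headers = new HashMap<>();
        headers.put("Link", Arrays.asList("<https://example.com/items?page=2>; rel=\"next\""));

        // String s = headers.get("Link"); // would not compile: the value is a List<String>
        for (String value : headerValues(headers, "Link")) {
            System.out.println(value);
        }
    }
}
```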

I've already provided examples of Java String manipulation, and you should be able to extrapolate from there.

It is a good idea to learn Java when using Talend. Everything you need to do from here is relatively simple Java.

Parikhharshal · 2018-11-05

@rhall: If I do not want to write Java code, can I store the value of the Link header in a variable and then build a flow in Talend for operations like split, substring, etc.?

As you mentioned earlier, using this:

java.util.List<String> strList = ((java.util.Map<String,java.util.List<String>>)globalMap.get("tRESTClient_2_HEADERS")).get("Link");

I now have the value in strList. Is it possible to use this value later in the flow? How can I do that?

Anonymous · 2018-11-06

You will need to learn and use Java to become an expert with Talend. While Talend removes a lot of the coding, you still need the understanding to debug effectively and to do some really cool things with it. Java opens A LOT of doors with Talend.

As for extracting the values using Talend components without Java, I don't think you can. You can convert the List to a String using a tConvertType, and you can probably extract the values using tNormalize, but when it comes to removing excess characters there is no real alternative to Java. Doing all of this with components can make a job grow quite big very quickly, when all you need is something like this:

java.util.Iterator<String> it = strList.iterator(); 

while(it.hasNext()){ 
    String linkString = it.next(); 
    //Find only the Next link 
    if(linkString.indexOf("rel=\"next\"")>-1){ 
         //Strip the opening < char 
         linkString = linkString.substring(linkString.indexOf('<')+1); 
         //Strip the characters at the end  
         linkString = linkString.substring(0, linkString.lastIndexOf('>')); 
         row1.link = linkString; 
     }  
}
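On the question of reusing the value later in the flow: besides writing it to an output column as row1.link does above, a common Talend pattern is to store it in globalMap and read it back with a cast wherever it is needed, for example in a later component's expression field. The sketch simulates globalMap with a HashMap<String, Object> so it runs outside Talend; the key name "nextLink" and the helper methods are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

public class GlobalMapLink {

    // Hypothetical key under which the extracted link is stored.
    public static final String NEXT_LINK_KEY = "nextLink";

    // What a tJava/tJavaRow could do once the next link has been extracted.
    public static void storeNextLink(Map<String, Object> globalMap, String link) {
        globalMap.put(NEXT_LINK_KEY, link);
    }

    // What a later component could evaluate, e.g. (String) globalMap.get("nextLink").
    public static String readNextLink(Map<String, Object> globalMap) {
        return (String) globalMap.get(NEXT_LINK_KEY);
    }

    public static void main(String[] args) {
        // Stand-in for Talend's globalMap, which already exists inside a job.
        Map<String, Object> globalMap = new HashMap<>();
        storeNextLink(globalMap, "https://example.com/items?page=2");
        System.out.println(readNextLink(globalMap)); // https://example.com/items?page=2
    }
}
```

Storing null when no next link is found gives a later step a simple stop signal: a null read-back means the last page has been processed.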

Parikhharshal · 2018-11-06

@rhall: A couple of things. First, I tried the code below:

java.util.List<String> strList = ((java.util.Map<String,java.util.List<String>>)globalMap.get("tRESTClient_2_HEADERS")).get("Link");
//System.out.println(strList);
// Create an empty "link" variable
String link = null;
java.util.Iterator<String> it = strList.iterator();
while (it.hasNext()) {
    String linkString = it.next();
    // Find only the Next link
    if (linkString.indexOf("rel=\"next\"") > -1) {
        // Strip the opening < char
        linkString = linkString.substring(linkString.indexOf('<') + 1);
        // Strip the characters at the end
        linkString = linkString.substring(0, linkString.lastIndexOf('>'));
        link = linkString;
    }
    System.out.println(linkString);
}

It printed the entire Link header as-is. I think it should return only the next URL here. Is that correct?

Apart from this, I thought it was worth trying tConvertType to see how it works and to gain some exposure, so I built a job like this:

For some reason it does not return anything. Am I doing anything wrong here?

This is the tConvertType schema.

And these are the tConvertType settings.

I'm not sure what is missing.

Anonymous · 2018-11-06

Your System.out needs to be inside the IF block, not outside it. Otherwise it prints every entry in the List, whether or not it has been processed.

Parikhharshal · 2018-11-06

@rhall: I did that, and the printed text is being cut off:

https://abc.test.instructure.com/api/v1/courses/160/analytics/student_summaries?page=1&per_page=10>; rel="current",<https://abc.test.instructure.com/api/v1/courses/160/analytics/student_summaries?page=2&per_page=10>; rel="next",<https://abc.test.instructure.com/api/v1/courses/160/analytics/student_summaries?page=1&per_page=10>; rel="first",<https://abc.test.instructure.com/api/v1/courses/160/analytics/student_summaries?page=14&per_page=10

I think it is cutting off the trailing >; rel="last" and also the opening < before the first URL (the rel="current" one).

if (linkString.indexOf("rel=\"next\"") > -1) {
    // Strip the opening < char
    linkString = linkString.substring(linkString.indexOf('<') + 1);
    // Strip the characters at the end
    linkString = linkString.substring(0, linkString.lastIndexOf('>'));
    link = linkString;
    System.out.println(linkString);
}

Anonymous · 2018-11-06

You are looking for the next URL, not the current one. The current URL is the one that was used to get this header; the URL you need to extract is the next one, which is the URL to use for the following request.
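The printed output above also shows why the text looked cut off: the List holds a single String containing every link, separated by commas, so taking the substring from the first < to the last > spans all of them at once. A minimal sketch, assuming the comma-separated <url>; rel="..." format seen in that output and URLs that contain no commas, splits each List entry on commas first and then keeps only the entry tagged rel="next". The class and method names are hypothetical; in the job the List would come from the tRESTClient headers in globalMap, as in the thread.

```java
import java.util.Arrays;
import java.util.List;

public class LinkHeaderParser {

    // Returns the URL tagged rel="next", or null if there is none (last page).
    // Assumes entries look like <url>; rel="name" and that URLs contain no commas.
    public static String findNextLink(List<String> linkHeaderValues) {
        if (linkHeaderValues == null) {
            return null;
        }
        for (String headerValue : linkHeaderValues) {
            // One header value can hold several links separated by commas.
            for (String entry : headerValue.split(",")) {
                if (entry.contains("rel=\"next\"")) {
                    int start = entry.indexOf('<');
                    int end = entry.indexOf('>', start + 1);
                    if (start >= 0 && end > start) {
                        return entry.substring(start + 1, end).trim();
                    }
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Header rebuilt from the output printed in the thread.
        String base = "https://abc.test.instructure.com/api/v1/courses/160/analytics/student_summaries";
        String header = "<" + base + "?page=1&per_page=10>; rel=\"current\","
                + "<" + base + "?page=2&per_page=10>; rel=\"next\","
                + "<" + base + "?page=1&per_page=10>; rel=\"first\","
                + "<" + base + "?page=14&per_page=10>; rel=\"last\"";
        // Prints the page=2 URL
        System.out.println(findNextLink(Arrays.asList(header)));
    }
}
```

Returning null on the last page gives the surrounding loop a natural stop condition, the same idea as the missing next_url in the JSON examples at the top of the thread.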

Iterative Data extraction (Pagination and Polling) from REST API using tRestClient

Labels: Java, JSON, REST, Talend Data Integration, v7.x