Gourav_King_of_DataLand
Contributor II

Iterative Data extraction (Pagination and Polling) from REST API using tRestClient

Hi All,

Need your expert help.

 

The requirement is to pull all chat data from a REST API (a one-time full data dump) and then pull chats on a daily basis. The output is spread across 180K pages, with each page giving the URL of the next and previous pages (except the first page, which has only 'next_url', and the last page, which has only 'prev_url').

 

So far I have been able to use the API URL to extract information from the first page:

 

tRestClient ->tJavaRow->tJsonExtract->tOracleOut

 

How do I modify the job to

1) Pull all data for one time data dump, 180k pages

2) Pull data on daily basis for current day or extract data until the timestamp is current day.

 

Example output from API

 

Page 1 gives
{
    "chats": [

                  all chat-related attributes that need to be imported

                 ],
    "count": 179451,
    "next_url": "next_url_here"

}


Page 2 gives

{
    "chats": [

                all chat-related attributes that need to be imported

                 ],
    "count": 179451,
    "prev_url": "previous_url_here",
    "next_url": "next_url_here"

}

Page 3 gives ......next page 
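
One way to walk all pages is to keep requesting whatever "next_url" the previous page returned and stop when a page has none. The sketch below is only an illustration of that idea in plain Java: the class NextUrlPager, the fetchPage function (a stand-in for the HTTP call that tRestClient makes) and the maxPages safety cap are hypothetical names, and the regex extraction assumes next_url is a plain, unescaped string as in the sample output. In a job, the same condition (loop while a stored next_url is not null) could presumably drive a tLoop in "While" mode.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NextUrlPager {
    // Matches "next_url": "<value>" in a page body (field name taken from the sample output).
    private static final Pattern NEXT_URL =
            Pattern.compile("\"next_url\"\\s*:\\s*\"([^\"]*)\"");

    /** Returns the next page URL, or null when the body has no next_url (last page). */
    static String extractNextUrl(String body) {
        Matcher m = NEXT_URL.matcher(body);
        return m.find() ? m.group(1) : null;
    }

    /**
     * Follows next_url links starting at startUrl until a page has none.
     * fetchPage is a hypothetical stand-in for the real HTTP request;
     * maxPages is a safety cap so a bad response cannot loop forever.
     */
    static List<String> fetchAllPages(String startUrl,
                                      Function<String, String> fetchPage,
                                      int maxPages) {
        List<String> bodies = new ArrayList<>();
        String url = startUrl;
        while (url != null && bodies.size() < maxPages) {
            String body = fetchPage.apply(url);
            bodies.add(body);              // hand the body to the JSON extraction step
            url = extractNextUrl(body);    // null on the last page ends the loop
        }
        return bodies;
    }

    public static void main(String[] args) {
        // Fake three-page API keyed by URL, shaped like the sample output.
        Map<String, String> fake = new HashMap<>();
        fake.put("page1", "{\"chats\": [], \"count\": 3, \"next_url\": \"page2\"}");
        fake.put("page2", "{\"chats\": [], \"count\": 3, \"prev_url\": \"page1\", \"next_url\": \"page3\"}");
        fake.put("page3", "{\"chats\": [], \"count\": 3, \"prev_url\": \"page2\"}");
        System.out.println(fetchAllPages("page1", fake::get, 10).size());
    }
}
```

For the daily pull, the same loop could stop early once a page contains only chats older than the current day, but that depends on a timestamp attribute that is not shown in the sample output.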

 

 

79 Replies
Parikhharshal
Creator III

@rhall: What I have been told is that it should be in the HTTP response header.
It is dynamically generated by the API service, so it's not stored anywhere.

Anonymous
Not applicable

If the maximum page number is kept in the response header, you can use this post I put together a while ago to help you out. The code in the post should just work (it prints out the header information). You will need to adapt it to get hold of the maximum page number.

 

https://community.talend.com/t5/Design-and-Development/Logging-Integration-with-Sentry-io/m-p/134832...

Parikhharshal
Creator III

@rhall: No, I do not have a max value or anything stored. But this is how I am getting the link header value.

 

here's the header info:
Cache-Control → max-age=0, private, must-revalidate
Content-Encoding → gzip
Content-Type → application/json; charset=utf-8
Date → Sun, 04 Nov 2018 22:57:52 GMT
ETag → W/"050c35c9704f9227825ff7519f24db08-gzip"
Server → Apache
Set-Cookie → _csrf_token=knez8jQvjYeteSmHjM1uPKB1D2QqCnpda51yJRIlaqjYPdyaRGq7w8g7Bu7EjAZd%2BQdjC15uVRQ8yUFocVAY5Q%3D%3D; path=/; secure
Status → 200 OK
Strict-Transport-Security → max-age=31536000
Vary → Accept-Encoding
X-Canvas-Meta → q=794;at=111460000000014213;dk=170000000000016;a=1;g=iyowPqCgRyryNo8u1XHRx48aP8ejprE2TU532rjO;s=11146;c=cluster41;z=ap-southeast-2a;o=analytics_api;n=course_student_summaries;x=5.0;p=f;f=2018-11-04T22:57:52.92Z;b=1340904;m=1340904;u=0.80;y=0.02;d=0.93;
X-Canvas-User-Id → 111460000000016069
X-Content-Type-Options → nosniff
X-Frame-Options → SAMEORIGIN
X-Rate-Limit-Remaining → 700.0
X-Request-Context-Id → 09df381e-8500-4d64-9a20-9368e63e91d2
X-Request-Cost → 1.7318528289999544
X-Request-Processor → 0f23761dee3045337
X-Runtime → 1.974539
X-Session-Id → f4396b54b90c37518fc26552ef646ed2
X-UA-Compatible → IE=Edge,chrome=1
X-XSS-Protection → 1; mode=block
Content-Length → 329
Connection → keep-alive
 
I designed the job to find the next page and last page values as shown below:
 
0683p000009M0rD.png
 
In tLoop it is set like this:
 
0683p000009M0rI.png
 
In tRestClient I have set the URL like this: 
 
tJavaFlex has the code below:
 

if (((Integer) globalMap.get("i")) == null)
    {
    globalMap.put("has_more", false);

 

The job iterates but gets into an infinite loop. When I print the value of i in tJava, it keeps incrementing, which is correct, but the result shown is always for page 1, even when i becomes 2, 3, 4 and so on.

 

It seems my if condition is wrong, or something else is. Can you advise me on this, or is there any way I can use the Link header values to create a loop?

 

Hi @gr44: Your advice would be really appreciated in this case as well. Thanks.

Anonymous
Not applicable

This is not stopping because "has_more" is never set to false. This is happening because "i" is never null. So, you need to solve why "i" is not set to null. Where do you set "i" and how do you set it?
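
To illustrate the point, here is a minimal sketch in plain Java, not the actual job. A loop counter that is written on every iteration is never null, so a null test on it cannot end the loop; the stop flag has to be driven by something the API returns. The sketch assumes the stop signal is a page with zero records, which may not match this API; the class LoopStopDemo, the recordsPerPage array (a stand-in for the real responses) and the maxIterations cap are all hypothetical, and the HashMap only imitates Talend's globalMap.

```java
import java.util.HashMap;
import java.util.Map;

class LoopStopDemo {
    /**
     * Returns how many pages are requested before the loop stops.
     * recordsPerPage[k] is the record count of page k+1; pages beyond the
     * array are treated as empty (an assumption about the API).
     */
    static int pagesRequested(int[] recordsPerPage, int maxIterations) {
        Map<String, Object> globalMap = new HashMap<>(); // stand-in for Talend's globalMap
        globalMap.put("has_more", true);
        int i = 0;
        while ((Boolean) globalMap.get("has_more") && i < maxIterations) {
            i++;
            globalMap.put("i", i); // always set, so globalMap.get("i") == null is never true
            int records = i <= recordsPerPage.length ? recordsPerPage[i - 1] : 0;
            if (records == 0) {
                // Stop condition based on the response, not on the counter.
                globalMap.put("has_more", false);
            }
        }
        return i;
    }

    public static void main(String[] args) {
        // Three pages with data; the fourth request comes back empty and ends the loop.
        System.out.println(pagesRequested(new int[] {50, 50, 7}, 100));
    }
}
```

The maxIterations cap is there only as a guard, so that a mistake in the stop condition cannot turn into an endless run.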

Parikhharshal
Creator III

@rhall: What I found is that has_more is useless and not doing anything. I think I am going to get rid of it, but I am not sure how to stop i. Can you please give me a hint?

Parikhharshal
Creator III

@rhall: Also, can you please provide screenshots of what you set in globalMap, tJavaFlex and tLoop in your example?

 

0683p000009M0r9.png

Anonymous
Not applicable

I suspect that "i" is never null. I believe you need to look at that.

Parikhharshal
Creator III

@rhall: Is there any other way I can find what's going to be max/last page in url?

Anonymous
Not applicable

Yes, there is. I gave you a link about 8 posts above this one. You will need to use Java.
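
As a rough idea of what that Java could look like once the header value is in hand, the sketch below parses a Link header and pulls the page number out of the rel="last" entry. It is not the code from the linked post. It assumes the API sends a standard Link header of the form <url>; rel="next", <url>; rel="last" (no such header appears in the dump pasted earlier in the thread) and that the page is carried in a numeric query parameter called page. The class LinkHeaderParser and both method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LinkHeaderParser {
    // One entry of a Link header: <url>; rel="name"
    private static final Pattern LINK =
            Pattern.compile("<([^>]*)>\\s*;\\s*rel=\"?([^\",;]+)\"?");
    // Assumed query parameter that carries the page number.
    private static final Pattern PAGE = Pattern.compile("[?&]page=(\\d+)");

    /** Maps each rel value (next, prev, first, last, ...) to its URL. */
    static Map<String, String> parse(String header) {
        Map<String, String> links = new LinkedHashMap<>();
        if (header == null) {
            return links;
        }
        Matcher m = LINK.matcher(header);
        while (m.find()) {
            links.put(m.group(2).trim(), m.group(1));
        }
        return links;
    }

    /** Returns the page number of the rel="last" link, or -1 if it cannot be determined. */
    static int lastPage(String header) {
        String last = parse(header).get("last");
        if (last == null) {
            return -1;
        }
        Matcher m = PAGE.matcher(last);
        return m.find() ? Integer.parseInt(m.group(1)) : -1;
    }

    public static void main(String[] args) {
        // Made-up header value for illustration only.
        String header = "<https://example.test/items?page=2&per_page=10>; rel=\"next\", "
                + "<https://example.test/items?page=5&per_page=10>; rel=\"last\"";
        System.out.println(lastPage(header));
    }
}
```

If the header has no rel="last" entry, or the page value is not numeric, lastPage returns -1, and following the rel="next" link until it disappears is the safer loop condition.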