Gourav_King_of_DataLand
Contributor II

Iterative Data extraction (Pagination and Polling) from REST API using tRestClient

Hi All,

I need your expert help.

 

The requirement is to pull all chat data from a REST API (a one-time full data dump) and then pull chats on a daily basis. The output is spread across 180K pages, with each page giving the URL of the next and previous pages (except the first page, which has only 'next_url', and the last page, which has only 'prev_url').

 

So far I have been able to use the API/URL to extract information from the first page.

 

tRestClient ->tJavaRow->tJsonExtract->tOracleOut

 

How do I modify the job to:

1) Pull all data for the one-time data dump (180K pages)

2) Pull data on a daily basis for the current day, or extract data until the timestamp is the current day.

 

Example output from API

 

Page 1 gives

{
    "chats": [
        all chat-related attributes that need to be imported
    ],
    "count": 179451,
    "next_url": "next_url_here"
}

Page 2 gives

{
    "chats": [
        all chat-related attributes that need to be imported
    ],
    "count": 179451,
    "prev_url": "previous_url_here",
    "next_url": "next_url_here"
}

Page 3 gives the next page, and so on.

 

 


Accepted Solutions
Anonymous
Not applicable

You need to have this sort of layout...

0683p000009LzQj.png

You set the initial globalMap value in the "Set initial globalMap" tJava. Then set the "While" condition logic in the tLoop component. The "Dummy" component is just there to allow you to link to the tRestClient. I've included the "Modify JSON" tJavaFlex following on from your last question. Then you can set the next URL in the "Set globalMap" tJavaFlex.


79 Replies
Anonymous
Not applicable

My assumption is that the "next_url" element will not be supplied if there are no other pages after that one. If that is the case, you can do it like this:

 

1) Set up a tLoop using the "while" loop functionality. Use a globalMap variable holding your initial URL (set in a tJava preceding the tLoop) as your test on your while clause. "While globalMap value is not null" for example.

2) Use the globalMap value in your tRestClient

3) Retrieve your data for each service call and also retrieve the next_url. Set the globalMap value to be that of the next_url. If it is not present, then this will be null.

 

The tLoop will fire for each url supplied and will stop when the next_url value is not supplied.
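The three steps above amount to an ordinary while loop. Here is a minimal sketch in plain Java, outside Talend, assuming a hypothetical `Page` class and a `fetchPage` function that stand in for the tRestClient call plus the JSON extraction. The in-memory map in `main` is a fake API used only to exercise the loop; none of these names come from Talend itself.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public class PaginationSketch {

    /** Hypothetical page model: the extracted chats plus next_url (null on the last page). */
    public static final class Page {
        public final List<String> chats;
        public final String nextUrl;

        public Page(List<String> chats, String nextUrl) {
            this.chats = chats;
            this.nextUrl = nextUrl;
        }
    }

    /** Follows next_url links until a page has none, collecting every chat on the way. */
    public static List<String> fetchAll(String initialUrl, Function<String, Page> fetchPage) {
        List<String> all = new ArrayList<>();
        String url = initialUrl;              // plays the role of the globalMap value
        while (url != null) {                 // the tLoop "while" test
            Page page = fetchPage.apply(url); // stands in for tRestClient + JSON extraction
            all.addAll(page.chats);
            url = page.nextUrl;               // null when next_url is absent, which ends the loop
        }
        return all;
    }

    public static void main(String[] args) {
        // Fake three-page API (assumption, for illustration only).
        Map<String, Page> api = new HashMap<>();
        api.put("page1", new Page(Arrays.asList("chat-a", "chat-b"), "page2"));
        api.put("page2", new Page(Arrays.asList("chat-c"), "page3"));
        api.put("page3", new Page(Arrays.asList("chat-d"), null));

        System.out.println(fetchAll("page1", api::get));
    }
}
```

In the Talend job the same roles are split across components: the tLoop holds the condition, tRestClient does the fetch, and a Java component updates the globalMap value at the end of each iteration.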

Gourav_King_of_DataLand
Contributor II
Author

I am trying to do what you suggested. I am quite new to Talend, so sometimes it's a bit difficult to achieve even small and simple things.

 

I used tSetGlobalVar to set the initial URL and then passed it to tRestClient. I then extract the 'next_url' with tExtractJson, and up to here things are good: I checked the result in tLogRow and can see the next_url. However, I am not able to figure out how to assign the next_url from tExtractJson to a global variable in tJava.

0683p000009Lzcb.png

0683p000009Lzcg.png

0683p000009LzNw.png

0683p000009LzWZ.png

0683p000009LzIv.png


Gourav_King_of_DataLand
Contributor II
Author

Hi rhall_2_0, thank you for replying.

I am unable to get the attribute value from tExtractJson into tJava. In this case I am extracting "next_url" in tExtractJson, but when I link it to tJava I have no clue how to assign it to a globalMap variable. Below is the code in the tJava; it gives the error 'input_row cannot be resolved':

globalMap.put("next_url",input_row.next_url);
System.out.println("Value Of GlobalVar: "+globalMap.get("next_url"));
Anonymous
Not applicable

Link the tExtractJson component to a tJavaFlex, not a tJava. The code for the tJavaFlex should be in the Main Code section.

Gourav_King_of_DataLand
Contributor II
Author

Thank you for replying. The code below in the tJavaFlex gives the error 'input_row cannot be resolved':

globalMap.put("v_next_url",input_row.next_url);

With tExtractJson linked to tJavaFlex, what code do I have to write in the tJavaFlex to get the next_url value from tExtractJson into a global variable? The snippet I managed to find on the internet doesn't work 😞
Gourav_King_of_DataLand
Contributor II
Author

Resolved it. I had to use
globalMap.put("V_API_URL",row2.next_url);
instead of
globalMap.put("v_next_url",input_row.next_url);

row2 is the output row from tJsonExtract.
Anonymous
Not applicable

Ah yes, sorry, I have been away from my machine all day. The tJavaRow uses the input_row and output_row row names (for some strange reason), whereas the tJavaFlex uses the actual row names.

 

Did this work for you?

 

Gourav_King_of_DataLand
Contributor II
Author

Yes, this did work for me. The final solution looks like this.

0683p000009LzM5.png

Global variables

"V_API_URL" - Holds the initial URL and is then updated with the next URL that I get from tExtractJson.

"V_LOOP" - A Boolean that is true to start with and is set to false when V_API_URL is null.

 

tLoop

Used a While loop, without Declaration and Iteration.
0683p000009LxsN.png
 

tRestClient

0683p000009LzeX.png
 

tJavaFlex

On each loop iteration, the global variable V_API_URL is set to the next_url obtained from tJsonExtract.

The code then checks whether V_API_URL holds a URL. If it does, the loop continues because V_LOOP is still true; if there is no URL, V_LOOP is set to false, which exits the loop at the last page.

0683p000009Lzec.png
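The actual tJavaFlex code is only in the screenshot, so the following is a reconstruction from the description above rather than the original code. It is a plain-Java sketch in which `globalMap` is an ordinary `HashMap`, `Row2` is a hypothetical stand-in for the row class Talend generates for the tJsonExtract output, and a list of pre-extracted rows replaces the real tRestClient calls.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

public class LoopFlagSketch {

    /** Hypothetical stand-in for the row leaving tJsonExtract. */
    public static final class Row2 {
        public String next_url;

        public Row2(String nextUrl) {
            this.next_url = nextUrl;
        }
    }

    /** Roughly what the tJavaFlex main code does: store the next URL, lower V_LOOP at the end. */
    public static void updateLoopState(Map<String, Object> globalMap, Row2 row2) {
        globalMap.put("V_API_URL", row2.next_url);
        if (row2.next_url == null || row2.next_url.isEmpty()) {
            globalMap.put("V_LOOP", false); // last page: the while condition becomes false
        }
    }

    /** Runs the loop over pre-extracted rows and returns how many pages were visited. */
    public static int run(Map<String, Object> globalMap, List<Row2> pages) {
        Iterator<Row2> responses = pages.iterator();
        int visited = 0;
        // Same kind of test the tLoop condition makes on the Boolean flag.
        while ((Boolean) globalMap.get("V_LOOP") && responses.hasNext()) {
            Row2 row2 = responses.next(); // stands in for tRestClient + tJsonExtract
            visited++;
            updateLoopState(globalMap, row2);
        }
        return visited;
    }

    public static void main(String[] args) {
        Map<String, Object> globalMap = new HashMap<>();
        globalMap.put("V_API_URL", "initial_url_here"); // placeholder starting URL
        globalMap.put("V_LOOP", true);

        List<Row2> pages = Arrays.asList(
                new Row2("page2_url"), new Row2("page3_url"), new Row2(null));
        System.out.println("Pages visited: " + run(globalMap, pages));
        System.out.println("V_LOOP: " + globalMap.get("V_LOOP"));
    }
}
```

Inside the job itself, only the body of `updateLoopState` would go in the tJavaFlex Main Code section, with `row2` being whatever name the incoming flow actually has.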