Hi All,
I need your expert help.
The requirement is to pull all chat data from a REST API (a one-time full data dump) and then pull chats on a daily basis. The output is spread across 180K pages, with each page giving the URL of the next and previous pages (except the first page, which has only 'next_url', and the last page, which has only 'prev_url').
So far I have been able to use the API URL to extract information from the first page:
tRestClient ->tJavaRow->tJsonExtract->tOracleOut
How do I modify the job to:
1) Pull all data for the one-time data dump (180K pages)
2) Pull data on a daily basis for the current day, or extract data until the timestamp is the current day
Example output from API
Page 1 gives
{
"chats": [
all chat-related attributes that need to be imported
],
"count": 179451,
"next_url": "next_url_here"
}
Page 2 gives
{
"chats": [
all chat-related attributes that need to be imported
],
"count": 179451,
"prev_url": "previous_url_here",
"next_url": "next_url_here"
}
Page 3 gives ......next page
You need to have this sort of layout...
You set the initial globalMap value in the "Set initial globalMap" tJava. Then set the "While" clause logic in the tLoop component. The "Dummy" component is just there to allow you to link to the tRestClient. I've included the "Modify JSON" tJavaFlex following on from your last question. Then you can set the next URL in the "Set globalMap" tJavaFlex.
My assumption is that the "next_url" element will not be supplied if there are no other pages after that one. If that is the case, you can do it like this.....
1) Set up a tLoop using the "while" loop functionality. Use a globalMap variable holding your initial URL (set in a tJava preceding the tLoop) as your test on your while clause. "While globalMap value is not null" for example.
2) Use the globalMap value in your tRestClient
3) Retrieve your data for each service call and also retrieve the next_url. Set the globalMap value to be that of the next_url. If it is not present, then this will be null.
The tLoop will fire for each url supplied and will stop when the next_url value is not supplied.
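To make the three steps above concrete, here is a minimal plain-Java sketch of the same control flow, runnable outside Talend. It is an illustration only, not the code Talend generates: the `globalMap` here is an ordinary `HashMap`, the key `"apiUrl"` and the `Page` class are hypothetical names, and the "API" is an in-memory map of three pages standing in for the tRestClient call.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

class NextUrlLoopSketch {

    /** One API response reduced to the two parts the job needs (hypothetical type). */
    static final class Page {
        final List<String> chats;
        final String nextUrl; // null when the page carries no "next_url"

        Page(List<String> chats, String nextUrl) {
            this.chats = chats;
            this.nextUrl = nextUrl;
        }
    }

    /** Follows next_url links until a page arrives without one. */
    static List<String> pullAll(String initialUrl, Function<String, Page> fetch) {
        Map<String, Object> globalMap = new HashMap<>();
        globalMap.put("apiUrl", initialUrl);          // tJava before the tLoop
        List<String> allChats = new ArrayList<>();

        while (globalMap.get("apiUrl") != null) {     // tLoop "while" condition
            String url = (String) globalMap.get("apiUrl");
            Page page = fetch.apply(url);             // stands in for tRestClient
            allChats.addAll(page.chats);              // rows that would go to the database
            globalMap.put("apiUrl", page.nextUrl);    // null here ends the loop
        }
        return allChats;
    }

    /** In-memory three-page stand-in for the real API. */
    static Map<String, Page> samplePages() {
        Map<String, Page> pages = new HashMap<>();
        pages.put("page1", new Page(List.of("chat-a", "chat-b"), "page2"));
        pages.put("page2", new Page(List.of("chat-c"), "page3"));
        pages.put("page3", new Page(List.of("chat-d"), null));
        return pages;
    }

    public static void main(String[] args) {
        System.out.println(pullAll("page1", samplePages()::get));
    }
}
```

In the real job the while condition would be typed into the tLoop settings and the `put` calls would live in the tJava and tJavaFlex components; the sketch only shows why a missing next_url stops the loop.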
I am trying to do what you suggested. I am quite new to Talend, so even small and simple things are sometimes a bit difficult to achieve.
I used tSetGlobalVar to set the initial URL and then passed it to tRestClient. I then extract the 'next_url' with tExtractJson, and up to here things are good: I looked at the result in tLogRow and can see the next_url. However, I am not able to figure out how to assign the next_url from tExtractJson to a global variable in tJava.
Link the tExtractJson component to a tJavaFlex, not a tJava. The code for the tJavaFlex should go in the Main Code section.
Ah yes, sorry, I have been away from my machine all day. The tJavaRow uses the input_row and output_row row names (for some strange reason), whereas the tJavaFlex uses the actual row names.
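As a rough illustration of what the tJavaFlex Main Code has to do, the sketch below wraps the single assignment in plain Java so it can be run on its own. The names are assumptions: `row2` stands for whatever the link leaving tExtractJson is called in your job, `Row2Struct` imitates the row class Talend would generate for it, and `"apiUrl"` is a hypothetical globalMap key.

```java
import java.util.HashMap;
import java.util.Map;

class JavaFlexMainCodeSketch {

    /** Stand-in for the generated row class of the flow leaving tExtractJson (assumed name). */
    static final class Row2Struct {
        String next_url; // column extracted from the JSON; null if the element is absent
    }

    /**
     * Equivalent of the tJavaFlex Main Code section, which runs once per incoming row.
     * Inside Talend only the body line is needed, because globalMap and row2
     * are already in scope there.
     */
    static void mainCode(Map<String, Object> globalMap, Row2Struct row2) {
        globalMap.put("apiUrl", row2.next_url);
    }

    public static void main(String[] args) {
        Map<String, Object> globalMap = new HashMap<>();
        Row2Struct row2 = new Row2Struct();
        row2.next_url = "next_url_here";
        mainCode(globalMap, row2);
        System.out.println(globalMap.get("apiUrl"));
    }
}
```

In a tJavaRow the same assignment would read from `input_row.next_url` instead of `row2.next_url`, which is the naming difference mentioned above.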
Did this work for you?
Yes, this did work for me. The final solution looks like this:
Global Variables
"V_API_URL" - Holds the initial URL and is then updated with the next URL that I get from tExtractJson.
"V_LOOP" - Defined as a Boolean; holds the value true to start with and is set to false when V_API_URL is null.
tLoop
Used a While loop, without Declaration and Iteration
tRestClient
tJavaFlex
On each loop iteration, the global variable V_API_URL is set to the next_url obtained from tJsonExtract.
The code then checks whether V_API_URL holds a URL. If it does, the loop continues because V_LOOP is still true; if there is no URL, V_LOOP is set to false. This exits the loop once the last page has been processed.
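For anyone landing here later, the two-variable logic described above can be sketched in plain Java roughly as follows. V_API_URL and V_LOOP are the names from my job; everything else is an assumption made for illustration: the `HashMap` stands in for Talend's globalMap, the page chain is an in-memory map instead of real tRestClient calls, and treating an empty string like null is my guess at how a missing next_url might arrive.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class FlagLoopSketch {

    /** tJavaFlex step: store the extracted next_url and clear the flag on the last page. */
    static void updateLoopState(Map<String, Object> globalMap, String nextUrl) {
        globalMap.put("V_API_URL", nextUrl);
        // Assumption: a missing next_url shows up as null or as an empty string.
        if (nextUrl == null || nextUrl.isEmpty()) {
            globalMap.put("V_LOOP", Boolean.FALSE);
        }
    }

    /** Runs the loop over an in-memory chain of pages and returns the URLs called, in order. */
    static List<String> run(String initialUrl, Map<String, String> nextUrlByPage) {
        Map<String, Object> globalMap = new HashMap<>();
        globalMap.put("V_API_URL", initialUrl);       // initial URL
        globalMap.put("V_LOOP", Boolean.TRUE);        // loop flag starts as true

        List<String> visited = new ArrayList<>();
        while ((Boolean) globalMap.get("V_LOOP")) {   // tLoop While condition
            String url = (String) globalMap.get("V_API_URL");
            visited.add(url);                         // stands in for the tRestClient call
            String nextUrl = nextUrlByPage.get(url);  // stands in for tExtractJson
            updateLoopState(globalMap, nextUrl);
        }
        return visited;
    }

    public static void main(String[] args) {
        // Hypothetical three-page chain: p1 -> p2 -> p3, and p3 has no next_url.
        Map<String, String> chain = Map.of("p1", "p2", "p2", "p3");
        System.out.println(run("p1", chain));
    }
}
```

The separate Boolean is not strictly required, since the While condition could test V_API_URL for null directly, but keeping the flag makes the exit condition easy to read in the tLoop settings.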