Anonymous
Not applicable

Extract more than 10k records from tHttpRequest component

I am using Talend Big Data 6.4 and I have a scenario that requires your expertise.
Here is the scenario:
I am using the tHttpRequest component (GET method) to extract data hosted on a Kinvey server. Due to a restriction at the source, if a table has more than 10k records, only the first 10k records are extracted; the remaining records are discarded and not sent in that request.
I need your expert help to extract all the records via some workaround.

While investigating, I came across a concept called pagination that can be used to solve this problem, but I don't know how to configure pagination in Talend or which other components to use for this purpose.

It would be very helpful if you could share some ideas on how to get this working, along with a screenshot of the components used in the job.

Any other way to accomplish this workaround is also greatly welcome. (I heard another approach is to use the tLoop component.) Kindly share a screenshot of the components used, along with any Java code written for the purpose.
1 Solution

Accepted Solutions
Anonymous
Not applicable
Author

The code uses the number of iterations of the loop to calculate the record numbers....

//Set the limit value
int limit = 1000;
//Set the skip value....(1000 x the current iteration of the loop) - 1000
int skip = (1000 * ((Integer)globalMap.get("tLoop_1_CURRENT_ITERATION")).intValue()) - 1000;

//Set the query value
String query = "?query={}&limit=" + limit + "&skip=" + skip;

//Assign the query value to the query globalMap variable
globalMap.put("query", query);

If we assume that the first iteration is iteration 1, then the query string will be....

"?query={}&limit=1000&skip=0"

The second iteration it will be....

"?query={}&limit=1000&skip=1000"

 

The third iteration it will be....

"?query={}&limit=1000&skip=2000"


19 Replies
Anonymous
Not applicable
Author

OK, this depends on how pagination is enabled in your service, but I have written a tutorial for retrieving your Spotify listening history that makes use of a type of this functionality. You may be able to extrapolate from it to solve your problem. The tutorial is here: https://www.rilhia.com/tutorials/using-talend-get-your-spotify-listening-history-facebook

You will need to search for "The GetMySpotifyListeningHistory Job" and look at steps 4, 5, 6, 7 and 8. It's not the easiest thing to grasp conceptually, but hopefully you can extrapolate from it.

Essentially the process is....
1) Connect your tHttpRequest to a tLoop
2) Run your tHttpRequest using a globalMap variable holding the URL
3) Retrieve the data, process it (or store it), then retrieve the new URL (for the next batch of data) and store it in the globalMap
4) Perform logic to enable the loop to run again

......and so on. A rough sketch of this pattern is shown below.
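
To make the flow concrete, here is a minimal, self-contained plain-Java sketch of that pattern. Everything in it is simulated (the endpoint, the page contents and the get() helper are all made up); in the real job, tHttpRequest makes the call, a component such as tExtractJSONFields pulls out the next URL, and tLoop plus globalMap carry the state between iterations.

import java.util.Arrays;
import java.util.Iterator;

public class NextUrlPaginationSketch {

    // Simulated pages: each entry is {data, next URL}, with null marking
    // the last page (as the Facebook/Spotify style APIs do)
    private static final Iterator<String[]> PAGES = Arrays.asList(
            new String[]{"records 1-1000", "https://api.example.com/data?page=2"},
            new String[]{"records 1001-2000", "https://api.example.com/data?page=3"},
            new String[]{"records 2001-2134", null}).iterator();

    public static void main(String[] args) {
        String url = "https://api.example.com/data?page=1"; // hypothetical first URL

        while (url != null) {             // step 4: loop while a next URL exists
            String[] response = get(url); // steps 1-2: the HTTP GET
            System.out.println("Processed " + response[0]); // step 3: process the data
            url = response[1];            // step 3: store the new URL for the next batch
        }
    }

    private static String[] get(String url) {
        return PAGES.next(); // stand-in for the real HTTP request
    }
}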

Hope that helps 🙂

Anonymous
Not applicable
Author

Yours was one of the first places I went; I tried to replicate it and tailor it to my needs.

I did exactly what you mentioned, but unfortunately nothing happens. Can you check why it is not working? Please note that the endpoint value shown is a dummy; the actual one I use is correct and works on its own through the tHttpRequest component. Below is a screenshot of the same.

[Screenshots: the endpoint has a valid value (a dummy is shown here); tBigQueryOutput has all valid values and can connect to BigQuery with no problem]

Anonymous
Not applicable
Author

Hello,

I am working on a task that has a similar problem: the REST API only returns a limited number of records on each call. However, the API provides the external parameters limit N offset N, so all records can be read by calling the API multiple times. I am using a tLoop to loop in the job, see the screenshots below.

The URL looks like:

"https://......?q=SELECT * from messages limit 1000 offset "+((Integer)globalMap.get("tLoop_1_CURRENT_VALUE"))"

[Screenshots: job design and tLoop settings]
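
As a rough illustration of what that loop produces, here is a self-contained plain-Java sketch of the limit/offset pattern. The endpoint, page size and total record count are assumptions; in the actual job the offset comes from ((Integer)globalMap.get("tLoop_1_CURRENT_VALUE")) and tHttpRequest makes the request.

public class LimitOffsetSketch {
    public static void main(String[] args) {
        int limit = 1000;        // records returned per call
        int totalRecords = 3500; // assumed to be known up front

        // Equivalent to a tLoop configured "From 0, To totalRecords, Step limit"
        for (int offset = 0; offset < totalRecords; offset += limit) {
            String url = "https://api.example.com/query?q=SELECT * from messages"
                    + " limit " + limit + " offset " + offset;
            System.out.println("GET " + url); // stand-in for the actual call
        }
    }
}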

Hope it helps you.

 

Regards

Shong

Anonymous
Not applicable
Author

Great, Shong.
You are the man. Can you send me that job so that I can tailor it to my needs? Kindly share.
Anonymous
Not applicable
Author

I exported the job items from v6.4.0; you should use the same or a higher version to import the job into your studio.


RestAPIwithLimitedRecords.zip
Anonymous
Not applicable
Author

Hi Shong,
Thank you very much for the help.
However, I see you are getting the count via a query in the URI. In my case there is a column in my JSON file, count, containing the total number of records for that load; all rows have the same value for a particular JSON load.
So if you run my job, the count variable is set to 20000 (say you have 20k records at input) 20000 times. Will it be a problem if such an assignment happens 20k times? Can it be changed to load this variable only once instead? While running, I see 20000 messages saying the count value is set to 20000.
Also, I see the iteration is set to 3. Does this mean this block will run only 3 times max? If so, will it load only 3000 records max for your requirement, since you set the limit to 1000?
Please clarify and advise on the above queries.

Anonymous
Not applicable
Author

Can you show us the JSON response you get? My way will only work if the JSON returns a new URL for the next set of records (which is how the Facebook/Spotify APIs implement it); @shong's solution will only work if you are asked to set a numeric parameter at the end of the URL.

In order to help you we will need to see what we are working with 🙂

Anonymous
Not applicable
Author

Hi Rhall_2_0,

What you say is the case: if I want to extract the first 10k records, I should append "?query={}&limit=10000&skip=0" to the existing URI, and to extract the second 10k records the URI has to be changed to URI + "?query={}&limit=10000&skip=10000", and so on. May I know exactly how to get all the records using a loop with this varying URI?

Anonymous
Not applicable
Author

OK, you can keep most of your job structure (I would use @shong's example for this). What you have to remember is that the last part of your URL will adjust with every call. The code you will need to change is below.....

 

//Set the limit value
int limit = 1000;
//Set the skip value....(1000 x the current iteration of the loop) - 1000
int skip = (1000 * ((Integer)globalMap.get("tLoop_1_CURRENT_ITERATION")).intValue()) - 1000;

//Set the query value
String query = "?query={}&limit=" + limit + "&skip=" + skip;

//Assign the query value to the query globalMap variable
globalMap.put("query", query);

You will then need to append ((String)globalMap.get("query")) to your URL.
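
In the tHttpRequest URI field, that would look something like this (the base address below is just a placeholder for your actual Kinvey endpoint):

// URI field of tHttpRequest; the base URL is a placeholder
"https://baseurl.example/appdata/collection" + ((String)globalMap.get("query"))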