_AnonymousUser
Specialist III

[resolved] Calling paginated REST API

Hi,
I have 2 questions:
1) Can we use the REST response (XML/JSON data) directly and parse its contents?
2) I have to call a REST API that returns a paginated response, so I need to keep calling it until all the data is exhausted. How can we achieve this?
-Manoj

Accepted Solutions
Anonymous
Not applicable

1) Yes, you can. This is very simple with Talend, and simpler with XML than with JSON.
2) Paginated responses can be a bit fiddly, but handling them is entirely possible. I have written a tutorial about how to get Spotify listening stats from Facebook, which has to deal with a REST service that returns paginated results. My tutorial can be found here. It is probably more complicated than your job will need to be, but it should give you a good idea of how to retrieve all of the REST response pages. You can download the job at the bottom of the tutorial and, if you have a Facebook account with a Spotify account linked, you can make use of it.
Hope it helps


9 Replies
_AnonymousUser
Specialist III
Author

Thanks a lot. This helped.
arpiitv
Contributor

The page does not open.
Anonymous
Not applicable

Is the tutorial still online somewhere? 

Anonymous
Not applicable

@rhall: Your tutorial is not available. Could you please share the updated link? It might help me fix my pagination issue for Rally.

Serendipity
Contributor II

@Richard Hall​ 

Thought I'd ask as well, as I'm looking for how to work through a paginated API call. I've got a job that returns the count of total records, then divides that by the page size to work out how many pages there will be. I'm trying to figure out how to use that number to iterate.
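For the page-count approach described above, here is a minimal Java sketch (not a Talend job) of the arithmetic and the iteration. The base URL and the "page" and "pagesize" query parameter names are placeholders, not taken from any particular API. The one point worth noting is the ceiling division, so that a partial last page is not dropped.

```java
import java.util.ArrayList;
import java.util.List;

class PageCountIteration {
    // Ceiling division: 101 records at 50 per page need 3 pages, not 2.
    static int totalPages(int totalRecords, int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        return (totalRecords + pageSize - 1) / pageSize;
    }

    // Builds one request URL per page. The "page" and "pagesize" parameter
    // names are assumptions; substitute whatever your API expects.
    static List<String> pageUrls(String baseUrl, int totalRecords, int pageSize) {
        List<String> urls = new ArrayList<>();
        int pages = totalPages(totalRecords, pageSize);
        for (int page = 1; page <= pages; page++) {
            urls.add(baseUrl + "?page=" + page + "&pagesize=" + pageSize);
        }
        return urls;
    }

    public static void main(String[] args) {
        // Placeholder URL and counts, for illustration only.
        for (String url : pageUrls("https://api.example.com/items", 101, 50)) {
            System.out.println(url);
        }
    }
}
```

In a Talend job the same idea would probably map onto a tLoop set to its "For" type running from 1 to the computed page count, with the current loop value used to build each request URL, though that wiring is not shown in this thread.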

Anonymous
Not applicable

Sorry about the delay in getting back to you (and to everyone else whose posts I missed). This tutorial was on my personal website a few years ago. Unfortunately it was continually attacked by individuals and I didn't have the time to keep it updated and safe, so I took it down. However, I have managed to find a copy of the tutorial that I originally posted. Most of it (the Spotify data part) is out of date, but I shall include the section on pagination below. It has some old screenshots, but it is still pretty valid.

 

*******************************************************************************************

 

The GetMySpotifyListeningHistory Job

The second child job we will look at is the "GetMySpotifyListeningHistory" job. This will be used to get the music listening data from Facebook and output it to an Excel spreadsheet. Getting the data from Facebook requires the "endpoint" that was generated by the "CheckAuthorisation" Job.

 

A screenshot of this job can be seen below.

[Screenshot: the GetMySpotifyListeningHistory Job]

This Job uses one Context variable of type String called "endpoint". This is passed to the Job by its parent Job. This will be shown when we look at the relationship between the parent and child Jobs.

 

1) "Endpoint to Row" (tFixedFlowInput)

This component is used to create a row from the "endpoint" Context variable. Before this component can be used a schema needs to be set up for it. To do this, click on the "Edit schema" button in the screenshot below (circled in red). You will see a popup window appear. Click on the green plus button (circled in red), name the column and give it an appropriate type. In this example we are calling it "endpoint" and giving it a type of String. Once this is complete, click on "OK".

 

[Screenshot: tFixedFlowInput settings and the "Edit schema" popup]

Once the schema is created, ensure that the "Number of rows" is set to 1 and add "context.endpoint" to the "Value" field of the "endpoint" column (as above).

The settings above will ensure that this component produces 1 row with 1 column that holds the value of the Context variable "endpoint".

 

2) "Set initial 'endpoint' value" (tSetGlobalVar)

This component is used to set a global variable called "endpoint". The purpose of this is so that it can be used by a loop. Each iteration of the loop will check the value of the global variable. If it is ever null, the loop will stop. This is explained in the next section. The configuration of this component can be seen below.

[Screenshot: tSetGlobalVar configuration]

The value of "row1.endpoint" corresponds to the row connecting this component with the previous component.

 

3) "Loop through service calls" (tLoop)

This component drives the processing in the Job. The configuration can be seen below.

[Screenshot: tLoop configuration]

In this Job we are using a "While" loop that looks at the value held by the "endpoint" global variable. The "Condition" field above is populated with the following code:

 

((String)globalMap.get("endpoint"))!=null

 

This code gets the tLoop component to continue looping while the value of the "endpoint" global variable is not null. The other fields ("Declaration" and "Iteration") are populated with arbitrary values as they are not required for this condition.

The reason the "endpoint" global variable is tested like this is that when the URL held by the "endpoint" variable is used, it will return a JSON string of data. This is limited to 50 records. If there are more records to be retrieved, then the JSON string also supplies another URL for the next 50 records. This is set to be the value held by "endpoint". This is done until there are no records left. When this is the case, no URL is returned and the "endpoint" global variable will hold nothing (null). At this point the tLoop component knows there is no need to continue.
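The control flow described above can be sketched in plain Java, outside Talend. This is only an illustration under stated assumptions: the Page class, the fetchPage function and the fake two-page service are hypothetical stand-ins for the tHttpRequest call and the JSON parsing, and a local HashMap plays the role of Talend's globalMap.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

class NextUrlLoop {
    // Hypothetical page type: the records on this page plus the URL of the
    // next page, which is null on the last page.
    static final class Page {
        final List<String> records;
        final String nextUrl;

        Page(List<String> records, String nextUrl) {
            this.records = records;
            this.nextUrl = nextUrl;
        }
    }

    // Keeps calling fetchPage while the "endpoint" entry is not null.
    static List<String> fetchAll(String firstUrl, Function<String, Page> fetchPage) {
        Map<String, Object> globalMap = new HashMap<>(); // stand-in for Talend's globalMap
        globalMap.put("endpoint", firstUrl);
        List<String> all = new ArrayList<>();

        while (((String) globalMap.get("endpoint")) != null) {
            String url = (String) globalMap.get("endpoint");
            // Reset first, so the loop stops unless a new URL is supplied.
            globalMap.put("endpoint", null);

            Page page = fetchPage.apply(url);
            all.addAll(page.records);
            if (page.nextUrl != null) {
                globalMap.put("endpoint", page.nextUrl);
            }
        }
        return all;
    }

    public static void main(String[] args) {
        // Fake two-page service used instead of a real HTTP call.
        Map<String, Page> fakeService = new HashMap<>();
        fakeService.put("page1", new Page(Arrays.asList("a", "b"), "page2"));
        fakeService.put("page2", new Page(Arrays.asList("c"), null));
        System.out.println(fetchAll("page1", fakeService::get));
    }
}
```

Resetting the variable before processing the response means a missing "next" URL ends the loop without any extra check.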

 

4) "Service call" (tHttpRequest)

This component calls the URL held by the "endpoint" variable. The configuration can be seen below.

[Screenshot: tHttpRequest configuration]

5) "Set 'endpoint' to null" (tJavaRow)

This component is used to pass the result of the service call on to the next component and set the "endpoint" global variable to null. This is done so that the "endpoint" variable is set to stop the tLoop if no new URL value is passed back. To configure this component, first we need to configure the schema. To do that, click on the "Edit schema" button (circled in red) and a popup window will appear, as shown below. Then click on the double right arrow button (circled in red) to copy the schema of the tHttpRequest component. Once this is done, click the "OK" button.

[Screenshot: tJavaRow "Edit schema" popup]

Once the schema has been sorted, we need to configure the component to pass the response from the tHttpRequest component to the next component and reset the "endpoint" variable. The code to do that is below:

 

output_row.ResponseContent = input_row.ResponseContent;

globalMap.put("endpoint",null);

 

The first line deals with passing the "ResponseContent" column to the output.

The second line deals with setting the "endpoint" variable to null.
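The excerpt stops before the step that puts the next URL back into the "endpoint" variable. As a rough sketch of what such a step might look like in plain Java, the helper below pulls a "next" URL out of the response text. The field name "next", the NextUrlExtractor class and the regular-expression approach are all assumptions for illustration. In a real job a proper JSON component or parser would be the safer choice.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NextUrlExtractor {
    // Matches "next":"<value>" and captures the value, allowing escaped characters.
    // The field name "next" is an assumption about the response format.
    private static final Pattern NEXT =
            Pattern.compile("\"next\"\\s*:\\s*\"((?:[^\"\\\\]|\\\\.)*)\"");

    // Returns the next-page URL, or null when the response has none.
    static String nextUrl(String responseContent) {
        if (responseContent == null) {
            return null;
        }
        Matcher m = NEXT.matcher(responseContent);
        if (!m.find()) {
            return null;
        }
        // JSON may escape forward slashes as \/ ; undo that for the URL.
        return m.group(1).replace("\\/", "/");
    }

    public static void main(String[] args) {
        // Made-up sample responses, for illustration only.
        String middlePage = "{\"data\":[],\"paging\":{\"next\":\"https:\\/\\/example.com\\/p2\"}}";
        String lastPage = "{\"data\":[]}";
        System.out.println(nextUrl(middlePage));
        System.out.println(nextUrl(lastPage));
    }
}
```

In the job, the returned value would only be written with globalMap.put("endpoint", ...) when it is not null, so that the earlier reset to null still stops the loop on the last page.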

 

*******************************************************************************************

 

This should help you out with the pagination issue. As I said, sorry about the delay. It took a while for me to dig this out 🙂

abubakarseo22
Contributor

Sure, here are the answers to your questions:

  1. Parsing REST Response (XML/JSON): Yes, you can directly use the REST response in XML or JSON format and parse its contents. Most programming languages and data tools provide libraries and functions to parse XML and JSON data. For example, in Python, you can use xml.etree.ElementTree for XML and json module for JSON. In Qlik Sense, you can use REST connectors which can automatically parse JSON and XML responses.

  2. Handling Paginated REST API Responses: To handle paginated responses, you will need to make repeated API calls until all the data is retrieved. This typically involves:

    • Making the initial API call.
    • Checking the response for pagination information (like a next page URL or a token for the next set of data).
    • Making subsequent API calls using the pagination information until no more data is available.

In most cases, the API documentation will specify how pagination works and how to retrieve subsequent pages of data. If you're using a tool like Qlik Sense, it often provides features to handle pagination automatically by configuring the REST connector with pagination settings.

If you need specific code examples or more detailed guidance, feel free to ask!

abubakarseo22
Contributor

To handle pagination in Talend, you can set up a loop that retrieves data from your REST API until all records are fetched. Start by initializing a context variable with your initial URL. Use tREST to fetch data, tExtractJSONFields to parse it, and tJavaRow to evaluate if there's a "next" URL in the pagination section. Loop this process until all records are retrieved, updating the URL each iteration to fetch the next set of data.

This approach ensures you systematically retrieve and process all records returned by your REST API's paginated responses.