ManjunathBhat
Contributor

Talend component to loop through API to download multiple files

Hi Team,

I am new to Talend and we have a contract expiring, so I need help.

Issue: We have an application that hosts our files.

The vendor has provided us with API credentials using OAuth 2.0.

One API URL gives a list of all available documents:

https://api.icims.com/customers/{customerId}/forms/list

The other one lets us download a file as a PDF, but we have to pass the formname and formDataId:

https://api.icims.com/customers/{customerId}/forms/{formname}/data/{formDataId}.pdf

I am guessing I first have to call the first URL to get all the names and IDs, and then loop over them dynamically and pass them to the second URL to download the files?

Can anyone tell me which components to use to authenticate and download the forms?

13 Replies
Anonymous
Not applicable

Hi

First of all, call the first URL using a tRest or tHttpRequest component to see whether you can call the API successfully. Set the OAuth 2.0 bearer token as one of the headers.

Second, if step 1 works, check what data the API returns in the response, and use a tExtractXMLFields (or a tExtractJSONFields, if the API returns JSON) to extract the required data from the response string.

Third, iterate over the extracted data and call the second URL. The job looks like:

tHttpRequest--main--tExtractXMLFields--main--tFlowToIterate--iterate--tHttpRequest
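
To make the third step a bit more concrete, here is a minimal Java sketch of how the two URLs from the question could be built for each extracted row. The class name, the helper names, and the sample values (customer ID "1234", form name "offer letter") are made up for illustration, and the percent-encoding of the form name is an assumption about what the API expects.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Talend: builds the two URLs from the question.
final class FormUrlSketch {
    static final String BASE = "https://api.icims.com/customers/";

    // Percent-encode one path segment. URLEncoder targets form data and
    // writes spaces as '+', so '+' is turned back into %20 for use in a path.
    static String encodeSegment(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8).replace("+", "%20");
    }

    // URL that lists all available documents.
    static String listUrl(String customerId) {
        return BASE + encodeSegment(customerId) + "/forms/list";
    }

    // URL that downloads one form as a PDF.
    static String pdfUrl(String customerId, String formName, long formDataId) {
        return BASE + encodeSegment(customerId)
                + "/forms/" + encodeSegment(formName)
                + "/data/" + formDataId + ".pdf";
    }

    public static void main(String[] args) {
        // Sample values only; real values come from the list response.
        System.out.println(listUrl("1234"));
        System.out.println(pdfUrl("1234", "offer letter", 49));
    }
}
```

In the job itself, the same concatenation would go into the URI field of the second tHttpRequest, with the form name and ID read from the current iteration.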

 

Regards

Shong

 

ManjunathBhat
Contributor
Author

Hello,

 

Thank you for your response. I followed your steps; please forgive my mistakes, as I am very new to this and to APIs as well. I set up the job as in the first screenshot below. The vendor has provided me with just a client ID and a client secret. When I run the job, I get the error shown in the second screenshot.

Sorry, I think I might be doing something wrong while setting the OAuth 2.0 bearer token?

 

[Screenshot 1: job setup (0695b00000deSeKAAU.png)]

[Screenshot 2: error when running the job (0695b00000deSePAAU.png)]

Anonymous
Not applicable

Use a tLogRow to print the response to the console. The job looks like:

tHttpRequest--main--tLogRow

 

If you fill in the header, you have to set both the key and the value.

I think you need to check the 'Need authentication' box and fill in the client ID and client secret.
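
For reference, this is roughly what the two kinds of Authorization header value look like when built by hand. It is only a sketch: whether this vendor accepts the client ID and secret directly as HTTP Basic credentials, or expects them to be exchanged for a bearer token first, is an assumption to confirm with the vendor. The class and method names are made up.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper: builds Authorization header values by hand.
final class AuthHeaderSketch {

    // HTTP Basic: "Basic " followed by base64("user:secret").
    static String basicAuth(String user, String secret) {
        String pair = user + ":" + secret;
        return "Basic " + Base64.getEncoder()
                .encodeToString(pair.getBytes(StandardCharsets.UTF_8));
    }

    // OAuth 2.0 bearer token: "Bearer " followed by the token itself.
    static String bearerAuth(String token) {
        return "Bearer " + token;
    }

    public static void main(String[] args) {
        // Placeholder credentials, not real ones.
        System.out.println("Authorization: " + basicAuth("clientId", "clientSecret"));
        System.out.println("Authorization: " + bearerAuth("someAccessToken"));
    }
}
```

If you set the header manually in the component, the key is "Authorization" and the value is one of the strings above.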

 

Regards

Shong

ManjunathBhat
Contributor
Author

Hi Shong,

 

Thank you.

I was able to get the credentials from the vendor and use basic authentication. I am now also able to connect to the API.

 

There is an API "https://api.icims.com/customers/{customerId}/forms/{formDataID}.pdf"

 

The minimum value for formDataID is 11 and the maximum value is around 600000.

I used tFileFetch with a sample number and was able to download the PDF file.

 

Now the problem is that I have to iterate from 11 to 600000 and pass the incremented value dynamically on each iteration, so that the job picks up the next value, passes it as a parameter in the URI, and fetches the next file. How can I do this? Sorry again, I am new to Talend.

As in the example below, the number 2289 needs to increment to the next value on each run.

 

0695b00000dem0KAAQ.png 

 

Anonymous
Not applicable

Hi

 

You can use a tLoop component to iterate over each formDataID, e.g.:

tLoop--iterate--tFileFetch

On tLoop: select the 'For' loop type, from 11 to 600000.

On tFileFetch: set a dynamic URI (note the dot before "pdf"):

"https://server/xxxx/forms/"+((Integer)globalMap.get("tLoop_1_CURRENT_VALUE"))+".pdf"

Hope it helps!
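
As a rough illustration of the dynamic URI expression, the sketch below simulates Talend's globalMap with a plain HashMap and builds the URI for each loop value. The server path is the same placeholder as in the expression, the URI ends in ".pdf" as in the API path from the question, and the class and method names are made up.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-alone simulation of the tFileFetch URI expression.
final class LoopUriSketch {

    // Same concatenation as the URI field: read the current tLoop value
    // from the map and append the ".pdf" extension.
    static String buildUri(Map<String, Object> globalMap) {
        Integer current = (Integer) globalMap.get("tLoop_1_CURRENT_VALUE");
        return "https://server/xxxx/forms/" + current + ".pdf";
    }

    public static void main(String[] args) {
        Map<String, Object> globalMap = new HashMap<>();
        // tLoop would set this value itself; here a short range is faked.
        for (int id = 11; id <= 13; id++) {
            globalMap.put("tLoop_1_CURRENT_VALUE", id);
            System.out.println(buildUri(globalMap));
        }
    }
}
```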

 

Regards

Shong

ManjunathBhat
Contributor
Author

Hi Shong,

 

Thank you so much, this really helps. I got started and am now able to pass the value from tLoop to tFileFetch.

 

One issue I ran into is that between 11 and 600000 there are some numbers for which no form exists.

For example, 11.pdf works, 12 through 48 do not exist, then the forms start again at 49, run through 120, stop, and start again at 170.

Every time tFileFetch does not find a file, it fails.

Below is the error response:

 

{
    "errors": [
        {
            "errorMessage": "A form with the id of 12 does not exist.",
            "errorCode": 2
        }
    ]
}

 

How can I catch the response from the API and check whether the file exists (response is OK), then go on to tFileFetch, and otherwise go back to the loop, increment, and check again? For example:

 

tLoop-iterate-checkAPIforError-NoErrorgotoTfilefetch

tloop-iterate-checkAPIforerror-iferrorgobacktoloop-tloop-iterate-checkAPIforError-NoErrorgotoTfilefetch

Anonymous
Not applicable

About checkAPIforError: you need to check with the vendor whether an API exists for checking whether a file exists. If not, skip this step and create the job as below:

main job:

tLoop--iterate--tRunJob

 

On tRunJob, call the child job and pass the current iteration value to it using a context variable (please see this documentation page for how to do it), and uncheck the 'die on error' box.

 

child job:

tFileFetch

 

On tFileFetch: set a dynamic URI (note the dot before "pdf"):

"https://server/xxxx/forms/"+context.varName+".pdf"

 

//context.varName is a context variable used to receive the current iteration value from main job.

 

If you need to catch the error message, use a tLogCatcher component in the child job and output the error to a log file:

tFileFetch

tLogCatcher --main--tFileOutputDelimited

 

On tFileOutputDelimited: check the 'Append' box to append the error messages to the same file.
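
The control flow of this design (keep looping when one download fails, and record which IDs failed) can be sketched in plain Java as below. This is only a simulation: the fetch step is a stand-in predicate rather than a real HTTP call, and the class and method names are made up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

// Hypothetical simulation of tLoop--iterate--tRunJob without 'die on error'.
final class SkipMissingSketch {

    // Tries every ID in [from, to]. 'fetch' stands in for the child job:
    // it returns true on success and returns false or throws when the form
    // is missing. Failures are collected and the loop carries on.
    static List<Integer> downloadRange(int from, int to, IntPredicate fetch) {
        List<Integer> missing = new ArrayList<>();
        for (int id = from; id <= to; id++) {
            try {
                if (!fetch.test(id)) {
                    missing.add(id);
                }
            } catch (RuntimeException e) {
                // Comparable to a failing child job that does not stop the parent.
                missing.add(id);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Fake API: only forms 11 and 14 exist in this small range.
        List<Integer> missing = downloadRange(11, 15, id -> id == 11 || id == 14);
        System.out.println("Missing form IDs: " + missing);
    }
}
```

The list of missing IDs plays the role of the log file written by tLogCatcher in the child job.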

 

 

Regards

Shong

 

 

ManjunathBhat
Contributor
Author

Hi Shong,

 

You rock, thank you so much. One last question.

Everything now works as expected, and I am even able to catch the error and append it to the log.

The one thing I would like to change is that it logs the standard "java.lang.Exception:Method failed: HTTP/1.1 404 ;1".

Instead of this, can I write the context variable value, so that the file records which file number does not exist?

For example, when the child job runs for 12, it fails and logs the error. Along with this, how can I append the current context.varName value to the error output file?

 

Many thanks again

Anonymous
Not applicable

tLogCatcher has a predefined schema; however, you can add extra columns with a tMap, e.g.:

tLogCatcher--main--tMap--tFileOutputDelimited

 

On tMap, add a new output, drag and drop all the columns from the input data flow, and add a new column called, for example, "invalidFormDataId" with its value expression set to: context.varName

This will output the invalid formDataID to the log file.
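
To show the shape of the resulting log, here is a small Java sketch that writes one delimited row per failed ID, with the ID in its own column next to the error message. The ";" separator, the file name, and the class and method names are assumptions for illustration, and the real tLogCatcher schema has more columns than the two shown here.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.List;

// Hypothetical equivalent of tLogCatcher--main--tMap--tFileOutputDelimited.
final class ErrorRowSketch {

    // One output row: invalidFormDataId;message. Any ';' inside the message
    // is replaced so the row keeps exactly two columns.
    static String errorRow(String invalidFormDataId, String message) {
        return invalidFormDataId + ";" + message.replace(";", ",");
    }

    // Appends the row to the log file, creating the file if needed
    // (comparable to checking the 'Append' box).
    static void appendErrorRow(Path file, String invalidFormDataId, String message)
            throws IOException {
        Files.write(file, List.of(errorRow(invalidFormDataId, message)),
                StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    public static void main(String[] args) throws IOException {
        Path log = Paths.get("missing_forms.csv"); // example file name
        appendErrorRow(log, "12", "java.lang.Exception:Method failed: HTTP/1.1 404 ;1");
        System.out.println(Files.readAllLines(log, StandardCharsets.UTF_8));
    }
}
```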