Using file triggers in Talend Cloud

TalendSolutionExpert

Last Update: Aug 14, 2024 1:57:23 AM

Updated By: Shicong_Hong

Created date: Oct 18, 2023 7:28:20 AM


You can use file triggers to execute Talend Jobs in the Talend Administration Center Job Conductor, but Talend Management Console has no equivalent construct. Currently, Talend Management Console offers no explicit way to trigger a task execution when the status of a file in a filesystem changes.

This article describes two workarounds: triggering a task from a Talend Job, and triggering a Job from a Talend Route. Jobs have a defined start and end, whereas Routes run until they are stopped.

These workarounds are proofs of concept (POCs), not drop-in solutions. You can modify them to meet your use case requirements and build a fully deployable solution. Links to additional workarounds for AWS, Azure, and Google Cloud Platform are provided in the Related Content section of this article.

The FileTriggers.zip file, attached to this article, contains the Talend Studio 7.3 proof of concept Jobs discussed in this article. Although the Jobs included in this file require Talend Studio 7.3, you can build the same Jobs in earlier versions.

Job trigger

The Job trigger method relies on the Studio tWaitForFile component to detect an external file action in a filesystem. The filesystem must be accessible from a Remote Engine because the Job cannot run using a Cloud Engine. The Job is designed to run continuously in Talend Management Console, and it provides user-level log information to Talend Management Console and detailed log information in the Job log found on the Remote Engine.

The Job waits for any action (create, update, or delete) on a file that matches the file mask in the tWaitForFile component, then it formats the full path of the file and passes it to a Talend Management Console task that is triggered by the file action.

0693p000008uqERAAY.png

Using context variables

The Job uses the following context variables, which are implicitly loaded from a delimited file into a tContextLoad component.

0693p000008uqLvAAI.png

 

  • parameter_interval defines the time in seconds between iterations of the tWaitForFile component.

  • parameter_directory is the directory for the tWaitForFile component to scan.

  • parameter_mask is the file mask used to match files in the directory used by tWaitForFile.

  • parameter_task_id is the Task ID of the Job to be triggered and is used by the tFixedFlowInput. You can find the Task ID on the Task Details page in the Talend Management Console:

    0693p000008uq48AAA.png

     

  • parameter_auth_id is a Base64-encoded string of the user’s Talend Cloud login ID and password in the format <uid>:<password>. The parameter is used on the Advanced settings tab of the tRESTClient component.
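As an illustration of how a value for parameter_auth_id could be produced, here is a minimal Java sketch that Base64-encodes a <uid>:<password> pair. The class name, method name, and sample credentials are hypothetical and are not part of the attached Jobs.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class AuthIdSketch {

    /** Encodes "<uid>:<password>" as Base64, the format described for parameter_auth_id. */
    static String encodeAuthId(String uid, String password) {
        String raw = uid + ":" + password;
        return Base64.getEncoder().encodeToString(raw.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        // Placeholder credentials for illustration only; do not hard-code real ones.
        System.out.println(encodeAuthId("user@example.com", "secret"));
    }
}
```

Because Base64 is an encoding and not encryption, the resulting string should be protected like the password itself.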

 

Processing flow

In this case, the tWaitForFile component is configured to fire on any file action (create, update, or delete) for any file in the scanned directory that matches the file mask. After the Job starts, the loop continues indefinitely until the Job is terminated.
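To make the idea of interval-based detection concrete, the following Java sketch compares two directory snapshots and reports created, updated, and deleted files. It is an assumption about how such polling can be done in plain Java, not the actual implementation of tWaitForFile; the class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class DirectoryPollSketch {

    /** Maps each regular file matching the glob mask to its last-modified time in milliseconds. */
    static Map<String, Long> snapshot(Path dir, String mask) throws IOException {
        Map<String, Long> files = new TreeMap<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, mask)) {
            for (Path p : stream) {
                if (Files.isRegularFile(p)) {
                    files.put(p.getFileName().toString(), Files.getLastModifiedTime(p).toMillis());
                }
            }
        }
        return files;
    }

    /** Compares two snapshots and lists the create, update, and delete actions between them. */
    static List<String> diff(Map<String, Long> before, Map<String, Long> after) {
        List<String> events = new ArrayList<>();
        for (Map.Entry<String, Long> entry : after.entrySet()) {
            Long previous = before.get(entry.getKey());
            if (previous == null) {
                events.add("created:" + entry.getKey());
            } else if (!previous.equals(entry.getValue())) {
                events.add("updated:" + entry.getKey());
            }
        }
        for (String name : before.keySet()) {
            if (!after.containsKey(name)) {
                events.add("deleted:" + name);
            }
        }
        return events;
    }

    public static void main(String[] args) throws IOException {
        // Demonstrates one polling cycle against a temporary directory.
        Path dir = Files.createTempDirectory("fileTriggerTest");
        Map<String, Long> before = snapshot(dir, "junk.txt");
        Files.createFile(dir.resolve("junk.txt"));
        Map<String, Long> after = snapshot(dir, "junk.txt");
        System.out.println(diff(before, after));
    }
}
```

A continuously running version would repeat the snapshot and diff steps in a loop, sleeping for parameter_interval seconds between iterations.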

0693p000008uqM0AAI.png

The tWaitForFile component transfers all information known about the file and the action that caused the firing to a tMap component, which formats the full path of the triggering file for later use.

0693p000008uqM5AAI.png

If multiple files cause the trigger to fire, a tFlowToIterate component creates a loop. For each iteration of the loop, a tJava component logs data about the file to stdout. In Talend Cloud, this information can be found in the Job log on the Remote Engine for that run of the Job.

Next, higher-level information about the start of processing for the triggered file is posted to the task's user log in Talend Management Console.

When core processing starts, the ID of the task to be executed and the path of the triggering file are picked up by a tFixedFlowInput component and passed to a tMap component, which formats them according to the Talend Cloud Swagger API specification for the executions API method.

{
  "executable": "57f64991e4b0b689a64feed0",
  "parameters": {
      "parameter_filepath": row2.filepath
  }
}

In this method, the called Job has one parameter: the path of the triggering file. The formatted API message is passed to a tRESTClient component and then to the Talend Cloud Swagger API. The authorization token is passed in the HTTP header for the API call.

API return information is logged to one of two tLogRow components (Response and Error). Final information is posted to the Talend Management Console task log.
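For readers who want to see the same call outside Studio, the Java sketch below builds the message body and an HTTP POST request with java.net.http. Several details are assumptions rather than facts from this article: the endpoint URL is a placeholder, the Basic authorization scheme is inferred from the Base64-encoded <uid>:<password> value, and the class and method names are hypothetical. The request is built but not sent.

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class ExecutionRequestSketch {

    /** Escapes backslashes and double quotes only, which covers typical file paths. */
    static String jsonEscape(String value) {
        return value.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    /** Builds the executions message body with the task ID and the triggering file path. */
    static String buildPayload(String taskId, String filePath) {
        return "{\"executable\": \"" + jsonEscape(taskId) + "\", "
                + "\"parameters\": {\"parameter_filepath\": \"" + jsonEscape(filePath) + "\"}}";
    }

    /** Builds (but does not send) a POST request carrying the credentials in the HTTP header. */
    static HttpRequest buildRequest(String endpoint, String authId, String payload) {
        return HttpRequest.newBuilder(URI.create(endpoint))
                .header("Content-Type", "application/json")
                // Assumption: parameter_auth_id is sent using the Basic scheme.
                .header("Authorization", "Basic " + authId)
                .POST(HttpRequest.BodyPublishers.ofString(payload))
                .build();
    }

    public static void main(String[] args) {
        // Hypothetical endpoint; replace it with the executions URL for your region.
        String endpoint = "https://api.example.invalid/executions";
        String payload = buildPayload("57f64991e4b0b689a64feed0", "/tmp/fileTriggerTest/junk.txt");
        HttpRequest request = buildRequest(endpoint, "placeholderAuthId", payload);
        System.out.println(request.method() + " " + request.uri());
        System.out.println(payload);
    }
}
```

To send the request, you could pass it to HttpClient.send and route the status code and body to separate response and error logs, mirroring the two tLogRow components.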

Triggered task

In this batch file method, the triggered task is based on a simple Talend Job called tdye_file_triggered_job.

0693p000008upxMAAQ.png

The Job picks up the file path context parameter passed from the triggering task, formats it (using a tJavaRow component rather than a tMap), and writes a formatted message to the Talend Management Console task log.

0693p000008uqMAAAY.png

Executing the batch file triggered task

The configuration details to execute the batch file triggered task are as follows:

  1. Publish the triggered task to Talend Cloud.

  2. Assign the task to a Remote Engine. You do not need to supply a value to the user-defined parameter.

  3. Click the Go Live button.

  4. Copy the Task ID value, because you will need it for the parameters file you are about to create.

    0693p000008uqMFAAY.jpg

     

  5. Create a CSV file similar to the example below that contains the user-defined parameters and values for the task, and store it somewhere in the filesystem on your Remote Engine. The value set in the Studio Job is /var/tmp/batch_file_trigger_poc.csv.

    parameter_interval;3
    parameter_directory;/tmp/fileTriggerTest
    parameter_mask;junk.txt
    parameter_task_id;xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
    parameter_auth_id;yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy
  6. For the parameter_task_id include the Task ID of the triggered task you copied.

  7. For the parameter_auth_id include your user credentials as a Base64-encoded string in the format <uid>:<password>, where uid and password are your Talend Cloud user ID and password, respectively.

  8. Enter values for the other user-defined parameters and save the CSV file.

  9. Publish the batch_file_trigger_poc Job to Talend Cloud.

  10. Execute the Batch File Trigger POC task. The Job runs and does not stop until terminated.
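The parameters file from step 5 uses one key;value pair per line. The Java sketch below shows one way such a file could be parsed into a map, which can help when validating the file before publishing. It is an illustration only, not how tContextLoad is implemented; the class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ContextFileSketch {

    /** Parses "key;value" lines into an ordered map, skipping blank lines. */
    static Map<String, String> parse(List<String> lines) {
        Map<String, String> context = new LinkedHashMap<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            int separator = trimmed.indexOf(';');
            if (separator < 0) {
                throw new IllegalArgumentException("Missing ';' in line: " + trimmed);
            }
            String key = trimmed.substring(0, separator).trim();
            String value = trimmed.substring(separator + 1).trim();
            context.put(key, value);
        }
        return context;
    }

    public static void main(String[] args) {
        // Sample lines in the same format as the example parameters file.
        List<String> lines = List.of(
                "parameter_interval;3",
                "parameter_directory;/tmp/fileTriggerTest",
                "parameter_mask;junk.txt");
        parse(lines).forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```

To check a real file, you could read it with Files.readAllLines and pass the result to parse.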

Testing the Job triggering

To test the trigger, follow these steps:

  1. While the Batch File Trigger POC task is running, check the logs in Talend Management Console. The logs should be empty.

  2. Log in to the machine running your Remote Engine and navigate to the directory you defined in parameter_directory for this task. In the example parameters file, parameter_directory points to /tmp/fileTriggerTest. Initially, the directory should be empty.

  3. Create a file that matches the parameter_mask you defined. In this example, the mask matches a single file named junk.txt.

    0693p000008uqKjAAI.png

  4. Switch back to Talend Management Console. Open the task for the triggered Job (tdye_file_triggered_job) and see if the Job ran.

    0693p000008uq7LAAQ.jpg

     

  5. Click View Logs and check the user log for the triggered Job. There should be a single entry that shows the file that caused the Job to trigger.

    0693p000008uqMKAAY.png

  6. In Talend Management Console, return to the Batch File Trigger POC task and view the logs. You can find start and stop entries for the triggering of the task in the user logs for this task.

    0693p000008uqKBAAY.png

  7. Go to the Job logs on the Remote Engine file system to see additional log details.

Talend Route trigger

This method uses an ESB Route to detect the creation of a file and trigger a Talend Job through the cTalendJob component. The Route use case is somewhat more complicated than the Job use case and more limited in scope. You must be able to execute a Route in Talend Cloud, and you must have a Remote Engine with Talend Runtime installed, paired, and available.

By design, a Route runs continuously upon deployment to a Remote Engine with Runtime installed in Talend Cloud. The Route monitors a specific directory for the appearance of a file.

0693p000008uqIUAAY.png

The process flow steps are:

  1. When deployed, a cFile component monitors a directory passed into the Route by a context variable.

  2. When a file appears, the cFile component fires a message, and the filename and path are logged. The full path is put into the message body and passed to a cTalendJob component.

    0693p000008uqBIAAY.png

     

  3. The cTalendJob component calls the Job, in this case, tdye_route_triggered_job.

    0693p000008uq6iAAA.png

     

  4. The Job simply deletes the file passed in from the Route and exits. No data is returned to the Route. The new file must be deleted or moved out of the monitored directory. Otherwise, when the Route is undeployed and redeployed, the cFile component can generate messages for every file still in the directory.

    0693p000008uqJgAAI.png

     

  5. In the Job, the Camel message is captured for processing by the tRouteInput component. The only item of interest is the message body, which contains the path of the file that caused the Route to fire a message.

    0693p000008uqMUAAY.png

     

  6. The message body is stored in the Talend Global Map and retrieved by a tFileDelete component. Status from this component is logged to stdout and to the Talend Management Console user log.
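The deletion step performed by the triggered Job can be sketched in plain Java as follows. This is a hedged illustration of the logic described in steps 4 through 6, not the code generated for tRouteInput or tFileDelete; the class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class DeleteTriggerFileSketch {

    /** Treats the message body as a file path, deletes that file, and returns a status line. */
    static String deleteTriggerFile(String messageBody) {
        Path file = Paths.get(messageBody.trim());
        try {
            boolean deleted = Files.deleteIfExists(file);
            return (deleted ? "Deleted " : "Nothing to delete at ") + file;
        } catch (IOException e) {
            return "Failed to delete " + file + ": " + e.getMessage();
        }
    }

    public static void main(String[] args) throws IOException {
        // Creates a temporary file to stand in for the file that fired the Route.
        Path trigger = Files.createTempFile("route-trigger", ".txt");
        System.out.println(deleteTriggerFile(trigger.toString()));
    }
}
```

Returning a status string instead of throwing keeps the outcome easy to write to both stdout and a user log.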

 

Related Content

For more information on triggering Jobs in other cloud services, see the following Knowledge Base (KB) articles:

Automate S3 file processing with Talend Cloud and AWS Lambda
Azure Functions to Trigger Talend Cloud Jobs
Automating File Processing from Google Storage with Talend Cloud
