Talend Job execution using Apache Airflow

TalendSolutionExpert
Contributor II

Last Update: Oct 24, 2024 7:14:05 AM

Updated By: Sonja_Bauernfeind

Created date: Apr 1, 2021 6:05:05 AM


Apache Airflow is a platform to programmatically author, schedule, and monitor workflows. Airflow represents each workflow as a Directed Acyclic Graph (DAG) of tasks. For more information, see the Apache Airflow Documentation page.

This article shows you how to leverage Apache Airflow to orchestrate, schedule, and execute Talend Data Integration (DI) Jobs.

Environment

  • Apache Airflow 1.10.2
  • Nexus 3.9
  • WinSCP 5.15
  • PuTTY

Prerequisites

  1. Apache Airflow installed on a server (see the Installing Apache Airflow on Ubuntu/AWS instructions).
  2. Python 2.7 installed on the Airflow server.
  3. Java 1.8 installed on the Airflow server.
  4. Access to the Nexus server from the Airflow server (in this example, both Nexus and Airflow are installed on the same server).
  5. Talend 7.x Jobs published to the Nexus repository. (For more information on how to set up a CI/CD pipeline to publish Talend Jobs to Nexus, see Configuring Jenkins to build and deploy project items in the Talend Help Center.)
  6. Access to the setup_files.zip file (attached to this article).

Process flow

  1. Develop Talend DI Jobs using Talend Studio.
  2. Publish the DI Jobs to the Nexus repository using the Talend CI/CD module.
  3. Execute the Directed Acyclic Graph (DAG) in Apache Airflow:
    • The first step in the DAG is to download the Job executable from Nexus using a custom script.
    • The second step is to execute the downloaded Job.
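
The two DAG steps above can be sketched as follows. This is a minimal illustration, not the attached talend_job_dag_template.py: the Job name and version, the arguments passed to download_job.sh, and the assumption that the script unpacks the Job into jobs/<job_name>/ with a <job_name>_run.sh launcher are all hypothetical. The import path matches Airflow 1.10.x; the DAG is only built when Airflow is importable.

```python
import os
from datetime import datetime

# Hypothetical values; replace them with your own environment settings.
AIRFLOW_HOME = os.environ.get("AIRFLOW_HOME", "/home/airflow/airflow")
JOB_NAME = "sample_job"   # hypothetical Talend Job name
JOB_VERSION = "0.1"       # hypothetical Talend Job version


def download_command(airflow_home, job_name, job_version):
    """Command for step 1: fetch the Job executable from Nexus.

    The arguments passed to download_job.sh are an assumption; check the
    script shipped in setup_files.zip for the real ones.
    """
    return "{}/scripts/download_job.sh {} {}".format(
        airflow_home, job_name, job_version)


def run_command(airflow_home, job_name):
    """Command for step 2: start the downloaded Job.

    Assumes the Job was unpacked to jobs/<job_name>/<job_name>_run.sh.
    The trailing space stops BashOperator from treating a string that
    ends in ".sh" as a Jinja template file.
    """
    return "sh {0}/jobs/{1}/{1}_run.sh ".format(airflow_home, job_name)


try:
    # Airflow 1.10.x import path.
    from airflow import DAG
    from airflow.operators.bash_operator import BashOperator
except ImportError:
    DAG = None  # Airflow is not installed; only the helpers are usable.

if DAG is not None:
    dag = DAG(
        dag_id="talend_" + JOB_NAME,
        start_date=datetime(2021, 1, 1),
        schedule_interval=None,  # run only when triggered externally
    )
    download_job = BashOperator(
        task_id="download_job",
        bash_command=download_command(AIRFLOW_HOME, JOB_NAME, JOB_VERSION),
        dag=dag,
    )
    execute_job = BashOperator(
        task_id="execute_job",
        bash_command=run_command(AIRFLOW_HOME, JOB_NAME),
        dag=dag,
    )
    # The download must finish before the Job is executed.
    download_job >> execute_job
```

Keeping the command strings in small helper functions makes them easy to check without a running Airflow installation.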


Configuration and execution

  1. Log in to the Airflow server through SSH using WinSCP or PuTTY.
  2. Create two folders named jobs and scripts under the AIRFLOW_HOME folder.


  3. Extract the setup_files.zip, then copy the shell scripts (download_job.sh and delete_job.sh) to the scripts folder.


  4. Copy the talend_job_dag_template.py file from the setup_files.zip to your local machine and update the following:

    • nexus_host
    • nexus_port
    • airflow_home
    • nexus_repo
    • job_group_id
    • job_name
    • job_version

    Also, update the default_args dictionary based on your requirements.


    For more information, see the Apache Airflow documentation: Default Arguments.

  5. The provided DAG template is set up to be triggered externally. To run the DAG on a schedule instead, set the schedule_interval parameter of the DAG to a value that matches your scheduling requirements.


    For more information on values, see the Apache Airflow documentation: DAG Runs.

  6. Rename the updated template file and place it in the dags folder under the AIRFLOW_HOME folder.
  7. After the Airflow scheduler picks up the DAG file, a compiled file with the same name and with a .pyc extension is created.


  8. Refresh the Airflow UI screen to see the DAG.

    Note: If the DAG is not visible under the DAGs tab of the Airflow UI, restart the Airflow webserver and the Airflow scheduler.


  9. To schedule the task, toggle the button to On. You can also run the task manually.
  10. Monitor the run status on the Airflow UI.
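
As an illustration of the settings edited in steps 4 and 5, the sketch below shows what the variables, the default_args dictionary, and a few schedule_interval choices might look like. Every value is a placeholder rather than content from the attached template. The nexus_artifact_url helper is hypothetical as well: it assumes the Job is stored as a zip in a Nexus 3 Maven repository with the standard Maven path layout, which may differ from what download_job.sh actually does.

```python
from datetime import datetime, timedelta

# Placeholder settings; none of these values come from the attached template.
nexus_host = "localhost"
nexus_port = "8081"
airflow_home = "/home/airflow/airflow"
nexus_repo = "releases"
job_group_id = "org.example.talend"
job_name = "sample_job"
job_version = "0.1.0"

# Commonly used default arguments, applied to every task in the DAG.
default_args = {
    "owner": "airflow",
    "depends_on_past": False,
    "start_date": datetime(2021, 4, 1),
    "email_on_failure": False,
    "retries": 1,
    "retry_delay": timedelta(minutes=5),
}

# Values that schedule_interval accepts: None, a preset, a cron
# expression, or a timedelta.
SCHEDULE_EXAMPLES = {
    "external trigger only": None,
    "once a day at midnight": "@daily",
    "every day at 06:30": "30 6 * * *",
    "every two hours": timedelta(hours=2),
}


def nexus_artifact_url(host, port, repo, group_id, artifact, version,
                       extension="zip"):
    """Build a download URL for an artifact in a Nexus 3 Maven repository.

    Hypothetical helper: it follows the standard Maven layout
    <group path>/<artifact>/<version>/<artifact>-<version>.<extension>.
    """
    group_path = group_id.replace(".", "/")
    return "http://{}:{}/repository/{}/{}/{}/{}/{}-{}.{}".format(
        host, port, repo, group_path, artifact, version,
        artifact, version, extension)


if __name__ == "__main__":
    print(nexus_artifact_url(nexus_host, nexus_port, nexus_repo,
                             job_group_id, job_name, job_version))
```

Leaving schedule_interval set to None keeps the externally triggered behavior described in step 5; any of the other values hands the timing over to the Airflow scheduler.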

Conclusion

In this article, you learned how to author, schedule, and monitor workflows from the Airflow UI, and how to download and trigger Talend Jobs for execution.

Comments
jsnasli
Contributor

Hi,

Where can I find the setup_files.zip file?

I could not find the file attached to this article.

Thank you

Sonja_Bauernfeind
Digital Support

Hello @jsnasli 

Please check now! 

All the best,
Sonja 
