This article shows you how to leverage Apache Airflow to orchestrate, schedule, and execute Talend Data Integration (DI) Jobs in an AWS Lambda environment.
Under Permissions, select Create a new role with basic Lambda permissions from the Execution role drop-down menu. Click Create function.
After the function is created, select API Gateway from the Add triggers dialog box on the left to add a trigger to the function.
For more information, see the Amazon API Gateway page.
When the API Gateway is added, a Configuration required warning message appears. Click the API Gateway tile to configure the trigger details. In the Configure triggers section, select Create a new API from the API drop-down menu and Open from the Security drop-down menu. Click Add.
Click the Save button in the upper right corner.
Select the API Gateway tile and review the Details section of the Open API.
Copy the code from the lambda_function_code.py file (located in Setup_files.zip) into lambda_function.py in the Function Code window.
Create a new file named download_job.sh and save it under the lambda_function folder. Copy the code from the download_job_code.sh file (located in Setup_files.zip) into the new file you created.
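The actual handler code ships in Setup_files.zip. For orientation only, here is a minimal, hypothetical sketch of the pattern it follows: a handler that fetches the Talend Job artifacts with download_job.sh and then runs the Job's generated launcher script. All paths and names below are assumptions, not the shipped code.

```python
# Hypothetical sketch only; the real code is in Setup_files.zip.
import json
import subprocess


def lambda_handler(event, context):
    # Download and unpack the Talend Job artifacts into /tmp
    # using the helper script created in the previous step.
    subprocess.run(["sh", "download_job.sh"], check=True, cwd="/var/task")

    # Run the Talend Job's launcher script (path is an assumption).
    result = subprocess.run(
        ["sh", "/tmp/my_job/my_job_run.sh"],
        capture_output=True,
        text=True,
    )

    # Report the Job's exit status and output back through API Gateway.
    return {
        "statusCode": 200 if result.returncode == 0 else 500,
        "body": json.dumps({"stdout": result.stdout, "stderr": result.stderr}),
    }
```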
In the same window, scroll down to Basic settings and increase the Memory (MB) value to a reasonable amount, in this case 2048 MB. Click Save.
Edit the lambda_DAG_call_template.py file and assign values to the variables, as shown in the sketch below.
Make sure to provide the http_conn_id and endpoint values in the SimpleHttpOperator calls. In this case, the http_conn_id is aws_api.
The DAG template provided is configured to be triggered externally. If you plan to schedule the task instead, update the schedule_interval parameter with a value that matches your scheduling requirements. For more information on accepted values, see the Apache Airflow documentation: DAG Runs.
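A minimal sketch of what the completed template might look like follows. It assumes an Airflow HTTP connection named aws_api whose host points at the API Gateway invoke URL; the DAG name, dates, endpoint path, and payload are all hypothetical, so substitute the values for your own environment.

```python
# Hypothetical sketch of a completed lambda_DAG_call_template.py;
# all names, dates, and the endpoint path are assumptions.
import json
from datetime import datetime

from airflow import DAG
from airflow.providers.http.operators.http import SimpleHttpOperator

with DAG(
    dag_id="talend_lambda_job",
    start_date=datetime(2021, 1, 1),
    schedule_interval=None,  # None = triggered externally; set a cron string to schedule
    catchup=False,
) as dag:

    run_talend_job = SimpleHttpOperator(
        task_id="run_talend_job",
        http_conn_id="aws_api",                 # Airflow connection for the API Gateway host
        endpoint="default/my_lambda_function",  # stage/resource path from the trigger Details
        method="POST",
        data=json.dumps({"job": "my_talend_job"}),  # hypothetical payload
        headers={"Content-Type": "application/json"},
    )
```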
Rename the updated file and place it in the dags folder under the AIRFLOW_HOME folder.
Airflow picks up the file and lists the new DAG in the Airflow console, under the DAGs tab.
Note: If the DAG is not visible in the user interface under the DAGs tab, restart the Airflow webserver and Airflow scheduler.
In this article, you learned how to execute Talend DI Jobs in AWS Lambda and how to use Apache Airflow to schedule those Jobs, an approach that can be extended to more complex orchestration and scheduling plans.