
Talend and Amazon Translate Integration
Overview
This article shows how Talend integrates with Amazon Translate, an AWS service that translates data from one language to another. Typical use cases include multilingual support, content conversion for news feeds, and localized marketing campaigns.
The article is a continuation of the Talend AWS Machine Learning integration series. You can read the previous articles, Introduction to Talend and Amazon Real-Time Machine Learning, and Talend and Amazon Comprehend Integration in the Talend Community Knowledge Base (KB).
Environment for Talend and AWS
This article was written using Talend 7.1. However, you can configure earlier versions of Talend with the logic provided to integrate with Amazon Translate.
Currently, Amazon Translate is available only in selected AWS regions. Talend recommends verifying the availability of the service in the AWS Global Infrastructure Region Table before creating the overall application architecture.
Talend recommends reviewing the Amazon Translate service list of Supported Language Pairs.
Practical use case
This section discusses a practical use case where Talend helps convert the language of incoming data automatically by integrating with the Amazon Translate service. The use case below is a variation of the Talend and Amazon Comprehend Integration practical use case.
Automatic multilingual support application
Multilingual customer support has become a necessity for corporations in today's era of globalization. Talend helps customers set up a multilingual support application using its easy integration capabilities with Amazon Comprehend and Amazon Translate services.
The diagram above shows the stages of the overall flow. Talend simplifies the complex scenarios required for the use case with its signature graphical application design interface and data orchestration capabilities. The stages in the flow are:
-
End users communicate their queries and concerns through the main web site or complaint system in the language of their choice. In the example, queries arrive in English and French through various web servers.
-
The source data from web servers is transmitted to various producer queues where Kafka handles the queue systems.
-
Talend uses its built-in native Kafka connectors to read the Producer queues and transmit the data to downstream systems.
-
Talend calls the Amazon Comprehend dominant language detection service, passing the input text.
-
Talend receives the response from the Amazon Comprehend language detection service in JSON format.
-
Talend parses the JSON and identifies the dominant language. If it is different from the language chosen by the support person, Talend calls the Amazon Translate service and converts the source language to the target language.
-
Talend receives the response from the Amazon Translate service in JSON format. In this example, French is converted to English.
-
Talend parses the JSON and fetches the data in the target language. The data is transmitted to the Consumer Kafka queue using the native Kafka connector components.
-
Support staff receives the request in the language of their choice and provides feedback.
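The routing decision in the detection and translation stages can be sketched in plain Java. This is a minimal sketch under stated assumptions, not Talend-generated code: the class `SupportRouter`, its methods, and the `Translator` interface are hypothetical names, and the translator is a stub standing in for the real call to the Amazon Translate service.

```java
import java.util.Locale;

// Hypothetical sketch of the "translate only when needed" decision.
class SupportRouter {

    // Stand-in for the real translation call (for example, a Talend routine).
    interface Translator {
        String translate(String text, String sourceLang, String targetLang);
    }

    // True when the detected language differs from the support person's language.
    static boolean needsTranslation(String detectedLang, String supportLang) {
        String detected = detectedLang.trim().toLowerCase(Locale.ROOT);
        String support = supportLang.trim().toLowerCase(Locale.ROOT);
        return !detected.equals(support);
    }

    // Returns the text unchanged when no translation is needed,
    // otherwise delegates to the translator.
    static String toSupportLanguage(String text, String detectedLang,
                                    String supportLang, Translator translator) {
        if (!needsTranslation(detectedLang, supportLang)) {
            return text;
        }
        return translator.translate(text, detectedLang, supportLang);
    }

    public static void main(String[] args) {
        // Stub translator: tags the text instead of calling a service.
        Translator stub = (text, src, tgt) -> "[" + src + "->" + tgt + "] " + text;
        System.out.println(toSupportLanguage("Bonjour", "fr", "en", stub));
        System.out.println(toSupportLanguage("Hello", "en", "en", stub));
    }
}
```

Keeping the comparison separate from the service call means rows that are already in the support language never reach the translation service.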
Configure a Talend Routine for Amazon Translate
Create a Talend user routine by performing the following steps.
-
Connect to Talend Studio, and create a new routine called AWS_Translate. The routine connects to the Amazon Translate service, transmits the incoming input text, and returns the response from the service.
-
Insert the following code into the Talend Routine:
package routines;

// Amazon SDK 1.11.438
import com.amazonaws.auth.BasicAWSCredentials;
import com.amazonaws.auth.AWSStaticCredentialsProvider;
import com.amazonaws.services.translate.AmazonTranslate;
import com.amazonaws.services.translate.AmazonTranslateClient;
import com.amazonaws.services.translate.AmazonTranslateClientBuilder;
import com.amazonaws.services.translate.model.TranslateTextRequest;
import com.amazonaws.services.translate.model.TranslateTextResult;

// Not referenced directly in the code below; these classes come from the
// additional JAR files listed for the routine.
import org.apache.commons.logging.LogFactory;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.annotation.JsonView;
import org.apache.http.protocol.HttpRequestExecutor;
import org.apache.http.client.HttpClient;
import org.apache.http.conn.DnsResolver;
import org.joda.time.format.DateTimeFormat;

public class AWS_Translate {

    public static String Translate(String AWS_Access_Key, String AWS_Secret_Key,
            String AWS_regionName, String input_text,
            String source_lang_code, String target_lang_code) {

        // AWS connection: static credentials and a client for the given region
        BasicAWSCredentials awsCreds = new BasicAWSCredentials(AWS_Access_Key, AWS_Secret_Key);
        AmazonTranslate translate = AmazonTranslateClientBuilder.standard()
                .withCredentials(new AWSStaticCredentialsProvider(awsCreds))
                .withRegion(AWS_regionName)
                .build();

        // Build the translation request
        TranslateTextRequest request = new TranslateTextRequest()
                .withText(input_text)
                .withSourceLanguageCode(source_lang_code)
                .withTargetLanguageCode(target_lang_code);

        // Call the service and return the result as a string
        TranslateTextResult result = translate.translateText(request);
        String response_text = result.toString();
        return response_text;
    }
}
-
The Talend routine needs additional JAR files. Install the following JAR files in the routine:
- AWS SDK 1.11.438
- apache.commons.logging 1.2.0
- Jackson core 2.9.7
- Jackson Annotations 2.9.0
- Jackson Databind 2.9.7
- httpcore 4.4.10
- httpclient 4.5.6
- joda-time 2.9.4
-
Add additional Java libraries to the routine. For more information on how to add Java libraries, see the Talend and Amazon Comprehend Integration article of the series.
The setup activities are complete. The next section shows sample Jobs for the functionalities described in the practical use cases.
For ease of understanding, and to keep the focus on the integration between Talend and Amazon Translate, the sample Job uses a CSV file for input and a tLogRow component for output.
Talend sample Job for Amazon Translate
The input.csv file, attached to this article, provides the data for the sample Job. The data from the input file is transmitted to the Amazon Translate service, and the response is captured. The response from Amazon Translate service (in JSON format) is parsed, and the output text in the target language is added to each row and published in the console.
The configuration details are as follows:
-
Create a new Standard Job called AWS_Translate_sample_job, or use the sample Job, AWS_Translate_sample_job.zip, attached to this article.
-
To associate the routine with the Talend Job, first add it to the newly created Job by selecting Setup routine dependencies.
-
Add the AWS_Translate routine to the User routines section of the pop-up screen, to link the newly created routine to the Talend Job.
-
Review the overall Job flow, shown in the following diagram.
-
Configure the context variables, as shown below:
Note: If you are using Amazon Comprehend for source language detection, you must populate the variable source_lang_code from the output of the Amazon Comprehend call through Talend. For more information, see the Talend and Amazon Comprehend Integration article of the series.
-
The input file for the Job, input.csv, attached to this article, contains the data to translate from English to French using the Amazon Translate service.
-
Configure the tFileInputDelimited component, as shown below:
-
Use the tMap component to call the Amazon Translate service through the Talend routine. Pass the parameters in the same order as in the routine's function signature, as in the following call:
AWS_Translate.Translate(context.AWS_Access_Key,context.AWS_Secret_Key, context.AWS_regionName,row1.input_data,context.source_lang_code, context.target_lang_code)
-
Configure the tMap component layout, as shown below:
-
The output from the Amazon Translate call is a string in JSON format. The translated text is parsed into the variables, as shown below. Leave the input_text column empty because you are going to map it directly from the input flow.
-
Notice that the output passes to a tLogRow component, which displays the input data together with the translated text in the console.
In practical scenarios, the output at this stage can be passed to downstream systems for further processing and storage.
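For readers who want to see the row-level logic of the sample Job outside Talend Studio, the following is a minimal sketch, not the code Talend generates. The class `TranslateRows`, the `Row` type, and the `process` method are hypothetical names, and the translation step is a stub function standing in for the call to the AWS_Translate routine.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical sketch: read input rows, translate each, keep both texts.
class TranslateRows {

    // One output row: the original text plus the translated text.
    static final class Row {
        final String inputText;
        final String translatedText;

        Row(String inputText, String translatedText) {
            this.inputText = inputText;
            this.translatedText = translatedText;
        }

        @Override
        public String toString() {
            return inputText + "|" + translatedText;
        }
    }

    // Applies the translation function to every non-blank input line.
    static List<Row> process(List<String> inputLines, UnaryOperator<String> translate) {
        List<Row> out = new ArrayList<>();
        for (String line : inputLines) {
            if (line == null || line.trim().isEmpty()) {
                continue; // skip blank rows instead of sending them to the service
            }
            out.add(new Row(line, translate.apply(line)));
        }
        return out;
    }

    public static void main(String[] args) {
        // Stub translation: wraps the text instead of calling a service.
        UnaryOperator<String> stub = s -> "FR(" + s + ")";
        for (Row row : process(Arrays.asList("Hello", "Thank you"), stub)) {
            System.out.println(row);
        }
    }
}
```

Passing the translation step in as a function keeps the row handling testable without AWS credentials.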
Threshold limits for data processing
At the time of this writing, Amazon Translate can handle input text of up to 5 KB per request. The service also limits the length of the text to a maximum of 10,000 characters.
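One way to respect these thresholds is to check the size of each text before the call and to split longer texts into smaller pieces. The following is a minimal sketch under stated assumptions: the class `TranslateLimits` and its methods are hypothetical, the constants simply restate the figures quoted above (5 KB is taken as 5 * 1024 bytes of UTF-8), and the current limits should be verified in the AWS documentation.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for checking and splitting text before translation.
class TranslateLimits {

    // Assumed limits, restating the figures quoted in this article.
    static final int MAX_BYTES = 5 * 1024;
    static final int MAX_CHARS = 10_000;

    // Size of the text in bytes when encoded as UTF-8.
    static int utf8Length(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    // Note: String.length() counts UTF-16 code units, an approximation of characters.
    static boolean withinLimits(String text) {
        return text.length() <= MAX_CHARS && utf8Length(text) <= MAX_BYTES;
    }

    // Splits text on whitespace into chunks of at most maxBytes (UTF-8).
    // A single word longer than maxBytes is emitted as its own oversized chunk.
    static List<String> chunk(String text, int maxBytes) {
        List<String> chunks = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String word : text.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue;
            }
            String candidate = current.length() == 0 ? word : current + " " + word;
            if (utf8Length(candidate) > maxBytes && current.length() > 0) {
                chunks.add(current.toString());
                current = new StringBuilder(word);
            } else {
                current = new StringBuilder(candidate);
            }
        }
        if (current.length() > 0) {
            chunks.add(current.toString());
        }
        return chunks;
    }
}
```

Because every character takes at least one byte in UTF-8, the byte limit is reached first under these assumed figures. Splitting on whitespace can separate words that belong to the same sentence, which may affect translation quality, so splitting on sentence boundaries is worth considering for real data.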
Conclusion
This article presents a use case of integrating Talend with the Amazon Translate service. In real-world scenarios, data can flow from multiple source systems, such as batch files, web services, queues, or APIs. Talend can integrate all these diverse source systems with the Amazon Translate service in a straightforward way.
Citations
AWS documentation
- Amazon Translate Service
- Introducing Amazon Translate – Real-time Language Translation
- Translate a chat channel using Amazon Translate