<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>article Talend and Amazon Comprehend Integration in Official Support Articles</title>
    <link>https://community.qlik.com/t5/Official-Support-Articles/Talend-and-Amazon-Comprehend-Integration/ta-p/2151615</link>
    <description>&lt;P&gt;This article shows how seamlessly Talend integrates with &lt;A href="https://aws.amazon.com/comprehend/" target="_blank" rel="noopener"&gt;Amazon Comprehend&lt;/A&gt;, a natural language processing (NLP) service from AWS. It is an in-depth guide on how to use Talend to harness the dominant language detection and sentiment analysis capabilities of Amazon Comprehend.&lt;/P&gt;
&lt;P&gt;The article is a continuation of the Talend AWS Machine Learning integration series. You can read the previous article, &lt;A href="https://community.qlik.com/t5/Design-and-Development/Introduction-to-Talend-and-Amazon-Real-Time-Machine-Learning/ta-p/2151200" target="_blank" rel="noopener"&gt;Introduction to Talend and Amazon Real-Time Machine Learning&lt;/A&gt;, in the Talend Community Knowledge Base (KB).&lt;/P&gt;
&lt;H3&gt;&lt;FONT color="#339966"&gt;&lt;STRONG&gt;Environment for Talend and AWS&lt;/STRONG&gt;&lt;/FONT&gt;&lt;/H3&gt;
&lt;P&gt;This article was written using Talend 7.1. However, you can configure earlier versions of Talend with the logic provided to integrate Amazon Comprehend.&lt;/P&gt;
&lt;P&gt;Currently, Amazon Comprehend is only available in selected AWS regions. Talend recommends verifying the availability of the service from the AWS Global Infrastructure, &lt;A href="https://aws.amazon.com/about-aws/global-infrastructure/regional-product-services/" target="_blank" rel="noopener"&gt;Region Table&lt;/A&gt;, before creating the overall application architecture.&lt;/P&gt;
&lt;H3&gt;&lt;FONT color="#339966"&gt;&lt;STRONG&gt;Practical use cases&lt;/STRONG&gt;&lt;/FONT&gt;&lt;/H3&gt;
&lt;P&gt;This section discusses practical use cases where Talend, integrated with the Amazon Comprehend service, detects the dominant language of incoming data and performs sentiment analysis on input data.&lt;/P&gt;
&lt;H4&gt;&lt;STRONG&gt;&lt;FONT color="#339966"&gt;Automatic multilingual support application&lt;/FONT&gt;&lt;/STRONG&gt;&lt;/H4&gt;
&lt;P&gt;For multinational corporations, or companies whose end users prefer to communicate in their native language, a multilingual support application is ideal. Talend and Amazon Comprehend help to categorize support cases based on the dominant language in each customer request.&lt;/P&gt;
&lt;P class="lia-indent-padding-left-30px"&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="0EM3p000001ikfW.png"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/122237i07C70E2ECD86C6A7/image-size/large?v=v2&amp;amp;px=999" role="button" title="0EM3p000001ikfW.png" alt="0EM3p000001ikfW.png" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;P&gt;The diagram above shows the stages in the overall flow. Talend simplifies the application with its graphical design interface. The stages are:&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;
&lt;P&gt;End users submit their queries and concerns through a website in the language of their choice. In the example, queries arrive in English, French, German, and Italian through various web servers. In the absence of language identification, web servers are usually mapped based on their IP addresses. So, if an English-speaking person raises a ticket from Paris, France, it typically goes to a French support system.&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;In the current layout, the data from the web servers is transmitted to various Producer queues where Kafka handles the queue systems.&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;Talend has built-in components to read and fetch the data from the Kafka queues.&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;Talend calls the Amazon Comprehend dominant language detection service, passing the input text.&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;Talend receives the response from the Amazon Comprehend language detection service in JSON format.&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;Talend parses the JSON and identifies the dominant language. If the Amazon Comprehend service returns multiple languages in the result set, Talend extracts the values for each language, sorts them by score in descending order, and selects the highest-scoring language. Talend then transmits the data to the corresponding consumer Kafka queue based on the dominant language.&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;The request is routed automatically to support staff for the corresponding language, who provide feedback and resolution in the customer's language of choice.&lt;/P&gt;
&lt;/LI&gt;
&lt;/OL&gt;
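&lt;P&gt;The selection and routing logic in step 6 can be sketched in plain Java as follows. This is a minimal illustration, not the code used in the sample Jobs: the &lt;STRONG&gt;LanguageRouter&lt;/STRONG&gt; class, the queue naming scheme, and the fallback queue name are all hypothetical, and the language codes and scores are assumed to have been parsed from the response already.&lt;/P&gt;

```java
// Hypothetical helper: picks the highest-scoring language and maps it to a queue name.
final class LanguageRouter {

    // Returns the language code with the highest score, or null if no languages were returned.
    static String dominantLanguage(String[] codes, double[] scores) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (int i = 0; i != codes.length; i++) {
            if (scores[i] > bestScore) {
                bestScore = scores[i];
                best = codes[i];
            }
        }
        return best;
    }

    // Assumed naming scheme for the consumer queues, for example "support-fr".
    static String topicFor(String languageCode) {
        return languageCode == null ? "support-unrouted" : "support-" + languageCode;
    }

    public static void main(String[] args) {
        // Mixed-language input: English scores higher than Spanish in this made-up example.
        String[] codes = {"en", "es"};
        double[] scores = {0.62, 0.37};
        System.out.println(topicFor(dominantLanguage(codes, scores))); // prints support-en
    }
}
```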
&lt;H4&gt;&lt;STRONG&gt;&lt;FONT color="#339966"&gt;Real-time sentiment analysis dashboard&lt;/FONT&gt;&lt;/STRONG&gt;&lt;/H4&gt;
&lt;P&gt;Customer sentiment analysis is crucial for companies in today’s highly competitive corporate world. Talend, Amazon Comprehend, and Snowflake help to perform real-time sentiment analysis on customer data feeds generated from multiple source systems.&lt;/P&gt;
&lt;P class="lia-indent-padding-left-30px"&gt;&lt;SPAN class="lia-inline-image-display-wrapper lia-image-align-inline"&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="0683p000009M12l.png"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/123106iD5A551DE1733C7FC/image-size/large?v=v2&amp;amp;px=999" role="button" title="0683p000009M12l.png" alt="0683p000009M12l.png" /&gt;&lt;/span&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;The diagram above depicts the various stages involved in a customer sentiment analysis dashboard. The various steps involved in the flow are:&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;
&lt;P&gt;Customer comments are captured by the company web servers or by feeds from third-party websites.&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;The incoming data is transmitted to various Producer queues maintained by Kafka.&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;Talend reads the input data from Kafka using native Kafka components.&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;Talend processes the inbound data from the Kafka queues and transmits it to Amazon Comprehend as a request for sentiment analysis.&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;Talend receives the response from Amazon Comprehend sentiment analysis.&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;Talend parses the response data from Amazon Comprehend in JSON format and evaluates the overall sentiment and individual scores for positive, negative, neutral, and mixed. The parsed data, along with input text, is transmitted from Talend to Snowflake Cloud Data warehouse using native components.&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;Once the data is loaded to Snowflake, real-time dashboards showing overall customer sentiments are generated from Snowflake.&lt;/P&gt;
&lt;/LI&gt;
&lt;/OL&gt;
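&lt;P&gt;As a rough illustration of the kind of aggregate a dashboard in step 7 might display, the following Java sketch tallies overall sentiment labels. It assumes the labels have already been parsed from the response and take the values POSITIVE, NEGATIVE, NEUTRAL, or MIXED. The &lt;STRONG&gt;SentimentTally&lt;/STRONG&gt; class is hypothetical; in the flow described above, this aggregation would more likely be done in Snowflake.&lt;/P&gt;

```java
import java.util.Locale;

// Hypothetical tally of overall sentiment labels, similar to what a dashboard query might compute.
final class SentimentTally {

    enum Sentiment { POSITIVE, NEGATIVE, NEUTRAL, MIXED }

    private final int[] counts = new int[Sentiment.values().length];

    // Records one parsed overall sentiment label, for example "POSITIVE".
    void add(String label) {
        Sentiment sentiment = Sentiment.valueOf(label.trim().toUpperCase(Locale.ROOT));
        counts[sentiment.ordinal()]++;
    }

    // Number of records seen with the given sentiment.
    int count(Sentiment sentiment) {
        return counts[sentiment.ordinal()];
    }

    // Share of records carrying the given sentiment, between 0.0 and 1.0.
    double share(Sentiment sentiment) {
        int total = 0;
        for (int c : counts) {
            total += c;
        }
        return total == 0 ? 0.0 : (double) count(sentiment) / total;
    }
}
```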
&lt;P&gt;&lt;STRONG&gt;Note&lt;/STRONG&gt;: The above scenarios are simple illustrations of data flow based solely on language detection and sentiment detection of the input data. Talend recommends applying additional data privacy rules, such as those required by GDPR, on top of the current layout, using Talend's easy-to-use graphical interface.&lt;/P&gt;
&lt;H3&gt;&lt;STRONG&gt;&lt;FONT color="#339966"&gt;Configure a Talend routine for Amazon Comprehend&lt;/FONT&gt;&lt;/STRONG&gt;&lt;/H3&gt;
&lt;P&gt;Create a Talend user routine by performing the following steps. Both dominant language detection and sentiment analysis are implemented in the same Talend routine as separate Java functions.&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;
&lt;P&gt;Connect to Talend Studio, and create a new routine called &lt;STRONG&gt;AWS_Comprehend&lt;/STRONG&gt;. The routine connects to the Amazon Comprehend service, transmits the input text, and collects the response.&lt;/P&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="0EM3p000001ikci.png"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/125229iF103033E4617C635/image-size/large?v=v2&amp;amp;px=999" role="button" title="0EM3p000001ikci.png" alt="0EM3p000001ikci.png" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;Insert the following code into the Talend routine:&lt;/P&gt;
&lt;PRE&gt;package routines;

//Amazon SDK 1.11.438

import com.amazonaws.auth.BasicAWSCredentials;
import com.amazonaws.auth.AWSStaticCredentialsProvider;
import com.amazonaws.services.comprehend.AmazonComprehend;
import com.amazonaws.services.comprehend.AmazonComprehendClientBuilder;
import com.amazonaws.services.comprehend.model.DetectSentimentRequest;
import com.amazonaws.services.comprehend.model.DetectSentimentResult;
import com.amazonaws.services.comprehend.model.DetectDominantLanguageRequest;
import com.amazonaws.services.comprehend.model.DetectDominantLanguageResult;

import org.apache.commons.logging.LogFactory;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.annotation.JsonView;
import org.apache.http.protocol.HttpRequestExecutor;
import org.apache.http.client.HttpClient;
import org.apache.http.conn.DnsResolver;
import org.joda.time.format.DateTimeFormat;

public class AWS_Comprehend {

    /**
     * Calls the Amazon Comprehend DetectDominantLanguage API for input_text and
     * returns the detected languages, with their scores, as a string.
     */
    public static String Dominant_Language(String AWS_Access_Key, String AWS_Secret_Key, String AWS_regionName, String input_text)
    {
        // Build static credentials from the keys passed in by the Job
        BasicAWSCredentials awsCreds = new BasicAWSCredentials(AWS_Access_Key, AWS_Secret_Key);

        // Create a Comprehend client for the requested region
        AmazonComprehend comprehendClient = AmazonComprehendClientBuilder.standard()
                .withCredentials(new AWSStaticCredentialsProvider(awsCreds))
                .withRegion(AWS_regionName)
                .build();

        // Call detectDominantLanguage API
        DetectDominantLanguageRequest detectDominantLanguageRequest = new DetectDominantLanguageRequest().withText(input_text);
        DetectDominantLanguageResult detectDominantLanguageResult = comprehendClient.detectDominantLanguage(detectDominantLanguageRequest);

        // Return the string form of the detected languages
        String response_JSON = detectDominantLanguageResult.getLanguages().toString();
        return response_JSON;
    }

    /**
     * Calls the Amazon Comprehend DetectSentiment API for input_text in the given
     * language and returns the result as a string.
     */
    public static String Sentiment_Detection(String AWS_Access_Key, String AWS_Secret_Key, String AWS_regionName, String input_text, String language_code)
    {
        // Build static credentials from the keys passed in by the Job
        BasicAWSCredentials awsCreds = new BasicAWSCredentials(AWS_Access_Key, AWS_Secret_Key);

        // Create a Comprehend client for the requested region
        AmazonComprehend comprehendClient = AmazonComprehendClientBuilder.standard()
                .withCredentials(new AWSStaticCredentialsProvider(awsCreds))
                .withRegion(AWS_regionName)
                .build();

        // Call Sentiment Detection API
        DetectSentimentRequest detectSentimentRequest = new DetectSentimentRequest().withText(input_text).withLanguageCode(language_code);
        String response_JSON = comprehendClient.detectSentiment(detectSentimentRequest).toString();
        return response_JSON;
    }

}
&lt;/PRE&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;The Talend routine needs additional JAR files. Install the following JAR files in the routine:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;
&lt;P&gt;&lt;STRONG&gt;AWS SDK 1.11.438&lt;/STRONG&gt;&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;&lt;STRONG&gt;apache.commons.logging&lt;/STRONG&gt;&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;&lt;STRONG&gt;Jackson core 2.9.7&lt;/STRONG&gt;&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;&lt;STRONG&gt;Jackson Annotations 2.9.4&lt;/STRONG&gt;&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;&lt;STRONG&gt;Jackson Databind 2.9.7&lt;/STRONG&gt;&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;&lt;STRONG&gt;httpcore 4.4.10&lt;/STRONG&gt;&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;&lt;STRONG&gt;httpclient 4.5.6&lt;/STRONG&gt;&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;&lt;STRONG&gt;joda-time 2.9.4&lt;/STRONG&gt;&lt;/P&gt;
&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;Add additional Java libraries to the routine by selecting &lt;STRONG&gt;Edit Routine Libraries&lt;/STRONG&gt;.&lt;/P&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="0EM3p000001ikgF.png"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/123193i8EE2682B57FE3662/image-size/large?v=v2&amp;amp;px=999" role="button" title="0EM3p000001ikgF.png" alt="0EM3p000001ikgF.png" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;Select &lt;STRONG&gt;New&lt;/STRONG&gt; in the pop-up window to add libraries to the routine.&lt;/P&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="0EM3p000001ikgZ.png"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/124275i1D73D6FB709F50F8/image-size/large?v=v2&amp;amp;px=999" role="button" title="0EM3p000001ikgZ.png" alt="0EM3p000001ikgZ.png" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;Select &lt;STRONG&gt;Artifact repository(local m2/nexus)&lt;/STRONG&gt;, then select &lt;STRONG&gt;Install a new module&lt;/STRONG&gt;.&lt;/P&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="0EM3p000001ikgj.png"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/125119i1F804CDE4F15712D/image-size/large?v=v2&amp;amp;px=999" role="button" title="0EM3p000001ikgj.png" alt="0EM3p000001ikgj.png" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;Select the JAR file from the local drive.&lt;/P&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="0EM3p000001ikvP.png"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/122269i6887BE1E52F96B6F/image-size/large?v=v2&amp;amp;px=999" role="button" title="0EM3p000001ikvP.png" alt="0EM3p000001ikvP.png" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;Select &lt;STRONG&gt;Detect the module install status&lt;/STRONG&gt; to verify whether the module is already installed.&lt;/P&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="0EM3p000001ikif.png"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/122272iE6F4CB7F16766E31/image-size/large?v=v2&amp;amp;px=999" role="button" title="0EM3p000001ikif.png" alt="0EM3p000001ikif.png" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;If the JAR file is not installed, the status changes from the error flag to &lt;STRONG&gt;Install a module&lt;/STRONG&gt; followed by the JAR file name. Click &lt;STRONG&gt;OK&lt;/STRONG&gt; to load the JAR file to the routine. Once all the JAR files are installed, click &lt;STRONG&gt;Finish&lt;/STRONG&gt;.&lt;/P&gt;
&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="0EM3p000001ikik.png"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/122928i8C25C9212289150A/image-size/large?v=v2&amp;amp;px=999" role="button" title="0EM3p000001ikik.png" alt="0EM3p000001ikik.png" /&gt;&lt;/span&gt;&lt;/LI&gt;
&lt;/OL&gt;
&lt;P&gt;The setup activities are complete. The next section shows sample Jobs for the functionalities described in the practical use cases.&lt;/P&gt;
&lt;P&gt;For ease of understanding, and to keep the focus on the integration between Talend and Amazon Comprehend, the sample Jobs use text files for input and a &lt;STRONG&gt;tLogRow&lt;/STRONG&gt; component for output.&lt;/P&gt;
&lt;H3&gt;&lt;STRONG&gt;&lt;FONT color="#339966"&gt;Talend sample Job for dominant language detection&lt;/FONT&gt;&lt;/STRONG&gt;&lt;/H3&gt;
&lt;P&gt;The sample Job, &lt;STRONG&gt;Language_Identifier.zip&lt;/STRONG&gt;, attached to this article, reads the data from the input file and transmits each message to the Amazon Comprehend service. The response from the Amazon Comprehend service, in JSON format, is parsed and sorted, and the row with the highest dominant language score for each inbound text record is printed in the console.&lt;/P&gt;
&lt;P&gt;The configuration details are as follows:&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;
&lt;P&gt;Create a new Standard Job called &lt;STRONG&gt;Language_Identifier&lt;/STRONG&gt;.&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;To associate the routine with the Talend Job, add it to the newly created Job by selecting &lt;STRONG&gt;Setup routine dependencies&lt;/STRONG&gt;.&lt;/P&gt;
&lt;P&gt;&lt;SPAN class="lia-inline-image-display-wrapper lia-image-align-inline"&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="0683p000009M1Em.png"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/123754i322E4371E9EE4B34/image-size/large?v=v2&amp;amp;px=999" role="button" title="0683p000009M1Em.png" alt="0683p000009M1Em.png" /&gt;&lt;/span&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;Add the &lt;STRONG&gt;AWS_Comprehend&lt;/STRONG&gt; routine to the &lt;STRONG&gt;User routines&lt;/STRONG&gt; section of the pop-up screen, to link the newly created routine to the Talend Job.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Note&lt;/STRONG&gt;: You must perform this step for both of the Jobs mentioned in this article.&lt;/P&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="0EM3p000001ikfI.png"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/123643i67F2E086C5304F9C/image-size/large?v=v2&amp;amp;px=999" role="button" title="0EM3p000001ikfI.png" alt="0EM3p000001ikfI.png" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;Review the overall Job flow, shown in the following diagram.&lt;/P&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="0EM3p000001ikjT.png"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/123354i6108E960FF6D2A0D/image-size/large?v=v2&amp;amp;px=999" role="button" title="0EM3p000001ikjT.png" alt="0EM3p000001ikjT.png" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;Configure the context variables, as shown below:&lt;/P&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="0EM3p000001ikjs.png"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/124740i9827E65DD3B75042/image-size/large?v=v2&amp;amp;px=999" role="button" title="0EM3p000001ikjs.png" alt="0EM3p000001ikjs.png" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;The input file for the Job, &lt;STRONG&gt;detect_language_input.txt&lt;/STRONG&gt;, attached to this article, contains the phrase &lt;STRONG&gt;I am very happy today&lt;/STRONG&gt;, translated into multiple languages using Google Translate. The last line of the file intentionally mixes English and Spanish words to show how the scoring pattern differs when the input data contains multiple languages.&lt;/P&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="0EM3p000001ikkH.png"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/124114i15069CB937854B90/image-size/large?v=v2&amp;amp;px=999" role="button" title="0EM3p000001ikkH.png" alt="0EM3p000001ikkH.png" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;Configure the &lt;STRONG&gt;tFileInputDelimited&lt;/STRONG&gt; component, as shown below:&lt;/P&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="0EM3p000001ikiR.png"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/125222iFB3A9E34E2DC9950/image-size/large?v=v2&amp;amp;px=999" role="button" title="0EM3p000001ikiR.png" alt="0EM3p000001ikiR.png" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;Use the &lt;STRONG&gt;tMap&lt;/STRONG&gt; component to call the Amazon Comprehend service through the Talend routine. Pass the parameters in the &lt;STRONG&gt;tMap&lt;/STRONG&gt; component in the same order as in the following function call.&lt;/P&gt;
&lt;PRE&gt;AWS_Comprehend.Dominant_Language(context.AWS_Access_Key, context.AWS_Secret_Key, context.AWS_regionName, row1.input_text)&lt;/PRE&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;Configure the &lt;STRONG&gt;tMap&lt;/STRONG&gt; component layout, as shown below:&lt;/P&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="0EM3p000001ikkv.png"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/121908iC693F76015CB8AF5/image-size/large?v=v2&amp;amp;px=999" role="button" title="0EM3p000001ikkv.png" alt="0EM3p000001ikkv.png" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;The output from the Amazon Comprehend call is a string in JSON format. If multiple languages are present in the input text, the output JSON has a score for each associated language. The language code and corresponding scores are parsed into the variables, as shown below. Leave the &lt;STRONG&gt;id&lt;/STRONG&gt; and &lt;STRONG&gt;input_text&lt;/STRONG&gt; columns empty because you are going to map them directly from the input flow.&lt;/P&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="0EM3p000001iklP.png"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/121430iA382B86ABEB6924C/image-size/large?v=v2&amp;amp;px=999" role="button" title="0EM3p000001iklP.png" alt="0EM3p000001iklP.png" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;Notice that the score is converted to &lt;STRONG&gt;Double&lt;/STRONG&gt; in this stage.&lt;/P&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="0EM3p000001iklt.png"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/123278i01DA5F7CC0A699AB/image-size/large?v=v2&amp;amp;px=999" role="button" title="0EM3p000001iklt.png" alt="0EM3p000001iklt.png" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;Sort the output data according to the &lt;STRONG&gt;id&lt;/STRONG&gt; (in ascending order) and the &lt;STRONG&gt;score&lt;/STRONG&gt; (in descending order) columns.&lt;/P&gt;
&lt;P&gt;&lt;SPAN class="lia-inline-image-display-wrapper lia-image-align-inline"&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="0683p000009M1KP.png"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/123163iAE41C5337DF058A9/image-size/large?v=v2&amp;amp;px=999" role="button" title="0683p000009M1KP.png" alt="0683p000009M1KP.png" /&gt;&lt;/span&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;Using a &lt;STRONG&gt;tUniqRow&lt;/STRONG&gt; component, pick the first record for each &lt;STRONG&gt;id&lt;/STRONG&gt;, which is the record with the maximum score.&lt;/P&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="0EM3p000001ikmN.png"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/122373i239032F9AC26CD19/image-size/large?v=v2&amp;amp;px=999" role="button" title="0EM3p000001ikmN.png" alt="0EM3p000001ikmN.png" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;The output data from the previous stage has code values for languages. The mapping of code values to the corresponding language names from the Amazon site is in the &lt;STRONG&gt;language_ref_code.txt&lt;/STRONG&gt; file, attached to this article. Use this file as a lookup before printing the output results.&lt;/P&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="0EM3p000001ikmc.png"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/123986i379B11F861BE9B62/image-size/large?v=v2&amp;amp;px=999" role="button" title="0EM3p000001ikmc.png" alt="0EM3p000001ikmc.png" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;The inbound data is joined with the reference file, with &lt;STRONG&gt;Join Model&lt;/STRONG&gt; set to &lt;STRONG&gt;Inner Join&lt;/STRONG&gt;. The data is then passed to the &lt;STRONG&gt;tLogRow&lt;/STRONG&gt; component to print the output in the console.&lt;/P&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="0EM3p000001iknz.png"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/122591i0FB469F029AE3619/image-size/large?v=v2&amp;amp;px=999" role="button" title="0EM3p000001iknz.png" alt="0EM3p000001iknz.png" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;Review the dominant language and the corresponding score for each input text. Note that the score of the last row is different from the other rows because the input sentence is a mix of English and Spanish.&lt;/P&gt;
&lt;P&gt;&lt;SPAN class="lia-inline-image-display-wrapper lia-image-align-inline"&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="0683p000009M19c.png"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/123664iF0EDFA512201431E/image-size/large?v=v2&amp;amp;px=999" role="button" title="0683p000009M19c.png" alt="0683p000009M19c.png" /&gt;&lt;/span&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/LI&gt;
&lt;/OL&gt;
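&lt;P&gt;The sort and de-duplication logic in the steps above (sort by &lt;STRONG&gt;id&lt;/STRONG&gt; ascending and &lt;STRONG&gt;score&lt;/STRONG&gt; descending, then keep the first row for each &lt;STRONG&gt;id&lt;/STRONG&gt;) can be sketched in plain Java as follows. This only illustrates the logic and is not the code that Talend generates. The &lt;STRONG&gt;LanguageRow&lt;/STRONG&gt; and &lt;STRONG&gt;TopLanguagePerId&lt;/STRONG&gt; names are hypothetical.&lt;/P&gt;

```java
import java.util.Arrays;

// Hypothetical row type mirroring the id, language code, and score columns of the Job.
final class LanguageRow {
    final int id;
    final String languageCode;
    final double score;

    LanguageRow(int id, String languageCode, double score) {
        this.id = id;
        this.languageCode = languageCode;
        this.score = score;
    }
}

final class TopLanguagePerId {

    // Sorts by id ascending, then score descending, and keeps the first row for each id.
    static LanguageRow[] select(LanguageRow[] rows) {
        LanguageRow[] sorted = rows.clone();
        Arrays.sort(sorted, (a, b) -> a.id != b.id
                ? Integer.compare(a.id, b.id)
                : Double.compare(b.score, a.score));

        LanguageRow[] kept = new LanguageRow[sorted.length];
        int count = 0;
        for (LanguageRow row : sorted) {
            // A row is kept only when its id differs from the id of the last kept row.
            if (count == 0 || kept[count - 1].id != row.id) {
                kept[count++] = row;
            }
        }
        return Arrays.copyOf(kept, count);
    }
}
```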
&lt;P&gt;In practical scenarios, the output at this stage can be passed to downstream systems by channeling it through different data flows based on the language of the sentence.&lt;/P&gt;
&lt;H3&gt;&lt;STRONG&gt;&lt;FONT color="#339966"&gt;Talend sample Job for sentiment analysis&lt;/FONT&gt;&lt;/STRONG&gt;&lt;/H3&gt;
&lt;P&gt;The sample Job, &lt;STRONG&gt;Sentiment_Analysis.zip&lt;/STRONG&gt;, attached to this article, extracts the input text from the CSV file and performs a call to the Amazon Comprehend sentiment analysis service. The output from the service is parsed and displayed in the console.&lt;/P&gt;
&lt;P&gt;The configuration details are as follows:&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;
&lt;P&gt;Create a new Standard Job called &lt;STRONG&gt;Sentiment_Analysis&lt;/STRONG&gt;. Attach the user routine, &lt;STRONG&gt;AWS_Comprehend&lt;/STRONG&gt;, to the Job as shown in the previous example. The following diagram shows the overall Job flow:&lt;/P&gt;
&lt;P&gt;&lt;SPAN class="lia-inline-image-display-wrapper lia-image-align-inline"&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="0683p000009M1Ka.png"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/125178iFBAC18CA68777844/image-size/large?v=v2&amp;amp;px=999" role="button" title="0683p000009M1Ka.png" alt="0683p000009M1Ka.png" /&gt;&lt;/span&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;Each record in the sample text file, &lt;STRONG&gt;sentiment_analysis_input.txt&lt;/STRONG&gt;, attached to this article, has an &lt;STRONG&gt;id&lt;/STRONG&gt; and an &lt;STRONG&gt;input_text&lt;/STRONG&gt;, and the records express different sentiments.&lt;/P&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="0EM3p000001ikqt.png"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/121857iCEEBBA507293D0AE/image-size/large?v=v2&amp;amp;px=999" role="button" title="0EM3p000001ikqt.png" alt="0EM3p000001ikqt.png" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;Use a &lt;STRONG&gt;tFileInputDelimited&lt;/STRONG&gt; component to configure the input file, as shown below:&lt;/P&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="0EM3p000001ikqP.png"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/123729iC9EEF37522298321/image-size/large?v=v2&amp;amp;px=999" role="button" title="0EM3p000001ikqP.png" alt="0EM3p000001ikqP.png" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;Talend calls the &lt;STRONG&gt;Sentiment_Detection&lt;/STRONG&gt; function of the &lt;STRONG&gt;AWS_Comprehend&lt;/STRONG&gt; routine in the &lt;STRONG&gt;tMap&lt;/STRONG&gt; component, as shown below. This sends the data from Talend to Amazon Comprehend and stores the response in the &lt;STRONG&gt;sentiment_results&lt;/STRONG&gt; field.&lt;/P&gt;
&lt;PRE&gt;AWS_Comprehend.Sentiment_Detection(context.AWS_Access_Key, context.AWS_Secret_Key, context.AWS_regionName, row1.input_text, context.language_code)&lt;/PRE&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;Configure the &lt;STRONG&gt;tMap&lt;/STRONG&gt; component, as shown below.&lt;/P&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="0EM3p000001ikr3.png"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/121726i0E72384DA282A3D3/image-size/large?v=v2&amp;amp;px=999" role="button" title="0EM3p000001ikr3.png" alt="0EM3p000001ikr3.png" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;The output data from the &lt;STRONG&gt;tMap&lt;/STRONG&gt; component contains the sentiment analysis results from Amazon Comprehend in JSON format. Use the &lt;STRONG&gt;tExtractJSONFields&lt;/STRONG&gt; component to parse the overall sentiment of the text and the positive, negative, neutral, and mixed sentiment scores, along with the original input fields, &lt;STRONG&gt;id&lt;/STRONG&gt; and &lt;STRONG&gt;input_text&lt;/STRONG&gt;.&lt;/P&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="0EM3p000001ikrN.png"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/123117i1125F0BE4FECD34F/image-size/large?v=v2&amp;amp;px=999" role="button" title="0EM3p000001ikrN.png" alt="0EM3p000001ikrN.png" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;Notice that the sentiment score fields are converted to the &lt;STRONG&gt;Double&lt;/STRONG&gt; data type for any further analysis.&lt;/P&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="0EM3p000001ikrm.png"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/124124i40DC170F0A5224FD/image-size/large?v=v2&amp;amp;px=999" role="button" title="0EM3p000001ikrm.png" alt="0EM3p000001ikrm.png" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;Review the data printed in the output console. The &lt;STRONG&gt;overall_sentiment&lt;/STRONG&gt; column provides the sentiment of the input text, and the four columns after it provide the individual scores for each sentiment.&lt;/P&gt;
&lt;P&gt;&lt;SPAN class="lia-inline-image-display-wrapper lia-image-align-inline"&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="0683p000009M1Hh.png"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/121477i2CED31B99BCDC141/image-size/large?v=v2&amp;amp;px=999" role="button" title="0683p000009M1Hh.png" alt="0683p000009M1Hh.png" /&gt;&lt;/span&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/LI&gt;
&lt;/OL&gt;
&lt;H3&gt;&lt;STRONG&gt;&lt;FONT color="#339966"&gt;Threshold limits for data processing&lt;/FONT&gt;&lt;/STRONG&gt;&lt;/H3&gt;
&lt;P&gt;At the time of this writing, Amazon Comprehend can handle 5,000 UTF-8 characters per document. Talend recommends that you always verify the latest limits on the AWS Documentation, &lt;A href="https://docs.aws.amazon.com/comprehend/latest/dg/guidelines-and-limits.html" target="_blank" rel="noopener"&gt;Guidelines and Limits&lt;/A&gt; page, and that you provide a minimum of 20 characters per input text for best results from the Amazon Comprehend service.&lt;/P&gt;
&lt;P&gt;Amazon Comprehend dominant language detection is currently available for 100 languages, and Amazon Comprehend sentiment analysis is available for English, French, German, Spanish, Italian, and Portuguese languages. Refer to the AWS Documentation, &lt;A href="https://docs.aws.amazon.com/comprehend/latest/dg/supported-languages.html" target="_self"&gt;Languages Supported in Amazon Comprehend&lt;/A&gt; page for the latest list.&lt;/P&gt;
&lt;H3&gt;&lt;STRONG&gt;&lt;FONT color="#339966"&gt;Conclusion&lt;/FONT&gt;&lt;/STRONG&gt;&lt;/H3&gt;
&lt;P&gt;This article describes use cases for integrating Talend with the Amazon Comprehend service. In real-time scenarios, the input data typically arrives through web services or queues rather than the input files used in the sample Jobs.&lt;/P&gt;
&lt;H4&gt;&lt;STRONG&gt;&lt;FONT color="#339966"&gt;Citations&lt;/FONT&gt;&lt;/STRONG&gt;&lt;/H4&gt;
&lt;P&gt;AWS documentation, &lt;A href="https://docs.aws.amazon.com/comprehend/latest/dg/comprehend-general.html" target="_blank" rel="noopener"&gt;Amazon Comprehend&lt;/A&gt;&amp;nbsp;&lt;/P&gt;</description>
    <pubDate>Tue, 23 Jan 2024 02:35:30 GMT</pubDate>
    <dc:creator>TalendSolutionExpert</dc:creator>
    <dc:date>2024-01-23T02:35:30Z</dc:date>
    <item>
      <title>Talend and Amazon Comprehend Integration</title>
      <link>https://community.qlik.com/t5/Official-Support-Articles/Talend-and-Amazon-Comprehend-Integration/ta-p/2151615</link>
      <description>&lt;P&gt;This article shows how seamlessly Talend integrates with &lt;A href="https://aws.amazon.com/comprehend/" target="_blank" rel="noopener"&gt;Amazon Comprehend&lt;/A&gt;, a natural language processing (NLP) service from AWS. It is an in-depth guide on how to use Talend to harness the dominant language detection and sentiment analysis capabilities of Amazon Comprehend.&lt;/P&gt;
&lt;P&gt;The article is a continuation of the Talend AWS Machine Learning integration series. You can read the previous article, &lt;A href="https://community.qlik.com/t5/Design-and-Development/Introduction-to-Talend-and-Amazon-Real-Time-Machine-Learning/ta-p/2151200" target="_blank" rel="noopener"&gt;Introduction to Talend and Amazon Real-Time Machine Learning&lt;/A&gt;, in the Talend Community Knowledge Base (KB).&lt;/P&gt;
&lt;P&gt;&lt;FONT color="#339966"&gt;&lt;STRONG&gt;Content:&lt;/STRONG&gt;&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;LI-TOC indent="15" liststyle="none" maxheadinglevel="4"&gt;&lt;/LI-TOC&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;H3&gt;&lt;FONT color="#339966"&gt;&lt;STRONG&gt;Environment for Talend and AWS&lt;/STRONG&gt;&lt;/FONT&gt;&lt;/H3&gt;
&lt;P&gt;This article was written using Talend 7.1. However, you can configure earlier versions of Talend with the logic provided to integrate Amazon Comprehend.&lt;/P&gt;
&lt;P&gt;Currently, Amazon Comprehend is only available in selected AWS regions. Talend recommends verifying the availability of the service from the AWS Global Infrastructure, &lt;A href="https://aws.amazon.com/about-aws/global-infrastructure/regional-product-services/" target="_blank" rel="noopener"&gt;Region Table&lt;/A&gt;, before creating the overall application architecture.&lt;/P&gt;
&lt;H3&gt;&lt;FONT color="#339966"&gt;&lt;STRONG&gt;Practical use cases&lt;/STRONG&gt;&lt;/FONT&gt;&lt;/H3&gt;
&lt;P&gt;This section discusses practical use cases where Talend, integrated with the Amazon Comprehend service, helps with dominant language detection and sentiment analysis of incoming data.&lt;/P&gt;
&lt;H4&gt;&lt;STRONG&gt;&lt;FONT color="#339966"&gt;Automatic multilingual support application&lt;/FONT&gt;&lt;/STRONG&gt;&lt;/H4&gt;
&lt;P&gt;For multinational corporations, or companies working in an operational environment where end users want to communicate in their native language, a multilingual support application is ideal. Talend and Amazon Comprehend help to categorize the support cases based on the dominant language present in the customer requests.&lt;/P&gt;
&lt;P class="lia-indent-padding-left-30px"&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="0EM3p000001ikfW.png"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/122237i07C70E2ECD86C6A7/image-size/large?v=v2&amp;amp;px=999" role="button" title="0EM3p000001ikfW.png" alt="0EM3p000001ikfW.png" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;P&gt;The diagram above shows the stages in the overall flow, which Talend helps to simplify with its graphical application design interface. The stages are:&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;
&lt;P&gt;End users communicate their queries and concerns through a web site in the language of their choice. In the example, queries are in the English, French, German, and Italian languages through various web servers. In the absence of language identification, web servers are usually mapped based on their IP addresses. So, if an English-speaking person would like to raise a ticket from Paris in France, it would typically go to a French Support system.&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;In the current layout, the data from the web servers is transmitted to various Producer queues where Kafka handles the queue systems.&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;Talend has built-in components to read and fetch the data from the Kafka queues.&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;Talend performs the request call to the Amazon Comprehend dominant language detection service by transferring the input text.&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;Talend receives the response from the Amazon Comprehend language detection service in JSON format.&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;Talend parses the JSON and identifies the dominant language. If the Amazon Comprehend service has sent multiple languages in the results set, Talend parses the JSON, extracts the JSON values for each language, sorts the data based on the score in descending fashion, and selects the highest-ranking language among the various scores available in the results set. Talend transmits the data to the corresponding consumer Kafka queues based on the dominant language criteria.&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;The request automatically reaches the support staff of the corresponding language service, who attend to it and provide feedback and a resolution in the customer's language of choice.&lt;/P&gt;
&lt;/LI&gt;
&lt;/OL&gt;
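&lt;P&gt;The selection described in step 6 can be sketched in plain Java. This is a minimal illustration, not the code of the sample Job: the class &lt;STRONG&gt;DominantLanguagePicker&lt;/STRONG&gt; and the method &lt;STRONG&gt;pickDominant&lt;/STRONG&gt; are hypothetical names, the language codes and scores are assumed to be already parsed from the response into two arrays, and a single pass over the scores is used in place of the sort.&lt;/P&gt;
&lt;PRE&gt;
```java
// Sketch only: DominantLanguagePicker and pickDominant are hypothetical names,
// not part of Talend or of the AWS SDK.
final class DominantLanguagePicker {

    /**
     * Returns the language code with the highest score,
     * or null when no languages were supplied.
     */
    static String pickDominant(String[] codes, double[] scores) {
        if (codes.length != scores.length) {
            throw new IllegalArgumentException("codes and scores must have the same length");
        }
        String bestCode = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (int i = 0; i != codes.length; i++) {
            // Keep the first language seen with a strictly higher score.
            if (scores[i] > bestScore) {
                bestScore = scores[i];
                bestCode = codes[i];
            }
        }
        return bestCode;
    }
}
```
&lt;/PRE&gt;
&lt;P&gt;For example, calling &lt;STRONG&gt;pickDominant&lt;/STRONG&gt; with the codes es, en, fr and the scores 0.35, 0.62, 0.03 returns en, which could then be used to choose the consumer queue.&lt;/P&gt;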
&lt;H4&gt;&lt;STRONG&gt;&lt;FONT color="#339966"&gt;Real-time sentiment analysis dashboard&lt;/FONT&gt;&lt;/STRONG&gt;&lt;/H4&gt;
&lt;P&gt;Customer sentiment analysis of a company is crucial in today’s highly competitive corporate world. Talend, Amazon Comprehend, and Snowflake help to perform real-time sentiment analysis from customer data feeds generated from multiple source systems.&lt;/P&gt;
&lt;P class="lia-indent-padding-left-30px"&gt;&lt;SPAN class="lia-inline-image-display-wrapper lia-image-align-inline"&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="0683p000009M12l.png"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/123106iD5A551DE1733C7FC/image-size/large?v=v2&amp;amp;px=999" role="button" title="0683p000009M12l.png" alt="0683p000009M12l.png" /&gt;&lt;/span&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;The diagram above depicts the various stages involved in a customer sentiment analysis dashboard. The various steps involved in the flow are:&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;
&lt;P&gt;Customer comments are captured by the company web servers or feeds from third party web sites.&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;The incoming data is transmitted to various Producer queues maintained by Kafka.&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;Talend reads the input data from Kafka using native Kafka components.&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;Talend processes the inbound data from the Kafka queues and transmits it to Amazon Comprehend as a request for sentiment analysis.&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;Talend receives the response from Amazon Comprehend sentiment analysis.&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;Talend parses the response data from Amazon Comprehend in JSON format and evaluates the overall sentiment and individual scores for positive, negative, neutral, and mixed. The parsed data, along with input text, is transmitted from Talend to Snowflake Cloud Data warehouse using native components.&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;Once the data is loaded to Snowflake, real-time dashboards showing overall customer sentiments are generated from Snowflake.&lt;/P&gt;
&lt;/LI&gt;
&lt;/OL&gt;
&lt;P&gt;&lt;STRONG&gt;Note&lt;/STRONG&gt;: The above scenarios are simple illustrations of data flow based solely on language detection and sentiment detection of the input data. Talend recommends applying additional data privacy-related rules, such as GDPR, on top of the current layout using Talend's easy-to-use graphical interface.&lt;/P&gt;
&lt;H3&gt;&lt;STRONG&gt;&lt;FONT color="#339966"&gt;Configure a Talend routine for Amazon Comprehend&lt;/FONT&gt;&lt;/STRONG&gt;&lt;/H3&gt;
&lt;P&gt;Create a Talend user routine by performing the following steps. Both dominant language detection and sentiment analysis are implemented in the same Talend routine as separate Java functions.&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;
&lt;P&gt;Connect to Talend Studio, and create a new routine called &lt;STRONG&gt;AWS_Comprehend &lt;/STRONG&gt;that connects to the Amazon Comprehend service to transmit the incoming input text and collect the response back from the Amazon Comprehend service.&lt;/P&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="0EM3p000001ikci.png"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/125229iF103033E4617C635/image-size/large?v=v2&amp;amp;px=999" role="button" title="0EM3p000001ikci.png" alt="0EM3p000001ikci.png" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;Insert the following code into the Talend routine:&lt;/P&gt;
&lt;PRE&gt;package routines;

//Amazon SDK 1.11.438

import com.amazonaws.auth.BasicAWSCredentials;
import com.amazonaws.auth.AWSStaticCredentialsProvider;
import com.amazonaws.services.comprehend.AmazonComprehend;
import com.amazonaws.services.comprehend.AmazonComprehendClientBuilder;
import com.amazonaws.services.comprehend.model.DetectSentimentRequest;
import com.amazonaws.services.comprehend.model.DetectSentimentResult;
import com.amazonaws.services.comprehend.model.DetectDominantLanguageRequest;
import com.amazonaws.services.comprehend.model.DetectDominantLanguageResult;

import org.apache.commons.logging.LogFactory;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.annotation.JsonView;
import org.apache.http.protocol.HttpRequestExecutor;
import org.apache.http.client.HttpClient;
import org.apache.http.conn.DnsResolver;
import org.joda.time.format.DateTimeFormat;

public class AWS_Comprehend {

    // Sends input_text to the Amazon Comprehend dominant language detection API
    // and returns the detected languages, with their scores, as a string.
    public static String Dominant_Language(String AWS_Access_Key, String AWS_Secret_Key, String AWS_regionName, String input_text)
    {
        // Build static credentials from the access key and secret key passed by the Job.
        BasicAWSCredentials awsCreds = new BasicAWSCredentials(AWS_Access_Key, AWS_Secret_Key);

        // Create a Comprehend client for the requested AWS region.
        AmazonComprehend comprehendClient = AmazonComprehendClientBuilder.standard().withCredentials(new AWSStaticCredentialsProvider(awsCreds)).withRegion(AWS_regionName).build();

        // Call detectDominantLanguage API
        DetectDominantLanguageRequest detectDominantLanguageRequest = new DetectDominantLanguageRequest().withText(input_text);
        DetectDominantLanguageResult detectDominantLanguageResult = comprehendClient.detectDominantLanguage(detectDominantLanguageRequest);

        // Return the list of detected languages as a string for parsing in the Job.
        String response_JSON = detectDominantLanguageResult.getLanguages().toString();
        return response_JSON;
    }

    // Sends input_text, written in the language given by language_code, to the
    // Amazon Comprehend sentiment detection API and returns the result as a string.
    public static String Sentiment_Detection(String AWS_Access_Key, String AWS_Secret_Key, String AWS_regionName, String input_text, String language_code)
    {
        // Build static credentials from the access key and secret key passed by the Job.
        BasicAWSCredentials awsCreds = new BasicAWSCredentials(AWS_Access_Key, AWS_Secret_Key);

        // Create a Comprehend client for the requested AWS region.
        AmazonComprehend comprehendClient = AmazonComprehendClientBuilder.standard().withCredentials(new AWSStaticCredentialsProvider(awsCreds)).withRegion(AWS_regionName).build();

        // Call Sentiment Detection API
        DetectSentimentRequest detectSentimentRequest = new DetectSentimentRequest().withText(input_text).withLanguageCode(language_code);
        String response_JSON = comprehendClient.detectSentiment(detectSentimentRequest).toString();
        return response_JSON;
    }

}
&lt;/PRE&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;The Talend routine needs additional JAR files. Install the following JAR files in the routine:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;
&lt;P&gt;&lt;STRONG&gt;AWS SDK 1.11.438&lt;/STRONG&gt;&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;&lt;STRONG&gt;apache.commons.logging&lt;/STRONG&gt;&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;&lt;STRONG&gt;Jackson core 2.9.7&lt;/STRONG&gt;&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;&lt;STRONG&gt;Jackson Annotations 2.9.4&lt;/STRONG&gt;&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;&lt;STRONG&gt;Jackson Databind 2.9.7&lt;/STRONG&gt;&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;&lt;STRONG&gt;httpcore 4.4.10&lt;/STRONG&gt;&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;&lt;STRONG&gt;httpclient 4.5.6&lt;/STRONG&gt;&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;&lt;STRONG&gt;joda-time 2.9.4&lt;/STRONG&gt;&lt;/P&gt;
&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;Add additional Java libraries to the routine by selecting &lt;STRONG&gt;Edit Routine Libraries&lt;/STRONG&gt;.&lt;/P&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="0EM3p000001ikgF.png"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/123193i8EE2682B57FE3662/image-size/large?v=v2&amp;amp;px=999" role="button" title="0EM3p000001ikgF.png" alt="0EM3p000001ikgF.png" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;Select &lt;STRONG&gt;New&lt;/STRONG&gt; in the pop-up window to add libraries to the routine.&lt;/P&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="0EM3p000001ikgZ.png"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/124275i1D73D6FB709F50F8/image-size/large?v=v2&amp;amp;px=999" role="button" title="0EM3p000001ikgZ.png" alt="0EM3p000001ikgZ.png" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;Select &lt;STRONG&gt;Artifact repository(local m2/nexus)&lt;/STRONG&gt;, then select &lt;STRONG&gt;Install a new module&lt;/STRONG&gt;.&lt;/P&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="0EM3p000001ikgj.png"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/125119i1F804CDE4F15712D/image-size/large?v=v2&amp;amp;px=999" role="button" title="0EM3p000001ikgj.png" alt="0EM3p000001ikgj.png" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;Select the JAR file from the local drive.&lt;/P&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="0EM3p000001ikvP.png"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/122269i6887BE1E52F96B6F/image-size/large?v=v2&amp;amp;px=999" role="button" title="0EM3p000001ikvP.png" alt="0EM3p000001ikvP.png" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;Select &lt;STRONG&gt;Detect the module install status&lt;/STRONG&gt; to verify whether the module is already installed.&lt;/P&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="0EM3p000001ikif.png"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/122272iE6F4CB7F16766E31/image-size/large?v=v2&amp;amp;px=999" role="button" title="0EM3p000001ikif.png" alt="0EM3p000001ikif.png" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;If the JAR file is not installed, the status changes from the error flag to &lt;STRONG&gt;Install a module&lt;/STRONG&gt; followed by JAR file name. Click &lt;STRONG&gt;OK&lt;/STRONG&gt; to load the JAR file to the routine. Once all the JAR files are installed, click &lt;STRONG&gt;Finish&lt;/STRONG&gt;.&lt;/P&gt;
&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="0EM3p000001ikik.png"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/122928i8C25C9212289150A/image-size/large?v=v2&amp;amp;px=999" role="button" title="0EM3p000001ikik.png" alt="0EM3p000001ikik.png" /&gt;&lt;/span&gt;&lt;/LI&gt;
&lt;/OL&gt;
&lt;P&gt;The setup activities are complete. The next section shows sample Jobs for the functionalities described in the practical use cases.&lt;/P&gt;
&lt;P&gt;For ease of understanding, and to keep the focus on the integration between Talend and Amazon Comprehend, the sample Jobs use text files for input and a &lt;STRONG&gt;tLogRow&lt;/STRONG&gt; component for output.&lt;/P&gt;
&lt;H3&gt;&lt;STRONG&gt;&lt;FONT color="#339966"&gt;Talend sample Job for dominant language detection&lt;/FONT&gt;&lt;/STRONG&gt;&lt;/H3&gt;
&lt;P&gt;The sample Job, &lt;STRONG&gt;Language_Identifier.zip&lt;/STRONG&gt;, attached to this article, reads the data from the input file and transmits the message to the Amazon Comprehend service. The response from Amazon Comprehend service, in JSON format, is parsed, sorted, and the row with the highest score for dominant language for each inbound text record is published in the console.&lt;/P&gt;
&lt;P&gt;The configuration details are as follows:&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;
&lt;P&gt;Create a new Standard Job called &lt;STRONG&gt;Language_Identifier&lt;/STRONG&gt;.&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;The first stage in associating the routine to a Talend Job is to add the routines to the newly created Job, by selecting &lt;STRONG&gt;Setup routine dependencies&lt;/STRONG&gt;.&lt;/P&gt;
&lt;P&gt;&lt;SPAN class="lia-inline-image-display-wrapper lia-image-align-inline"&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="0683p000009M1Em.png"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/123754i322E4371E9EE4B34/image-size/large?v=v2&amp;amp;px=999" role="button" title="0683p000009M1Em.png" alt="0683p000009M1Em.png" /&gt;&lt;/span&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;Add the &lt;STRONG&gt;AWS_Comprehend&lt;/STRONG&gt; routine to the &lt;STRONG&gt;User routines&lt;/STRONG&gt; section of the pop-up screen, to link the newly created routine to the Talend Job.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Note&lt;/STRONG&gt;: You must perform this step for both of the Jobs mentioned in this article.&lt;/P&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="0EM3p000001ikfI.png"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/123643i67F2E086C5304F9C/image-size/large?v=v2&amp;amp;px=999" role="button" title="0EM3p000001ikfI.png" alt="0EM3p000001ikfI.png" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;Review the overall Job flow, shown in the following diagram.&lt;/P&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="0EM3p000001ikjT.png"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/123354i6108E960FF6D2A0D/image-size/large?v=v2&amp;amp;px=999" role="button" title="0EM3p000001ikjT.png" alt="0EM3p000001ikjT.png" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;Configure the context variables, as shown below:&lt;/P&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="0EM3p000001ikjs.png"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/124740i9827E65DD3B75042/image-size/large?v=v2&amp;amp;px=999" role="button" title="0EM3p000001ikjs.png" alt="0EM3p000001ikjs.png" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;The input file for the Job, &lt;STRONG&gt;detect_language_input.txt&lt;/STRONG&gt;, attached to this article, contains the phrase, &lt;STRONG&gt;I am very happy today&lt;/STRONG&gt;, translated into multiple languages using Google Translate. The last line of the file intentionally mixes English and Spanish words to show how the scoring pattern differs when the input data contains multiple languages.&lt;/P&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="0EM3p000001ikkH.png"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/124114i15069CB937854B90/image-size/large?v=v2&amp;amp;px=999" role="button" title="0EM3p000001ikkH.png" alt="0EM3p000001ikkH.png" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;Configure the &lt;STRONG&gt;tFileInputDelimited&lt;/STRONG&gt; component, as shown below:&lt;/P&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="0EM3p000001ikiR.png"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/125222iFB3A9E34E2DC9950/image-size/large?v=v2&amp;amp;px=999" role="button" title="0EM3p000001ikiR.png" alt="0EM3p000001ikiR.png" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;Use the &lt;STRONG&gt;tMap&lt;/STRONG&gt; component to call the Amazon Comprehend service through the Talend routine. In the &lt;STRONG&gt;tMap&lt;/STRONG&gt; component, pass the parameters in the same order as in the following function call:&lt;/P&gt;
&lt;PRE&gt;AWS_Comprehend.Dominant_Language(context.AWS_Access_Key, context.AWS_Secret_Key, context.AWS_regionName, row1.input_text)&lt;/PRE&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;Configure the &lt;STRONG&gt;tMap&lt;/STRONG&gt; component layout, as shown below:&lt;/P&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="0EM3p000001ikkv.png"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/121908iC693F76015CB8AF5/image-size/large?v=v2&amp;amp;px=999" role="button" title="0EM3p000001ikkv.png" alt="0EM3p000001ikkv.png" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;The output from the Amazon Comprehend call is a string in JSON format. If there are multiple languages present in input text, the output JSON has a score for each associated language. The language code and corresponding scores are parsed to the variables, as shown below. Leave the columns &lt;STRONG&gt;id&lt;/STRONG&gt; and &lt;STRONG&gt;input_text&lt;/STRONG&gt; empty because you are going to map them directly from the input flow.&lt;/P&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="0EM3p000001iklP.png"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/121430iA382B86ABEB6924C/image-size/large?v=v2&amp;amp;px=999" role="button" title="0EM3p000001iklP.png" alt="0EM3p000001iklP.png" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;Notice that the score is converted to &lt;STRONG&gt;Double&lt;/STRONG&gt; in this stage.&lt;/P&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="0EM3p000001iklt.png"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/123278i01DA5F7CC0A699AB/image-size/large?v=v2&amp;amp;px=999" role="button" title="0EM3p000001iklt.png" alt="0EM3p000001iklt.png" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;Sort the output data according to the &lt;STRONG&gt;id&lt;/STRONG&gt; (in ascending order) and the &lt;STRONG&gt;score&lt;/STRONG&gt; (in descending order) columns.&lt;/P&gt;
&lt;P&gt;&lt;SPAN class="lia-inline-image-display-wrapper lia-image-align-inline"&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="0683p000009M1KP.png"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/123163iAE41C5337DF058A9/image-size/large?v=v2&amp;amp;px=999" role="button" title="0683p000009M1KP.png" alt="0683p000009M1KP.png" /&gt;&lt;/span&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;Using a &lt;STRONG&gt;tUniqRow&lt;/STRONG&gt; component keyed on &lt;STRONG&gt;id&lt;/STRONG&gt;, pick the first record for each &lt;STRONG&gt;id&lt;/STRONG&gt;. Because of the preceding sort, this is the record with the maximum score.&lt;/P&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="0EM3p000001ikmN.png"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/122373i239032F9AC26CD19/image-size/large?v=v2&amp;amp;px=999" role="button" title="0EM3p000001ikmN.png" alt="0EM3p000001ikmN.png" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;The output data from the previous stage has code values for languages. The mapping of code values to the corresponding language names from the Amazon site is in the &lt;STRONG&gt;language_ref_code.txt&lt;/STRONG&gt; file, attached to this article. Use this file as a lookup before printing the output results.&lt;/P&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="0EM3p000001ikmc.png"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/123986i379B11F861BE9B62/image-size/large?v=v2&amp;amp;px=999" role="button" title="0EM3p000001ikmc.png" alt="0EM3p000001ikmc.png" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;The inbound data is joined with the reference file, with &lt;STRONG&gt;Join Model&lt;/STRONG&gt; set to &lt;STRONG&gt;Inner Join&lt;/STRONG&gt;. The data is passed to the &lt;STRONG&gt;tLogRow&lt;/STRONG&gt; component to print the output in the console.&lt;/P&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="0EM3p000001iknz.png"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/122591i0FB469F029AE3619/image-size/large?v=v2&amp;amp;px=999" role="button" title="0EM3p000001iknz.png" alt="0EM3p000001iknz.png" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;Review the dominant language and the corresponding score for each input text. Note that the score of the last row is different from the other rows because the input sentence is a mix of English and Spanish.&lt;/P&gt;
&lt;P&gt;&lt;SPAN class="lia-inline-image-display-wrapper lia-image-align-inline"&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="0683p000009M19c.png"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/123664iF0EDFA512201431E/image-size/large?v=v2&amp;amp;px=999" role="button" title="0683p000009M19c.png" alt="0683p000009M19c.png" /&gt;&lt;/span&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/LI&gt;
&lt;/OL&gt;
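&lt;P&gt;The sort and deduplication stages of this Job can be approximated in plain Java to make the logic easier to follow. The sketch below is an illustration under assumptions, not the generated Job code: the class &lt;STRONG&gt;TopLanguagePerRecord&lt;/STRONG&gt;, its nested &lt;STRONG&gt;Row&lt;/STRONG&gt; type, and the method &lt;STRONG&gt;topPerId&lt;/STRONG&gt; are hypothetical names, and each row is assumed to hold an already parsed id, language code, and score.&lt;/P&gt;
&lt;PRE&gt;
```java
import java.util.Arrays;

// Sketch only: all names here are hypothetical and exist only for illustration.
final class TopLanguagePerRecord {

    /** One parsed language candidate for one input record. */
    static final class Row {
        final int id;
        final String languageCode;
        final double score;

        Row(int id, String languageCode, double score) {
            this.id = id;
            this.languageCode = languageCode;
            this.score = score;
        }
    }

    /** Sorts by id ascending and score descending, then keeps the first row of each id. */
    static Row[] topPerId(Row[] rows) {
        Row[] sorted = rows.clone();
        Arrays.sort(sorted, (a, b) -> a.id != b.id
                ? Integer.compare(a.id, b.id)
                : Double.compare(b.score, a.score));

        Row[] kept = new Row[sorted.length];
        int count = 0;
        for (Row row : sorted) {
            // After sorting, the first row seen for an id carries its highest score.
            if (count == 0 || kept[count - 1].id != row.id) {
                kept[count++] = row;
            }
        }
        return Arrays.copyOf(kept, count);
    }
}
```
&lt;/PRE&gt;
&lt;P&gt;Sorting first and then keeping the first row per &lt;STRONG&gt;id&lt;/STRONG&gt; mirrors the order of the sort and unique-row components in the Job, so the result for a mixed-language record is its highest-scoring language.&lt;/P&gt;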
&lt;P&gt;In practical scenarios, the output at this stage can be passed to downstream systems by channeling it through different data flows based on the language of the sentence.&lt;/P&gt;
&lt;H3&gt;&lt;STRONG&gt;&lt;FONT color="#339966"&gt;Talend sample Job for sentiment analysis&lt;/FONT&gt;&lt;/STRONG&gt;&lt;/H3&gt;
&lt;P&gt;The sample Job, &lt;STRONG&gt;Sentiment_Analysis.zip&lt;/STRONG&gt;, attached to this article, extracts the input text from the CSV file and performs a call to the Amazon Comprehend sentiment analysis service. The output from the service is parsed and displayed in the console.&lt;/P&gt;
&lt;P&gt;The configuration details are as follows:&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;
&lt;P&gt;Create a new Standard Job called &lt;STRONG&gt;Sentiment_Analysis&lt;/STRONG&gt;. Attach the user routine, &lt;STRONG&gt;AWS_Comprehend&lt;/STRONG&gt;, to the Job as shown in the previous example. The following diagram shows the overall Job flow:&lt;/P&gt;
&lt;P&gt;&lt;SPAN class="lia-inline-image-display-wrapper lia-image-align-inline"&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="0683p000009M1Ka.png"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/125178iFBAC18CA68777844/image-size/large?v=v2&amp;amp;px=999" role="button" title="0683p000009M1Ka.png" alt="0683p000009M1Ka.png" /&gt;&lt;/span&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;Each record in the sample text file, &lt;STRONG&gt;sentiment_analysis_input.txt&lt;/STRONG&gt;, attached to this article, has an &lt;STRONG&gt;id&lt;/STRONG&gt; and an &lt;STRONG&gt;input_text&lt;/STRONG&gt;, and the records express different sentiments.&lt;/P&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="0EM3p000001ikqt.png"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/121857iCEEBBA507293D0AE/image-size/large?v=v2&amp;amp;px=999" role="button" title="0EM3p000001ikqt.png" alt="0EM3p000001ikqt.png" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;Use a &lt;STRONG&gt;tFileInputDelimited&lt;/STRONG&gt; component to configure the input file, as shown below:&lt;/P&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="0EM3p000001ikqP.png"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/123729iC9EEF37522298321/image-size/large?v=v2&amp;amp;px=999" role="button" title="0EM3p000001ikqP.png" alt="0EM3p000001ikqP.png" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;Talend calls the &lt;STRONG&gt;Sentiment_Detection&lt;/STRONG&gt; function of the &lt;STRONG&gt;AWS_Comprehend&lt;/STRONG&gt; routine in the &lt;STRONG&gt;tMap &lt;/STRONG&gt;component, as shown below. This transfers the data from Talend to Amazon Comprehend and sends the responses back to the &lt;STRONG&gt;sentiment_results&lt;/STRONG&gt; field.&lt;/P&gt;
&lt;PRE&gt;AWS_Comprehend.Sentiment_Detection(context.AWS_Access_Key, context.AWS_Secret_Key, context.AWS_regionName, row1.input_text,context.language_code)&lt;/PRE&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;Configure the &lt;STRONG&gt;tMap&lt;/STRONG&gt; component, as shown below.&lt;/P&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="0EM3p000001ikr3.png"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/121726i0E72384DA282A3D3/image-size/large?v=v2&amp;amp;px=999" role="button" title="0EM3p000001ikr3.png" alt="0EM3p000001ikr3.png" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;The output data from the &lt;STRONG&gt;tMap&lt;/STRONG&gt; component has the sentiment analysis results from Amazon Comprehend but the results are in JSON format. Use the &lt;STRONG&gt;tExtractJSONFields&lt;/STRONG&gt; component to parse the overall sentiment of the text, positive sentiment score, negative sentiment score, neutral sentiment score, and mixed sentiment score along with original input fields, &lt;STRONG&gt;id&lt;/STRONG&gt; and&lt;STRONG&gt; input_text&lt;/STRONG&gt;.&lt;/P&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="0EM3p000001ikrN.png"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/123117i1125F0BE4FECD34F/image-size/large?v=v2&amp;amp;px=999" role="button" title="0EM3p000001ikrN.png" alt="0EM3p000001ikrN.png" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;Notice that the data type of the fields with sentiment scores is converted to &lt;STRONG&gt;Double&lt;/STRONG&gt; for any further analysis.&lt;/P&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="0EM3p000001ikrm.png"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/124124i40DC170F0A5224FD/image-size/large?v=v2&amp;amp;px=999" role="button" title="0EM3p000001ikrm.png" alt="0EM3p000001ikrm.png" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;Review the data printed in the output console. The &lt;STRONG&gt;overall_sentiment&lt;/STRONG&gt; column provides the sentiment of the input text, and the four columns after it provide the individual scores for each sentiment.&lt;/P&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="0683p000009M1Hh.png"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/121477i2CED31B99BCDC141/image-size/large?v=v2&amp;amp;px=999" role="button" title="0683p000009M1Hh.png" alt="0683p000009M1Hh.png" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;/LI&gt;
&lt;/OL&gt;
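&lt;P&gt;For readers who want to see the extraction step outside of Talend Studio, the following is a minimal Java sketch of what the &lt;STRONG&gt;tExtractJSONFields&lt;/STRONG&gt; step does conceptually. It assumes that the JSON string follows the response shape documented by AWS for sentiment detection (a Sentiment value plus a SentimentScore object with Positive, Negative, Neutral, and Mixed keys); the JSON returned by the routine in your Job may differ. The class name, the method names, and the sample payload are hypothetical, and the regular expressions are only a stand-in for a real JSON parser.&lt;/P&gt;
&lt;PRE&gt;
```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: reads fields from an assumed sentiment JSON payload.
final class SentimentJsonSketch {

    /** Returns the string value stored under the given key, or null if absent. */
    static String readString(String json, String key) {
        Pattern p = Pattern.compile("\"" + Pattern.quote(key) + "\"\\s*:\\s*\"([^\"]*)\"");
        Matcher m = p.matcher(json);
        return m.find() ? m.group(1) : null;
    }

    /** Returns the numeric value stored under the given key as a double. */
    static double readScore(String json, String key) {
        Pattern p = Pattern.compile("\"" + Pattern.quote(key) + "\"\\s*:\\s*([-+0-9.eE]+)");
        Matcher m = p.matcher(json);
        if (!m.find()) {
            throw new IllegalArgumentException("Missing score: " + key);
        }
        return Double.parseDouble(m.group(1));
    }

    public static void main(String[] args) {
        // Hypothetical sample payload; the values are illustrative only.
        String json = "{\"Sentiment\":\"POSITIVE\",\"SentimentScore\":"
                + "{\"Positive\":0.98,\"Negative\":0.005,\"Neutral\":0.01,\"Mixed\":0.005}}";
        System.out.println(readString(json, "Sentiment"));
        System.out.println(readScore(json, "Positive"));
    }
}
```
&lt;/PRE&gt;
&lt;P&gt;Converting each score to a double mirrors the schema change in the Job, so the values can be compared or aggregated downstream.&lt;/P&gt;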
&lt;H3&gt;&lt;STRONG&gt;&lt;FONT color="#339966"&gt;Threshold limits for data processing&lt;/FONT&gt;&lt;/STRONG&gt;&lt;/H3&gt;
&lt;P&gt;At the time of this writing, Amazon Comprehend can handle 5,000 UTF-8 characters per document. Talend recommends that you always verify the latest limits on the AWS Documentation &lt;A href="https://docs.aws.amazon.com/comprehend/latest/dg/guidelines-and-limits.html" target="_blank" rel="noopener"&gt;Guidelines and Limits&lt;/A&gt; page, and that you provide a minimum of 20 characters per input text for best results from the Amazon Comprehend service.&lt;/P&gt;
&lt;P&gt;Amazon Comprehend dominant language detection is currently available for 100 languages, and Amazon Comprehend sentiment analysis is available for English, French, German, Spanish, Italian, and Portuguese. Refer to the AWS Documentation &lt;A href="https://docs.aws.amazon.com/comprehend/latest/dg/supported-languages.html" target="_self"&gt;Languages Supported in Amazon Comprehend&lt;/A&gt; page for the latest list.&lt;/P&gt;
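&lt;P&gt;The limits above can be checked before a row is sent to Amazon Comprehend, for example in a custom routine. The following is a minimal sketch, not part of the original Job: the class and method names are hypothetical, the 20 and 5,000 values are the figures quoted in this article, and the language codes are the two-letter codes for the six languages listed above. The sketch counts Unicode code points; this is an assumption, because the service may measure document size differently, so confirm the exact rule in the AWS documentation.&lt;/P&gt;
&lt;PRE&gt;
```java
// Hypothetical pre-check for text that will be sent to Amazon Comprehend.
final class ComprehendInputGuard {

    // Limits quoted in the article; verify them against the current AWS documentation.
    static final int MIN_CHARS = 20;
    static final int MAX_CHARS = 5000;

    // Two-letter codes for the six sentiment-analysis languages listed in the article.
    static final String[] SENTIMENT_LANGUAGES = {"en", "fr", "de", "es", "it", "pt"};

    /** Counts Unicode code points so that a surrogate pair counts as one character. */
    static int characterCount(String text) {
        return text.codePointCount(0, text.length());
    }

    /** Returns true when the text is non-null and within the quoted length limits. */
    static boolean hasValidLength(String text) {
        if (text == null) {
            return false;
        }
        int chars = characterCount(text);
        if (MIN_CHARS > chars) {
            return false;
        }
        return MAX_CHARS >= chars;
    }

    /** Returns true when the language code is in the sentiment-analysis list. */
    static boolean supportsSentiment(String languageCode) {
        for (String code : SENTIMENT_LANGUAGES) {
            if (code.equalsIgnoreCase(languageCode)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String text = "The delivery was quick and the support team was helpful.";
        System.out.println(hasValidLength(text));
        System.out.println(supportsSentiment("en"));
    }
}
```
&lt;/PRE&gt;
&lt;P&gt;Rows that fail either check could be routed to a reject flow instead of being passed to the sentiment routine, which avoids spending a service call on input that is unlikely to produce a useful result.&lt;/P&gt;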
&lt;H3&gt;&lt;STRONG&gt;&lt;FONT color="#339966"&gt;Conclusion&lt;/FONT&gt;&lt;/STRONG&gt;&lt;/H3&gt;
&lt;P&gt;This article demonstrated use cases for integrating Talend with the Amazon Comprehend service. In real-time scenarios, input data typically arrives through web services or queues rather than the input files used in the sample Jobs.&lt;/P&gt;
&lt;H4&gt;&lt;STRONG&gt;&lt;FONT color="#339966"&gt;Citations&lt;/FONT&gt;&lt;/STRONG&gt;&lt;/H4&gt;
&lt;P&gt;AWS documentation, &lt;A href="https://docs.aws.amazon.com/comprehend/latest/dg/comprehend-general.html" target="_blank" rel="noopener"&gt;Amazon Comprehend&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 23 Jan 2024 02:35:30 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Official-Support-Articles/Talend-and-Amazon-Comprehend-Integration/ta-p/2151615</guid>
      <dc:creator>TalendSolutionExpert</dc:creator>
      <dc:date>2024-01-23T02:35:30Z</dc:date>
    </item>
  </channel>
</rss>

