How to connect to Microsoft Azure using cMQConnectionFactory

Talend Version	6.1.1
Summary	Connecting to Microsoft Azure using cMQConnectionFactory.
Additional Versions
Product	Talend ESB
Component	cMQConnectionFactory and cAMQP
Problem Description
Problem root cause
Solution or Workaround	To send messages to a Microsoft Azure queue using the cAMQP component, you must first configure a connection factory. The following steps set up a connection to Azure in the cMQConnectionFactory component, assuming an Azure endpoint such as:
Endpoint=sb://mytest.servicebus.windows.net/;SharedAccessKeyName=RootManageSharedAccessKey;SharedAccessKey=1alxVTE9xxxxGFB+E7D7pt7Q74BU0lAzxcvbU+RlJSLs=
In the cMQConnectionFactory component, set the Host and Port as follows:
Host: "RootManageSharedAccessKey:1alxVTE9xxxxGFB%2BE7D7pt7Q74BU0lAzxcvbU%2BRlJSLs%3D@mytest.servicebus.windows.net"
Note: special characters in the key must be percent-encoded, that is, replaced with % followed by their hexadecimal code. In this example, the plus sign is replaced with %2B and the equals sign with %3D.
Port: "5671"
Select Use SSL.
JIRA ticket number
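
The percent-encoding of the key described above can be done by hand, but a small helper reduces the risk of missing a character. The following is a minimal sketch, not part of the original article: the class and method names (SasKeyEncoder, encodeKey, buildHost) are hypothetical, and it assumes the key contains only Base64 characters, for which java.net.URLEncoder produces the expected %2B, %2F, and %3D substitutions.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class SasKeyEncoder {

    /** Percent-encodes a shared access key so it can be embedded in the Host value. */
    public static String encodeKey(String rawKey) {
        try {
            // URLEncoder turns '+' into %2B, '=' into %3D and '/' into %2F.
            return URLEncoder.encode(rawKey, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is a charset every JVM is required to support.
            throw new IllegalStateException(e);
        }
    }

    /** Builds the Host value in the form keyName:encodedKey@namespaceHost. */
    public static String buildHost(String keyName, String rawKey, String namespaceHost) {
        return keyName + ":" + encodeKey(rawKey) + "@" + namespaceHost;
    }

    public static void main(String[] args) {
        // Values taken from the example endpoint in this article.
        System.out.println(buildHost(
                "RootManageSharedAccessKey",
                "1alxVTE9xxxxGFB+E7D7pt7Q74BU0lAzxcvbU+RlJSLs=",
                "mytest.servicebus.windows.net"));
    }
}
```

The printed value can be pasted, in double quotes, into the Host field of the cMQConnectionFactory component.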


Official Support Articles

Talend ESB: Use tHash components in ESB Runtime

Question

Is it safe to use the tHashOutput and tHashInput components in SOAP/REST services or in Jobs called by routes?

Answer

tHashxxx components are not thread-safe and are not supported in Data Services. They are designed to store/keep data/objects in memory and are not recommended for use in the JVM process of the runtime.


Exporting a Job script and executing it outside of Talend Studio

Overview

Talend Jobs support cross-platform execution. You can develop your Job on one machine, export the Job script, and then move it to another machine to execute without any additional configuration except the JDK installation. This article explains how to export the Job script and execute it outside of Talend Studio.

Environment

This procedure was written with:

Talend Open Studio for Data Integration 8.0.1.20211109_1610
JDK version: Oracle JDK build 1.8.0_333
Operating system: Windows 10

Talend verified this procedure to be compatible with Talend Open Studio for Data Integration starting from version 4.2.3.

Starting from version 6.0, Talend Studio requires a JDK installation to build jobs completely. For more information, refer to Requiring a JDK installation to build jobs starting from version 6.0.

Procedure

Create an example Job

Create a Job called ExportDemo. This Job generates the current timestamp and appends it to a file (for example, D:/file/out.txt). The detailed Job design is as follows:
In the tFileOutputDelimited component, check the Append box to append the current timestamp to an existing file whenever the job is executed.
Execute the Job to ensure it works in Talend Studio. Then open the file D:/file/out.txt and verify that the current timestamp was written to the file. For example, the file has a new record as follows:
```
10/5/2023  2:32:53 PM
```

Export the Job script

To export the Job script follow these steps:

Right-click the Job name in the Repository view. Select Build Job (or Export Job prior to version 5.4.0) to export the Job script.
Browse to the location where you want to save the exported Job script. Select the Standalone Job item in the Build type list, then click Finish.

Execute the Job

Copy the zip file to another machine if necessary. Unzip the zip file.
Open the folder where the executable files (jobName_run.bat/jobName_run.sh) are located.

For example: D:\file\ExportDemo_0.1\ExportDemo.
Execute the Job: in this example, by clicking the ExportDemo_run.bat file on a Windows system, or by executing the ExportDemo_run.sh file on a Unix/Linux system.
Open the file D:/file/out.txt and verify that the current timestamp was appended.


tOracleBulkExec and tOracleOutputBulkExec fail with a 'java.io.IOException: Cann...

Talend Version	6.x 7.x 8.x
Summary	tOracleBulkExec and tOracleOutputBulkExec fail with the following error: java.io.IOException: Cannot run program "sqlldr": error=2, No such file or directory
Additional Versions
Product	Talend Data Integration
Component	JobServer,Remote Engine
Problem Description	tOracleBulkExec and tOracleOutputBulkExec fail with the following error: java.io.IOException: Cannot run program "sqlldr": error=2, No such file or directory
Problem root cause	PATH and ORACLE_HOME are not defined or not correctly set in the JobServer / Remote Engine process.
To dump the environment variables defined in the JobServer / Remote Engine process, create a simple Job using a tJava component.
In Basic settings, add the following:
Map<String, String> env = System.getenv();
for (String envName : env.keySet()) {
    System.out.format("%s=%s%n", envName, env.get(envName));
}
In Advanced settings, add the following:
import java.util.Map;
Run the Job to dump the environment variables.
Note: for the Cloud (Remote Engine), replace the line
System.out.format("%s=%s%n", envName, env.get(envName));
with
context.output += ( envName + " = " + env.get(envName) + "\n");
and display context.output in the task log using a tJoblog. Execute the task in the Cloud with "Execution log level" set to Info.
Solution or Workaround	Modify the script that launches the JobServer / Remote Engine, start_rs.sh (JobServer/agent).
Define ORACLE_HOME, for example:
ORACLE_HOME=/apps/oracle/product/12.1.0/dbhome_1
export ORACLE_HOME
Set the PATH to contain the directory where sqlldr is stored:
PATH=$ORACLE_HOME/bin:$PATH
export PATH
Notes:
I - You can also set the PATH and ORACLE_HOME variables at the OS level. For more information, see the Oracle documentation.
II - If the "Run as" option (user impersonation) is used, update the /etc/sudoers file to add the following lines:
Defaults env_keep += "ORACLE_HOME"
Defaults env_keep += "PATH"
III - For a Remote Engine launched as a service, set the variables in the file <Remote Engine Folder>\etc\Talend Remote Engine-wrapper.conf, for example:
set.default.ORACLE_HOME=/data/oracle/product/19.0.0/client_1
set.default.PATH=/data/oracle/product/19.0.0/client_1/bin%WRAPPER_PATH_SEPARATOR%%PATH%%WRAPPER_PATH_SEPARATOR%.
JIRA ticket number
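
Beyond dumping every environment variable, it can help to check directly whether sqlldr is resolvable from the PATH that the process sees. The following is a minimal sketch under stated assumptions: the class and method names (SqlldrLocator, findOnPath) are hypothetical, and the lookup targets Unix-like systems, where the executable is named sqlldr with no extension, as in the error message above. In a Talend Job, the body of main could be adapted for a tJava component.

```java
import java.io.File;
import java.nio.file.Files;
import java.nio.file.InvalidPathException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;

public class SqlldrLocator {

    /** Returns the first executable file named programName found in a PATH-style string. */
    public static Optional<Path> findOnPath(String programName, String pathValue) {
        if (pathValue == null || pathValue.isEmpty()) {
            return Optional.empty();
        }
        for (String dir : pathValue.split(File.pathSeparator)) {
            if (dir.isEmpty()) {
                continue;
            }
            try {
                Path candidate = Paths.get(dir, programName);
                if (Files.isRegularFile(candidate) && Files.isExecutable(candidate)) {
                    return Optional.of(candidate);
                }
            } catch (InvalidPathException e) {
                // Skip PATH entries that are not valid paths on this platform.
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Report what the current process actually sees.
        System.out.println("ORACLE_HOME=" + System.getenv("ORACLE_HOME"));
        Optional<Path> sqlldr = findOnPath("sqlldr", System.getenv("PATH"));
        System.out.println(sqlldr.map(p -> "sqlldr found at " + p)
                                 .orElse("sqlldr not found on PATH"));
    }
}
```

If the output reports that sqlldr is not found, the PATH of the JobServer / Remote Engine process most likely still needs the $ORACLE_HOME/bin directory.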


'Talend Big Data Advanced - Spark Streaming' Training Course

This course concentrates on Big Data Spark Jobs. It focuses mainly on Big Data Streaming Jobs but also introduces you to Big Data Batch Jobs.

After an introduction to Apache Kafka and Apache Spark, you work on a log processing use case, a common Big Data use case. You see how to publish messages to Kafka, subscribe to receive messages, insert data into ElasticSearch and use Kibana to create charts and dashboards. You also see how to save data to and read data from HBase tables.

Target Audience: Anyone who wants to use Talend Studio to interact with big data systems.
Prerequisites: Completion of Talend Big Data Basics.
Badge: Complete this learning path on your way to earning the Talend Big Data Developer Practitioner badge. To know more about the criteria to earn this badge, refer to the Talend Academy Badging Program page.
Availability: This learning plan is available as part of a Talend Academy Learning Subscription.

If you are already a Talend Academy subscriber or want to access the publicly available content on the platform, go to the Talend Academy Welcome page to log in or create an account.


tFTP Connection (SFTP) fails with an 'Algorithm negotiation fail' error

Problem Description

The tFTPConnection component, with SFTP support enabled, fails with the following error:

tFTPConnection_1 Algorithm negotiation fail
 Exception in component tFTPConnection_1
 com.jcraft.jsch.JSchException: Algorithm negotiation fail

Root Cause

By default, Java provides only a restricted list of algorithms.

Due to import control restrictions, the version of the JCE policy files bundled in the JDK(TM) environment allows "strong" but limited cryptography to be used.
 The JCE download bundle provides "unlimited strength" policy files, which contain no restrictions on cryptographic strength.

Note: This restriction no longer exists in Java 8 update 162 and later.

https://www.jvmhost.com/articles/jce-unlimited-cipher-policy-different-jdk-versions/
JDK >= 8u162
Unlimited policy files are included and unlimited cipher strength is enabled by default.

The update 162 was released in January 2018 :
https://www.java.com/download/help/release_dates.html

This restriction / limitation does not exist in Java 11
https://help.talend.com/r/en-US/7.3/installation-guide-real-time-big-data-platform-mac/compatible-java-environments

Solution

Download the Java Cryptography Extension (JCE) Unlimited Strength Jurisdiction Policy Files, according to your JDK version, from the Oracle Java Archive Download page.
Extract the local_policy.jar and US_export_policy.jar, and replace them in the following locations:

{jdk home}\jre\lib\security\local_policy.jar

{jdk home}\jre\lib\security\US_export_policy.jar

For more information, see the SAP blog B2B Adapters – Updating to JCE Unlimited Strength Jurisdiction Policy page.
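
To verify whether a given JVM is still running with the limited policy, before or after replacing the policy files, you can query the maximum allowed key length. The following is a minimal sketch: the class name JcePolicyCheck is hypothetical, while Cipher.getMaxAllowedKeyLength is a standard javax.crypto call. It assumes that the limited policy caps AES at 128 bits and that the unlimited policy reports Integer.MAX_VALUE.

```java
import java.security.NoSuchAlgorithmException;
import javax.crypto.Cipher;

public class JcePolicyCheck {

    /** Maximum AES key length, in bits, allowed by the installed policy files. */
    public static int maxAesKeyLength() {
        try {
            return Cipher.getMaxAllowedKeyLength("AES");
        } catch (NoSuchAlgorithmException e) {
            // AES is expected to be available on every standard JVM.
            throw new IllegalStateException(e);
        }
    }

    /** True when the policy allows AES keys longer than the 128-bit limited cap. */
    public static boolean isUnlimited() {
        return maxAesKeyLength() > 128;
    }

    public static void main(String[] args) {
        System.out.println("java.version = " + System.getProperty("java.version"));
        System.out.println("Max AES key length = " + maxAesKeyLength());
        System.out.println(isUnlimited()
                ? "Unlimited strength policy is active"
                : "Limited policy is active");
    }
}
```

Run the check with the same JDK that Talend Studio or the Job uses, because the policy files are specific to each Java installation.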


How to generate a trace for the HTTP requests executed by Studio

Question

How do you generate a trace of the HTTP requests executed by the Talend Studio without using a third-party tool?

Answer

1. Create a logging.properties file (in the c:\temp folder for example) containing the following lines:

.level=FINEST
handlers = java.util.logging.FileHandler
java.util.logging.FileHandler.pattern = c:/temp/debug.txt
java.util.logging.FileHandler.limit = 20971520
java.util.logging.FileHandler.count = 10
java.util.logging.FileHandler.append = true
java.util.logging.FileHandler.formatter = java.util.logging.SimpleFormatter

2. Create a log4j.properties file (in the c:\temp folder for example) containing the following lines:

log4j.rootLogger=INFO, stdout
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss} %-5p %c{1}:%L - %m%n
log4j.logger.org.apache.http=DEBUG
log4j.logger.org.apache.http.wire=DEBUG

Note: the generated trace does not contain the full body of the HTTP requests. If you want to log the body of the HTTP requests and responses, use an HTTP proxy, such as Fiddler.

3. Start Talend Studio by using the following commands:

On Windows, from a CMD window, execute the following two commands:

set _JAVA_OPTIONS=-Dlog4j.debug -Dlog4j.configuration=file:"c:\temp\log4j.properties" -Djava.util.logging.config.file=c:\temp\logging.properties
Talend-Studio-win-x86_64.exe --talendDebug > studio_debug.txt 2>&1

On Linux, from a shell terminal, execute the following two commands:

export _JAVA_OPTIONS='-Dlog4j.debug -Dlog4j.configuration=file:"/tmp/log4j.properties" -Djava.util.logging.config.file=/tmp/logging.properties'
Talend-Studio-linux-gtk-x86_64 --talendDebug > studio_debug.txt 2>&1

Note: on Linux, adapt the file pattern in the logging.properties file accordingly:

java.util.logging.FileHandler.pattern = /tmp/debug.txt

4. Collect the two files: studio_debug.txt and debug.txt.


Integrating Apache Zeppelin data science notebooks with Talend

Overview

This article demonstrates how to integrate Zeppelin Notebooks in a Talend DI Job by leveraging the Zeppelin RESTful API.

Use the two notebooks prepared for you in the Archive.zip file that is attached to this article. The first notebook trains a Machine Learning model on historical data stored on HDFS and saves the model to HDFS. The second notebook loads the trained Machine Learning model and uses it to score the new data. Then the article explains how to develop two Talend DI Jobs to integrate those notebooks. The first Job ingests historical data to HDFS, and then triggers the execution of the training notebook. The second Job uploads the new data from S3 to HDFS, runs the notebook that scores the new data, and saves the results to HDFS.

Assumptions

Amazon Web Services (AWS):
- You should be familiar with the AWS platform.
Talend:
- You should have a basic knowledge of Talend Studio.
RESTful API:
- You should have a basic understanding of RESTful APIs.

Prerequisite

AWS account
EMR 5.11.1 with Hadoop, Zeppelin, Livy, Hive, Hue, and Spark
AWS S3 bucket
IAM Role to access S3 bucket (Access Key / Secret Key)
EC2 machine with Talend Studio 7.0.1 and above
Source materials: Archive.zip file attached to this article

Apache Zeppelin

Getting started with Zeppelin on EMR

Connect to your AWS Management Console, click Services, then search for EMR and select it.
Click Create cluster. On the following screen, select Go to advanced options.
On the Software and Steps page, select emr-5.11.1 from the Release pull-down menu, then select Hadoop, Zeppelin, Livy, Hive, Hue, and Spark from the Software Configuration list. Click Next.
On the Hardware configuration page, keep the default setting or if needed choose a specific Network and Subnet, then click Next.
On the General Cluster Settings page, under General Options, provide a Cluster name for your cluster. Click Next.
On the Security page, under the Security Options settings, select an EC2 key pair from the drop-down menu, or create one, then click Create cluster.
Wait a few minutes for your cluster to be up and running.
Edit the network security rules by navigating to the EC2 dashboard.
Select the EC2 machine of your master node, then under the Description tab, click Security groups.
Select the Inbound tab and click Edit.
Create a new rule to allow all traffic from the EC2 machine where Talend will be installed. Make sure to use the Security Group ID.
Create a new rule to allow all traffic from your local machine to the cluster.
Go back to your EMR cluster home page and select your cluster. On the Summary tab, notice that you can now access your cluster web UI connections from your local machine.
Click the Hue link. On the first connection, create the Hue superuser, using admin as the username and a password of your choice. Remain on this screen.
Your EMR cluster is all set.

Training a Machine Learning model

Before getting into Zeppelin, you need to upload the training data. Select Files, under Browser, from the menu on the left, and navigate to /user/admin.
Click the New button, and select Directory. Name the directory, Input_Data, then click Create.
Navigate to Input_Data, click Upload > Files, and browse to the TrainingData.csv file and select it.
Return to the EMR Management Console, and click the Zeppelin link.
From the Welcome to Zeppelin page, select Create new note to open the properties box.
1. In the Note Name field, name the note Model_Training.
2. Select spark from the Default Interpreter pull-down menu.
3. Click Create Note.

In the first paragraph, import the TrainingData.csv file and create a Spark DataFrame, by copying and pasting the following code into the note:

val df_data = spark.
    read.format("org.apache.spark.sql.execution.datasources.csv.CSVFileFormat").
    option("header", "true").
    option("inferSchema", "true").
    load("/user/admin/Input_Data/TrainingData.csv")

Run the paragraph by clicking the Play button on the right-hand side of the paragraph. You can observe the results output below your code.

In the second paragraph, split your data into three parts, training_data, test_data, and prediction_data, using the code below, then run the paragraph.

val splits = df_data.randomSplit(Array(0.8, 0.18, 0.02), seed = 24L)
val training_data = splits(0).cache()
val test_data = splits(1)
val prediction_data = splits(2)

In the third paragraph, import the necessary machine learning libraries, create features using indexers, and assemble your features in a vector, using the code below, then run the paragraph.

import org.apache.spark.ml.classification.RandomForestClassifier
import org.apache.spark.ml.feature.{OneHotEncoder, StringIndexer, IndexToString, VectorAssembler}
import org.apache.spark.ml.evaluation.MulticlassClassificationEvaluator
import org.apache.spark.ml.{Model, Pipeline, PipelineStage, PipelineModel}
import org.apache.spark.sql.SparkSession

val stringIndexer_label = new StringIndexer().setInputCol("PRODUCT_LINE").setOutputCol("label").fit(df_data)
val stringIndexer_prof = new StringIndexer().setInputCol("PROFESSION").setOutputCol("PROFESSION_IX")
val stringIndexer_gend = new StringIndexer().setInputCol("GENDER").setOutputCol("GENDER_IX")
val stringIndexer_mar = new StringIndexer().setInputCol("MARITAL_STATUS").setOutputCol("MARITAL_STATUS_IX")

val vectorAssembler_features = new VectorAssembler().setInputCols(Array("GENDER_IX", "AGE", "MARITAL_STATUS_IX", "PROFESSION_IX")).setOutputCol("features")

In the last paragraph, select your model, create the pipeline, train it, and save it to disk, using the code below, then run the paragraph. For permission reasons, the model is saved under the /user/hadoop directory.

val rf = new RandomForestClassifier().setLabelCol("label").setFeaturesCol("features").setNumTrees(5)
val labelConverter = new IndexToString().setInputCol("prediction").setOutputCol("predictedLabel").setLabels(stringIndexer_label.labels)
val pipeline_rf = new Pipeline().setStages(Array(stringIndexer_label, stringIndexer_prof, stringIndexer_gend, stringIndexer_mar, vectorAssembler_features, rf, labelConverter))
val model_rf = pipeline_rf.fit(training_data)
model_rf.write.overwrite().save("/user/hadoop/Model/MyModel")

If you want to check the predicted results on the test data split, add the following code to the next paragraph and run it. Don't forget to remove this paragraph after testing.
```
val prediction_test= model_rf.transform(test_data)
prediction_test.show(10)
```
Your training model notebook is all set. Go to Hue and remove the training data because you will upload it from S3 to the cluster using a Talend Job.

Scoring a Machine Learning model

Create a new note, by clicking Notebook and selecting Create new note.
Name the note Model_Scoring, then select spark from the Default Interpreter pull-down menu. Click Create Note.
From Hue, upload the NewData.csv file into the /user/admin/Input_Data/ folder.

Go back to the notebook, and read the NewData.csv file from HDFS, store the data in a Spark DataFrame, and filter out the product line, by copying and pasting the following code into the note, then run the paragraph.

val df_data = spark.
    read.format("org.apache.spark.sql.execution.datasources.csv.CSVFileFormat").
    option("header", "true").
    option("inferSchema", "true").
    load("/user/admin/Input_Data/NewData.csv")

val df_newdata = df_data.select("GENDER","AGE","MARITAL_STATUS","PROFESSION")

Import the necessary libraries, and load the trained model, using the code below, then run the paragraph.

import org.apache.spark.ml.classification.RandomForestClassifier
import org.apache.spark.ml.feature.{OneHotEncoder, StringIndexer, IndexToString, VectorAssembler}
import org.apache.spark.ml.evaluation.MulticlassClassificationEvaluator
import org.apache.spark.ml.{Model, Pipeline, PipelineStage, PipelineModel}
import org.apache.spark.sql.SparkSession

val model_rf = PipelineModel.read.load("/user/hadoop/Model/MyModel")

Score your new data using the trained model, and export result to HDFS, using the code below, then run the paragraph.

val newprediction = model_rf.transform(df_newdata)
val output_pred = newprediction.select("GENDER","AGE","MARITAL_STATUS","PROFESSION","predictedLabel")
output_pred.coalesce(1).write.mode("overwrite").csv("/user/hadoop/Output_Data/output.csv")

Using File Browser, navigate to /user/hadoop/Output_Data/output.csv and check the results.
Your model scoring notebook is all set. Go to Hue and remove the NewData.csv file because you will upload it from S3 to the cluster using a Talend Job.

Talend Studio

Getting started

Before installing Studio make sure that your EC2 can access the EMR cluster, by going to the inbound network rules of your cluster and allowing all traffic from the Studio security group.
To install Studio on an EC2 machine, see the Installing Talend Studio with the Talend Studio Installer page on the Talend Help Center.
When the installation is complete, open Studio, and create a new local project.
Provision an S3 bucket and upload the TrainingData.csv and NewData.csv files.

Creating a Machine Learning training Job

Expand Job Designs, then right-click Standard, and select Create Standard Job.
Name your Job ML_training and click Finish.
On the Repository view, expand Metadata, and right-click Create Hadoop Cluster.
1. Give your cluster a name, then click Next.
2. Select the distribution, Amazon EMR, and version, EMR 5.8.0, of your Hadoop cluster.
3. Select Enter manually Hadoop services.
4. Click Finish.
Enter the connection information manually, using the admin username for authentication. Click Next.
Click Check Services to ensure that Studio can successfully connect to the cluster.
In the designer, add a tPrejob component, a tS3Connection component, and a tHDFSConnection component.
Connect the tPrejob to the tS3Connection using an OnComponentOK trigger, then connect the tS3Connection to the tHDFSConnection using an OnSubjobOK trigger.
Double-click the tS3Connection component. On the Basic settings tab, fill in the Access Key and Secret Key.
Double-click the tHDFSConnection component.
1. On the Basic settings tab, select Repository as the Property Type.
2. To the right of Repository, click the […] button, and navigate to Metadata.
3. Select EMR_HDFS.
4. Click OK.
Add a tS3Get, tHDFSPut, tREST and a tLogRow component to the canvas.
Using OnSubjobOK triggers, connect them as shown below:
Double-click the tS3Get component. On the Basic settings tab, fill in the Bucket, Key, and File fields with the appropriate settings.
Double-click the tHDFSPut component. On the Basic settings tab, configure the settings as shown in the screenshot below:
Before setting up the tREST component, retrieve the note ID using the List of the notes API documented on the Apache Zeppelin website.
Open a new tab in your web browser and, using the EMR IP and Zeppelin port of your instance, enter the URL as follows: http://zeppelin-server:zeppelin-port/api/notebook.
Double-click the tREST component. To call the API method that runs a note, enter the URL, for example, http://zeppelin-server:zeppelin-port/api/notebook/job/NOTE_ID, then select POST for the HTTP Method.
Edit and define the schema, as shown in the screenshot below:
Configure the HTTP Headers, by setting the name to "Content-type", and the value to "application/json".
Run the Job and confirm results.
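
To test the Zeppelin call outside of the Job, the same POST request that the tREST component sends can be reproduced in plain Java. The following is a minimal sketch, not the code Talend generates: the class and method names (ZeppelinNoteRunner, runNoteUrl, runNote) are hypothetical, the base URL and note ID are placeholders to be supplied on the command line, and it assumes the Zeppelin instance accepts unauthenticated requests.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

public class ZeppelinNoteRunner {

    /** Builds the URL of the REST method that runs all paragraphs of a note. */
    public static String runNoteUrl(String baseUrl, String noteId) {
        String trimmed = baseUrl.endsWith("/")
                ? baseUrl.substring(0, baseUrl.length() - 1)
                : baseUrl;
        return trimmed + "/api/notebook/job/" + noteId;
    }

    /** Sends the POST request with an empty body and returns the HTTP status code. */
    public static int runNote(String baseUrl, String noteId) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) new URL(runNoteUrl(baseUrl, noteId)).openConnection();
        try {
            conn.setRequestMethod("POST");
            conn.setRequestProperty("Content-Type", "application/json");
            conn.setDoOutput(true);
            conn.getOutputStream().close(); // empty request body
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.out.println("Usage: ZeppelinNoteRunner <http://zeppelin-server:zeppelin-port> <NOTE_ID>");
            return;
        }
        System.out.println("HTTP status: " + runNote(args[0], args[1]));
    }
}
```

A 200 status indicates that Zeppelin accepted the request. Whether the notebook finished successfully still needs to be confirmed in Zeppelin or in the Job output.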

Creating a Machine Learning scoring Job

Create a new Standard Job, and name it ML_Scoring. Copy and paste the tPrejob, tS3Connection, and tHDFSConnection components from the previous Job into this Job.
Add a tS3Get, tHDFSPut, tREST, and a tLogRow component to the canvas and connect them as shown below:
Double-click the tS3Get component, and configure the Basic settings as shown below:
Double-click the tHDFSPut component, and configure the Basic settings as shown below:
Before setting up the tREST component, call the Zeppelin API, http://zeppelin-server:zeppelin-port/api/notebook, to list the notes and find the ID of the Model_Scoring note.
Double-click the tREST component. To call the API method that runs a note, enter the URL, for example, http://zeppelin-server:zeppelin-port/api/notebook/job/NOTE_ID, then select POST for the HTTP Method.
Configure the HTTP Headers, by setting the name to "Content-type", and the value to "application/json".
Run the Job and confirm results.
Confirm your results in Hue by navigating to /user/hadoop/Output_Data/output.csv.

Conclusion

You’ve learned how to integrate Zeppelin Notebooks with Talend, and you created a data science pipeline with Talend Jobs to train a Machine Learning model and to score new incoming data.


How to integrate Talend Jobs containing dynamic queries with Cloudera Navigator ...

Introduction

Talend Jobs that are developed using context variables and dynamic SQL queries are not supported by the Talend ETL bridge; therefore, Talend Data Catalog (TDC) is unable to harvest metadata and trace data lineage from a Talend dynamic integration Job using that bridge.

This article shows you how to work around these limitations in Talend Jobs that use resources from a Cloudera Cluster using Cloudera Navigator and Talend Data Catalog bridge.

Sources for the project are attached to this article.

Prerequisites

Cloudera Cluster CDH 5.10 and above
Cloudera Navigator 2.15.1 and above
MySQL server to store metadata table of the dynamic integration framework
Talend Big Data Platform 7.1.1 and above
Talend Data Catalog 7.1 Advanced (or Plus) Edition and above, with latest cumulative patches

Setting up Talend Studio

Open Talend Studio and create a new project.
In the Repository, expand Metadata, right-click Hadoop Cluster, then select Create Hadoop Cluster.
Using the Hadoop Cluster Connection wizard, create a connection to your Cloudera Cluster, and make sure that you select the Use Cloudera Navigator check box.
Click the ellipsis to the right of Use Cloudera Navigator, then set up your connection to Cloudera Navigator, as shown below:

For more information on leveraging Cloudera Navigator in Talend Jobs, see the How to set up data lineage with Cloudera Navigator page of the Talend Big Data Studio User Guide available in the Talend Help Center.

Building the dynamic integration Job

This use case uses MySQL to store metadata such as table source/target, queries, and filters, then stores these values in context variables that are used to build integration Jobs at runtime. The dynamic Job reads data from source tables in Hive and writes data to target tables in Hive.

Upload the metadata for the dynamic integration Job to the MySQL server (or any other DB of your choice), using the Metadata_Demo_forMySql.xlsx file attached to this article.
Upload the source data, located in the employees.csv and salaries.csv files attached to this article, to Hive.
Create a standard Job, then add a tDBConnection component to connect to the metadata database. Note: The complete preparation Job, located in the prepare_load_dwh_Hive.zip file, is attached to this article.
Replicate all of the fields in the metadata table by creating the following Context variables:
Add a tDBInput component.
Connect the tDBConnection component to the tDBInput component using the OnSubjobOk trigger.
Double-click the tDBInput component to open the Basic settings view. Click the [...] button next to the Table name text box, then select the table name where you've uploaded the metadata, in this case, meta_tables, apply the appropriate schema, and use the following query:
```
"SELECT 
  `meta_tables`.`Job`, 
  `meta_tables`.`business_name`, 
  `meta_tables`.`db_in`, 
  `meta_tables`.`tabel_in`, 
  `meta_tables`.`db_out`, 
  `meta_tables`.`table_out`, 
  `meta_tables`.`select_args`, 
  `meta_tables`.`query`, 
  `meta_tables`.`conditions`, 
  `meta_tables`.`db_lookup`, 
  `meta_tables`.`table_lookup`
FROM `meta_tables`
WHERE  `meta_tables`.`business_name`='Agg'"
```
Notice that the value for business_name is hardcoded as Agg. Depending on the type of dynamic query you want to run, you could use a context variable instead, so that at runtime the Job filters the metadata table on the business_name value held in that variable (in this case, Agg or Dwh).
Add a tMap component after the tDBInput component, then connect it using a Main row. The tMap component acts as a pass-through and creates output that contains all the input fields.
Connect the tMap to a tFlowToIterate component, then create a key-value pair for each of the fields in the metadata table.
Add a tRunJob component. Connect the tFlowToIterate component to the tRunJob component using Row > Iterate.
Set up the tRunJob component to transmit the whole context to the child Job for each iteration, as shown below:

Building the data integration Job

In this section, you build a Job that is triggered by the tRunJob component from the previous Job.

Note: The complete integration Job, located in the load_dwh_Hive.zip file, is attached to this article.

Create a new standard Job, then add a tPreJob component and a tHiveConnection component.
Connect tPreJob to tHiveConnection using the OnComponentOK trigger.
Add a tHiveRow component below the tPreJob component.
Configure the tHiveRow component, as shown below:
Use the context parameter transmitted by the parent Job by entering the following query in the Query text box.
```
"INSERT OVERWRITE TABLE  "+context.BB_W_db_out+"."+context.BB_W_table_out+" "+context.BB_W_query+" "
```
The integration Job (child Job) runs as many times as the number of rows returned by the metadata table filtered by the context business_name in the parent Job.
Run the Job.
Open Cloudera Navigator, then search for Hive Jobs. Locate the Talend Job and review its data lineage.
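
The query entered in the tHiveRow component is an ordinary Java string expression evaluated once per iteration with the context values passed by the parent Job. The following is a minimal standalone sketch of that assembly: the class and method names (DynamicHiveQuery, buildInsertOverwrite) and the sample values are hypothetical, the parameters stand in for context.BB_W_db_out, context.BB_W_table_out, and context.BB_W_query, and the spacing is normalized compared with the expression above.

```java
public class DynamicHiveQuery {

    /**
     * Assembles the statement the child Job runs for one metadata row.
     * dbOut, tableOut and selectQuery stand in for the context variables
     * BB_W_db_out, BB_W_table_out and BB_W_query.
     */
    public static String buildInsertOverwrite(String dbOut, String tableOut, String selectQuery) {
        return "INSERT OVERWRITE TABLE " + dbOut + "." + tableOut + " " + selectQuery;
    }

    public static void main(String[] args) {
        // Hypothetical values, as they might be read from one row of the metadata table.
        String sql = buildInsertOverwrite(
                "dwh",
                "employee_male",
                "SELECT * FROM src.employees WHERE gender = 'M'");
        System.out.println(sql);
    }
}
```

Because the statement is built by plain concatenation, its correctness depends entirely on the values stored in the metadata table, so it is worth logging the assembled query when debugging a failing iteration.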

Configuring Talend Data Catalog

Open Talend Data Catalog (TDC).
Create a new configuration.

Cloudera Hive bridge

For the bridge to work, you'll need the Cloudera JDBC connector for Hive and the JDBC driver of your Hive metastore (in this case, Postgres).
Ensure both of the drivers are accessible by TDC server or an Agent.
Create a new Physical Data Model to harvest the Hive metastore.
On the Properties tab, select the Cloudera Enterprise Hadoop Hive Database.
On the Import Setup tab, in the User and Password fields, enter your Hive metastore credentials (typically set up during the cluster installation). If you are not able to retrieve them, see the StackExchange post "connect to PostgreSQL server: FATAL: no pg_hba.conf entry for host", which covers a Hive metastore backed by Postgres.
Review your settings. When you're finished, the Import Setup tab should look like this:

Click Import to start the metadata import.
After a successful import, navigate to Data Catalog > Metadata Explorer > Metadata Browser, and locate the harvested metadata, in this case, employee_male (Table).

Cloudera Navigator bridge

Create a new model, then in Model Type, select Cloudera Navigator - New Beta Bridge.
On the Import Setup tab, fill in the Navigator URL*, Login, Password, then filter the Operations you want to harvest, in this case, hive.
After a successful import, you'll find harvested metadata for Navigator, the Hive (connection), and the dynamic Data Integration Job (Di Model).

Stitching

Make sure that your harvested models belong to your configuration, by dragging them to the configuration you set up earlier.
On the Manage menu, select Manage Contents, click the Nav model, then select the Connection tab on the right.
Select the Connection Name, in this case, Hive.
Click Edit, then, from the Database pull-down list, select Hive (click Edit Schemas if it wasn't done before), then click OK.
Click Build, then click the Diagram to see the connection between models.
Trace the data lineage of the Talend Job containing dynamic queries executed against Hive.

Conclusion

This article showed you how to handle the Talend Data Catalog Harvesting process of DI Jobs using context variables and dynamic queries on a Cloudera Cluster leveraging Navigator and Hive bridges to trace data lineage.


Install, configure, and automate Talend 7 Continuous Integration with Jenkins


Overview

Talend Continuous Integration (CI) is now fully compliant with Maven standards, and Continuous Integration and Deployment (CI/CD) with Talend has never been easier.

This tutorial illustrates how to automate CI/CD with Talend 7 and Jenkins, by complementing current Talend documentation. For more information, see the Talend Software Development Life Cycle Best Practices Guide.

Requirements

Talend platform Edition 7.0.1
Jenkins
Source code management supported by Talend

Architecture

This example uses a Continuous Integration server (Jenkins) by leveraging Talend CI Builder, Bitbucket as a service for the code repository, and the Talend Nexus Artifact Repository.

You can continue to use TAC to publish Jobs to a Nexus repository using the Publisher page, but the Publisher page is deprecated in TAC, so Talend recommends using a Maven build.

Talend CI Builder is a Maven plugin, delivered by Talend, that transforms the Talend Job sources to Java classes using the Talend CommandLine application. This allows you to execute your tests in your own company Java factory.

The overall high-level architecture:
Talend platform architecture and components on a Windows Server machine:
CI server architecture and components on a Red Hat machine:

Installing the Talend platform

Install TAC and CommandLine

Download the Talend-Installer, the Dist file, and your License.
Extract and store them in the same folder.
Double-click the installer.
Choose Advanced Install, choose Custom, then browse to your License File.
Select Talend Administration Center and Talend Command Line. Click Next.
Select Talend Administration Center, Talend Command Line, and Talend Server Services. Click Next.
Choose Install an embedded tomcat8 server on the drop-down list. Keep the default Create TAC administrator user setting. Click Next.
Choose Embedded H2 database on the drop-down list and select Install Nexus server with TAC.
Keep the default Nexus Port.
Keep the default CommandLine port.
Select Install Talend Administration Center as a service and Install Talend Command Line as a service.
The Talend platform will install.

Set up Bitbucket as a service

Create a Bitbucket account or use your existing one.
Click the plus [+] sign.
From the CREATE menu, select Repository.
Create a new repository and give it a name. Select Git as the version control system, then click Create repository.

Set up Talend Administration Center

Go to the TAC web interface.
Log in with security@company.com and your password.
Create a new user. Complete your Git login with your Bitbucket login, click Validate, then click Save.
Log out, then log back in as the user you just created.
Create a new project.
Navigate to your Bitbucket repository and find the Git URL.
Complete your project details. Check the connection to your Bitbucket repository; if it succeeds, click Save.
Grant your user write privileges on the newly created project.

Set up Talend Nexus

Open the Nexus web interface.
Log in with the user admin and password Talend123.
Navigate to Server Administration and configuration, click Repositories, then select Create repository.
Create a new repository, choose maven2 (hosted) and configure it, as shown below. Use this repository to deploy snapshot versions of your Talend Job.
Create a second repository, choose maven2 (hosted) and configure it, as shown below. Use this repository to deploy release versions of your Talend Job.
Create a third repository, choose maven2 (hosted) and configure it, as shown below. This repository is used by Talend CI Builder and is defined later in the maven_user_settings.xml file.

Installing Talend Studio

Install Studio and connect to TAC

On another machine, install Talend Studio using the installer.
When the installation is complete, open Studio.
Click Manage Connection, click the green plus [+] sign to create a Remote TAC connection. Enter the User name, User Password, and the Web-app Url. Click Check url to make sure the connection is working.
Select the connection you created. Click Finish.

Create a sample Job

Create a simple Job that reads two flat files and uses a tMap component to join the files.

Add two tFileInputDelimited components to the Designer.
Drag and drop a tMap component and connect your input files to the tMap. Use one as a source table and the other as a lookup table.
Use an ID common for both files and join them using the tMap interface.
Drag and drop a tFileOutputDelimited component and connect it at the output of the tMap component to store the results of the join into a file.
Run your Job and make sure that it runs successfully.
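For readers who want to check the expected output of the join independently of Studio, here is a minimal Python sketch of the same idea: one delimited input is the main flow, the other is loaded as a lookup keyed on the common ID. The column names, sample rows, and the function name `join_on_id` are hypothetical, and the sketch assumes inner-join behavior, which is only one of the join models tMap offers.

```python
import csv
import io


def join_on_id(main_text: str, lookup_text: str, delimiter: str = ";") -> list:
    """Join two delimited texts on their shared "id" column (inner join)."""
    # Load the lookup flow into a dictionary keyed on the join column.
    lookup_rows = {
        row["id"]: row
        for row in csv.DictReader(io.StringIO(lookup_text), delimiter=delimiter)
    }
    joined = []
    for row in csv.DictReader(io.StringIO(main_text), delimiter=delimiter):
        match = lookup_rows.get(row["id"])
        if match is not None:  # rows without a lookup match are dropped
            joined.append({**row, **match})
    return joined


# Hypothetical sample data standing in for the two flat files.
main_file = "id;name\n1;Shong\n2;Elise\n3;Dave\n"
lookup_file = "id;city\n1;Paris\n3;Berlin\n"

for record in join_on_id(main_file, lookup_file):
    print(record)
```

A reference result produced this way can also serve as a sanity check for the reference files used by the test cases in the next section.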

Create test cases

Right-click the tMap component within the Job, then select Create Test Case.
Give it a name, then click Finish.

A test case Job has been generated automatically for you.
As the note in Step 2 indicates, you need sample data to complete the step. Create another Job with the files you used in the tFileInputDelimited components, but only extract 200 rows by using a tSampleRow component.
Create another Job with the sample files as input, connect the same tMap component as before, and connect a tFileOutputDelimited component at the output to get a sample output for your test cases.
After the sample files are generated, double-click the test case. Select the TestCase tab and open the Default test case.
For each input_file and reference_file, click File Browse and select each previously generated sample file.
Right-click the Default test, select Run Instance, and make sure that your test runs successfully.

Your test cases are now set up, and you can automate them within the Jenkins pipeline.
Before going to the next steps, make sure you have saved your Job and pushed the modification to your Bitbucket repository.

Configuring the CI server

Install Maven

Log in to the Red Hat machine.
From the Apache Maven Project web page, download Maven.
Extract the ZIP archive to the directory where you want to install Maven.
Open a terminal and add the M2_HOME environment variable by entering the following command:
```
export M2_HOME=/opt/apache-maven/apache-maven-3.0.x
```
Add the M2 environment variable by entering the following command:
```
export M2=$M2_HOME/bin
```
Add the M2 environment variable to your path by entering the following command:
```
export PATH=$M2:$PATH
```
Make sure that JAVA_HOME is set to the location of your JDK.
Verify that Maven is installed successfully on your machine, by running the command:
```
mvn --version
```

Install Git

Ensure that your system is up-to-date with the latest version of packages by running the YUM package manager update command, as shown below:
```
yum update
```
Install Git by running the command:
```
yum install git
```
Verify that Git is installed successfully on your machine, by running the command:
```
git --version
```

Install and configure Talend CommandLine

Ensure that the dist file is in the same folder as the Talend-Tools-Installer-YYYYYYYY_YYYY-VA.B.C-linux64-installer.run file.
Make the Talend-Tools-Installer-YYYYYYYY_YYYY-VA.B.C-linux64-installer.run file executable, using the following command. If you want to install Talend server modules as services, execute this command with the super-user rights.
```
chmod +x Talend-Tools-Installer-YYYYYYYY_YYYY-VA.B.C-linux64-installer.run
```

Launch Talend Installer by running the command:

./Talend-Tools-Installer-YYYYYYYY_YYYY-VA.B.C-linux64-installer.run

Accept the License Agreement, and choose the directory where you want your Talend product to be installed.
Choose Advanced Install from the installation style list, and Custom from the installation type list.
Add your license file and launch the installation.
Install only Talend CommandLine with default port 8002.
Start Talend CommandLine at least once to initialize its default Maven repository, then close it.

Edit the <commandlinePath>/configuration/maven_user_settings.xml file and add the connection information for the Nexus repositories. In this example, Nexus runs on the remote server where you installed the Talend platform, so replace localhost with that server's private EC2 IP address.

<?xml version="1.0" encoding="UTF-8"?>
<settings xsi:schemaLocation="http://maven.apache.org/SETTINGS/1.1.0 http://maven.apache.org/xsd/settings-1.1.0.xsd" xmlns="http://maven.apache.org/SETTINGS/1.1.0"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <localRepository>/<.m2Path>/repository</localRepository>

<servers>
    <!-- credentials to access the default releases/snapshots repositories -->
    <server>
        <id>releases</id>
        <username>admin</username>
        <password>Talend123</password>
    </server>
    <server>
        <id>snapshots</id>
        <username>admin</username>
        <password>Talend123</password>
    </server>
    <!-- credentials to access the repositories holding external jars -->  
    <server>
        <id>talend-custom-libs-release</id>
        <username>admin</username>
        <password>Talend123</password>
    </server>
    <server>
        <id>talend-custom-libs-snapshot</id>
        <username>admin</username>
        <password>Talend123</password>
    </server>
    
    <!-- credentials to access the repositories holding maven plugins -->
    <server> <!-- central (as proxy) -->
        <id>central</id>
        <username>admin</username>
        <password>Talend123</password>
    </server>
    <server>
        <id>thirdparty</id>
        <username>admin</username>
        <password>Talend123</password>
    </server>
    
</servers>

<mirrors/>
<proxies/> <!-- http proxies, not maven proxy repositories -->

<profiles>
    <profile>
        <id>talend-ci</id>
        <repositories>
            <repository>
                <id>central</id>
                <name>central</name>
                <url>http://localhost:8081/repository/maven-central/</url>
                <layout>default</layout>
            </repository>
            <repository>
                <id>talend-custom-libs-release</id>
                <name>talend-custom-libs-release</name>
                <url>http://localhost:8081/repository/talend-custom-libs-release</url>
                <layout>default</layout>
                <releases>
                    <enabled>true</enabled>
                </releases>
                <snapshots>
                    <enabled>false</enabled>
                </snapshots>
            </repository>
            <repository>
                <id>talend-custom-libs-snapshot</id>
                <name>talend-custom-libs-snapshot</name>
                <url>http://localhost:8081/repository/talend-custom-libs-snapshot</url>
                <layout>default</layout>
                <releases>
                    <enabled>false</enabled>
                </releases>
                <snapshots>
                    <enabled>true</enabled>
                </snapshots>
            </repository>
        </repositories>
        
        <pluginRepositories>            
            <pluginRepository>
                <id>central</id>
                <name>central</name>
                <url>http://localhost:8081/repository/maven-central/</url>
                <layout>default</layout>
            </pluginRepository>
            <pluginRepository>
                <id>thirdparty</id>
                <name>thirdparty</name>
                <url>http://localhost:8081/repository/thirdparty</url>
                <layout>default</layout>
            </pluginRepository>
        </pluginRepositories>
    </profile>
</profiles>

<activeProfiles>
    <activeProfile>talend-ci</activeProfile>
</activeProfiles>

</settings>
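Replacing localhost in every repository URL by hand is easy to get wrong, so the substitution can be sketched programmatically. The Python example below is a minimal illustration under stated assumptions: the function name `point_repositories_at`, the abbreviated settings fragment, and the IP address 10.0.0.12 are all hypothetical, and only the `<url>` elements are touched.

```python
import xml.etree.ElementTree as ET

# Namespace declared by the settings file shown above.
SETTINGS_NS = "http://maven.apache.org/SETTINGS/1.1.0"


def point_repositories_at(settings_xml: str, nexus_host: str) -> str:
    """Return the settings document with localhost replaced in every <url>."""
    # Keep the default namespace unprefixed when serializing.
    ET.register_namespace("", SETTINGS_NS)
    root = ET.fromstring(settings_xml)
    for url in root.iter(f"{{{SETTINGS_NS}}}url"):
        if url.text:
            url.text = url.text.replace("//localhost:", f"//{nexus_host}:")
    return ET.tostring(root, encoding="unicode")


# Abbreviated, illustrative fragment; not the full file.
sample = """<settings xmlns="http://maven.apache.org/SETTINGS/1.1.0">
  <profiles>
    <profile>
      <id>talend-ci</id>
      <repositories>
        <repository>
          <id>central</id>
          <url>http://localhost:8081/repository/maven-central/</url>
        </repository>
      </repositories>
    </profile>
  </profiles>
</settings>"""

updated = point_repositories_at(sample, "10.0.0.12")  # hypothetical private IP
print(updated)
```

Note that ElementTree discards XML comments when parsing, so for the real file a careful find-and-replace in a text editor may be preferable if you want to keep the comments.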

Install CI Builder

Extract the Talend-CI-Builder-V7.0.1.zip archive file in the directory of your choice.

Browse to the installation directory and execute the following command:

mvn install:install-file -Dfile=ci.builder-7.0.1.jar -DpomFile=ci.builder-7.0.1.pom

Browse to the CI Builder installation directory and execute the following command to deploy the plugin to your Nexus repository:
```
mvn deploy:deploy-file -Dfile=ci.builder-7.0.1.jar -DpomFile=ci.builder-7.0.1.pom -DrepositoryId=thirdparty -Durl=http://127.0.0.1:8081/repository/talend-custom-libs-release/ -s <commandlinePath>/configuration/maven_user_settings.xml
```
where the -Durl parameter value corresponds to your repository URL on Nexus, and the -s parameter value corresponds to the path to your maven_user_settings.xml file.
Log back in to the Nexus web UI, click Browse, then navigate to the thirdparty repository.
This Maven plugin is now available for anyone and can be incorporated in your builds.
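Maven finds the credentials for a deployment by matching the -DrepositoryId value against a server id in the settings file passed with -s. If the deploy command above fails with an authentication error, a quick check like the following Python sketch can confirm that the id is declared. The settings fragment is an abbreviated illustration, and the helper names `server_ids` and `has_credentials_for` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace used by the settings file shown earlier in this article.
NS = {"m": "http://maven.apache.org/SETTINGS/1.1.0"}

# Abbreviated, illustrative copy of the <servers> section; not the full file.
SAMPLE_SETTINGS = """<settings xmlns="http://maven.apache.org/SETTINGS/1.1.0">
  <servers>
    <server><id>releases</id><username>admin</username></server>
    <server><id>thirdparty</id><username>admin</username></server>
  </servers>
</settings>"""


def server_ids(settings_xml: str) -> set:
    """Return the <id> of every <server> declared in a Maven settings document."""
    root = ET.fromstring(settings_xml)
    return {element.text for element in root.findall("m:servers/m:server/m:id", NS)}


def has_credentials_for(settings_xml: str, repository_id: str) -> bool:
    """True when the settings declare a server entry for the given repository id."""
    return repository_id in server_ids(settings_xml)


print(has_credentials_for(SAMPLE_SETTINGS, "thirdparty"))
print(has_credentials_for(SAMPLE_SETTINGS, "ben_snapshots"))
```

The same check applies to the id used later in -DaltDeploymentRepository, which also needs a matching server entry when the repository requires authentication.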

Install Jenkins

From the Jenkins web page, download Jenkins.
Download the appropriate version for your environment.
Before starting the installation, verify that you have installed a Java JDK, and set your JAVA_HOME and PATH, then follow the Jenkins documentation on Installing Jenkins on Red Hat distributions.

Install Jenkins plugins

After installing Jenkins with the default setup, log in and install additional plugins.
Click Manage Jenkins, then select Manage Plugins.
Click the Available tab.
Select and install the following plugins: Bitbucket, GitLab, Pipeline, Build Pipeline, Green Balls, Publish Over SSH, SSH, and Workspace Cleanup.

Configuring Jenkins

Navigate to the Jenkins main page.
Click Manage Jenkins, then Global Tool Configuration.
Click JDK > JDK installations, fill in Name and JAVA_HOME, as it is set up on your CI server, then Save.
Click Git, fill in the Name and Path to Git executable where Git is installed on your CI server, then Save.
Click Maven installations, fill in the Name and MAVEN_HOME of your CI server, then Save.
You are all set to start building a Jenkins CI Pipeline. You will specify your Bitbucket credential within your Jenkins Jobs.

Create a Jenkins CI Pipeline

Add a Bitbucket webhook

Log in to Bitbucket.
Select your repository.
Click Settings.
From WORKFLOW select Webhooks.
Click Add webhook, specify a title, and build the URL as follows: http://<Jenkins public IP>:8080/bitbucket-hook/. Configure the remaining options as shown below, then click Save.
To the right of your webhook, click View requests.
Go back into Studio and make a small change to your Job, for example, change the name of a component, then push the change to Git, come back to the Bitbucket webhooks request, and make sure that the webhook is working.

Compile

Go to the Jenkins main page and select New Item.
Enter the Job name: 01_BitBucket_Compile, select Maven Project, click OK.
Scroll down to Source Code Management, select Git, enter your Repository URL, add your credentials, and specify the branch, for example, */master.
Scroll down to Build Triggers, then select Build when a change is pushed to BitBucket.
Scroll down to Build Environment, then select Delete workspace before build starts.
Scroll down to Build, and complete the first section as shown below: add the Root POM of your Talend project, define your Maven Goals and options, and point to your cmdline instance for MAVEN_OPTS.
Complete the section as shown below: locate your maven_user_settings.xml file.
Click Save. Your first Jenkins Job is complete.
To run it manually, click the green arrow with a clock, on the right side of the screen.
Click the console output to see the results, and you should end up with the status: Finished: SUCCESS.
As you did earlier, make a small change to your Job from the Studio, push the modification to Git, and make sure that the compile of the Jenkins Job has been triggered.

Test

Create a new Maven project, name it 02_Test, scroll down to Build Triggers, and configure the project as shown below:
Set up the Build Environment:
Configure Build as shown below, then click Save:

Package

Create a new Maven project, name it 03_Package, scroll down to Build Triggers, and configure the project as shown below:
Set up the Build Environment:
Configure Build as shown below, then click Save:

Install

Create a new Maven project, name it 04_Install, scroll down to Build Triggers, and configure the project as shown below:
Set up the Build Environment:
Configure Build as shown below, then click Save:

Deploy

Create a new Maven project, name it 05_Deploy, scroll down to Build Triggers, and configure the project as shown below:
Set up the Build Environment:

Configure Build as shown below, then click Save:

MAVEN_OPTS:

-Dproduct.path=/opt/cmdline
-Dgeneration.type=local
-DaltDeploymentRepository=snapshots::default::http://YOUR NEXUS PUBLIC IP:8081/repository/ben_snapshots/
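The three MAVEN_OPTS entries above follow a fixed pattern, and the altDeploymentRepository value is itself composed of an id, a layout, and a URL separated by double colons. As a rough sketch (the function name `maven_opts` and the host nexus.example.com are hypothetical), the string can be assembled like this:

```python
def maven_opts(product_path: str, generation_type: str, repo_id: str, repo_url: str) -> str:
    """Assemble the -D options used by the deploy Job as one space-separated string."""
    properties = {
        "product.path": product_path,        # location of the CommandLine install
        "generation.type": generation_type,
        # Value format: id::layout::url
        "altDeploymentRepository": f"{repo_id}::default::{repo_url}",
    }
    return " ".join(f"-D{key}={value}" for key, value in properties.items())


opts = maven_opts(
    "/opt/cmdline",
    "local",
    "snapshots",
    "http://nexus.example.com:8081/repository/ben_snapshots/",  # hypothetical host
)
print(opts)
```

The repository id in the first segment (snapshots here) is the one Maven uses to look up credentials in maven_user_settings.xml.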

Create a new Pipeline view from the Jenkins main page by clicking the plus [+] sign next to the tabs.
Give it a name, for example Pipeline View, select Build Pipeline View, then click OK.
Specify the Pipeline Flow, as shown below. Keep the rest of the default settings. Click OK.
Run the first job manually, or make a small change to your Talend Job, and push the update to Bitbucket. Click the Pipeline view you just created.
Make sure that your pipeline ran successfully, and check your repository to confirm that your Talend Job has deployed.


'Problem with connection to Integration Cloud' error when testing the connectio...


Problem Description

Testing the Integration Cloud connection in Studio (Preferences > Talend > Integration Cloud) fails with the following error:

Root Cause

This is a JDK bug: https://bugs.openjdk.java.net/browse/JDK-8144566

Solution

Use JDK 1.8 with an update later than 151 in Talend Studio. For more information, see the Where to define JAVA_HOME for Studio / TAC / JobServer / CommandLine article, in the Talend Community Knowledge Base (KB).
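To check whether a given runtime meets that requirement, you can compare the update number in the string reported by java -version. The following Python sketch assumes version strings of the form 1.8.0_NNN (optionally followed by a build suffix); the function names are hypothetical, and any other format is treated as not verified.

```python
import re


def jdk8_update(version: str):
    """Return the update number of a 1.8.0_NNN version string, or None if it does not match."""
    match = re.fullmatch(r"1\.8\.0_(\d+)(?:-.*)?", version)
    return int(match.group(1)) if match else None


def is_later_than_update_151(version: str) -> bool:
    """True when the version is JDK 1.8 with an update number greater than 151."""
    update = jdk8_update(version)
    return update is not None and update > 151


for candidate in ("1.8.0_144", "1.8.0_151", "1.8.0_161"):
    print(candidate, is_later_than_update_151(candidate))
```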

Warnings

The JRE installed by default with Talend Studio installer may contain the JDK-8144566 bug. Check the JRE/JDK version and update it if necessary.


Configuring SVN when proxy is involved


Talend Version	6.1.1
Summary	Configuring SVN when proxy is involved.
Additional Versions	6.2.1
Key words	svn proxy tac
Product	Talend Data Integration
Component	Talend Administration Center
Article Type	Configuration
Problem Description	The customer couldn't connect to an SVN server from TAC ("URL cannot be reached"), although the connection worked from the same machine outside Talend using wget.
Problem root cause	Mixing up the -Dhttp.proxyNNNN and -Dhttps.proxyNNNN JVM parameters. Initially only HTTPS, and then both HTTP and HTTPS.
Solution or Workaround	Even if you refer to HTTPS in TAC > Settings > Configuration > Monitoring > AMC URL, for example https://host_name:9443/svn/talend661/, the JVM parameters have to be HTTP, not HTTPS:

-Dhttp.proxySet=true -Dhttp.proxyHost=proxy_hostname -Dhttp.proxyPort=port_number -Dhttp.nonProxyHosts=domain_one\|domain_two...

If the above settings don't work and your URL runs over HTTPS, try adding the HTTPS-based Java properties as well:

-Dhttps.proxySet=true -Dhttps.proxyHost=proxy_hostname -Dhttps.proxyPort=port_number

There is no -Dhttps.nonProxyHosts property; the http.nonProxyHosts setting is used for both HTTP and HTTPS URLs. To read more about these Java networking and proxy settings, see: https://docs.oracle.com/javase/8/docs/technotes/guides/net/proxies.html

To add the parameters to a Windows install, modify the setenv.bat file:

REM you will see the original line looks like this; this line stays
set "JAVA_OPTS=%JAVA_OPTS% -Xmx2048m -Dfile.encoding=UTF-8"
REM then just add in the proxy parameters
set "JAVA_OPTS=%JAVA_OPTS% -Dhttp.proxyHost=<proxy_hostname> -Dhttp.proxyPort=<port_number> etc..."

To add the parameters to a Linux / Unix install, modify the setenv.sh file (notice the $ instead of the % signs):

# you will see the original line looks like this; this line stays
export JAVA_OPTS="$JAVA_OPTS -Xmx2048m -Dfile.encoding=UTF-8"
# then just add a new line that adds the proxy parameters
export JAVA_OPTS="$JAVA_OPTS -Dhttp.proxyHost=<proxy_hostname> -Dhttp.proxyPort=<port_number> etc..."

Then restart TAC.

Windows: C:\Talend\6.4.1\tac\stop_tac.bat; C:\Talend\6.4.1\tac\start_tac.bat;
Linux / Unix: run stop_tac.sh, then start_tac.sh, from the tac directory of your installation.
References	How to add proxy settings to TAC
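The proxy parameters described above can be generated consistently from one set of inputs, which helps avoid mixing up the HTTP and HTTPS variants. This Python sketch is only an illustration: the function name `proxy_jvm_flags`, the proxy host, and the excluded domains are hypothetical, and the HTTPS flags are added only on request, as the article suggests.

```python
def proxy_jvm_flags(host: str, port: int, non_proxy_hosts=(), include_https: bool = False) -> list:
    """Build the JVM proxy flags; nonProxyHosts is only ever set through the http.* property."""
    flags = [f"-Dhttp.proxyHost={host}", f"-Dhttp.proxyPort={port}"]
    if non_proxy_hosts:
        # Patterns are separated by "|", which must be quoted or escaped in a shell.
        flags.append("-Dhttp.nonProxyHosts=" + "|".join(non_proxy_hosts))
    if include_https:
        flags += [f"-Dhttps.proxyHost={host}", f"-Dhttps.proxyPort={port}"]
    return flags


flags = proxy_jvm_flags(
    "proxy.example.com", 3128,               # hypothetical proxy
    non_proxy_hosts=["localhost", "*.example.com"],
    include_https=True,
)
print(" ".join(flags))
```

The resulting string is what would be appended to JAVA_OPTS in setenv.bat or setenv.sh.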


Causes of the "UnsupportedClassVersionError" exception


Symptoms

This error can occur when you execute a Job script outside of Talend Studio, and the JVM used to execute the Job is different from the JVM used to compile the Job.

java.lang.UnsupportedClassVersionError

If you don't know which JVM the machine has, you can execute the following command at the command prompt:

java -version

In the following Windows example, the java version is "1.6.0_11".

C:\Documents and Settings\Administrator>java -version
java version "1.6.0_11"
Java(TM) SE Runtime Environment (build 1.6.0_11-b03)
Java HotSpot(TM) Client VM (build 11.0-b16, mixed mode, sharing)

Procedure

This problem can be fixed by rebuilding the Job with the same JVM version as the one on the machine where the Job is executed. Follow these steps:

Execute the command to check the JDK version in the execution environment:
```
java -version
```
Refer to this page to rebuild the Job with the correct JVM version.
Export the Job script again and move it to the target system, or regenerate the Job and deploy it to the Job server in Talend Administration Center.
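To confirm which Java version a Job was compiled for, you can inspect the header of one of its .class files: the first four bytes are the magic number 0xCAFEBABE, followed by the minor and major version numbers. The Python sketch below decodes that header; the function name is hypothetical and the mapping table only covers major versions up to 52 (Java 8).

```python
import struct

# Class-file major version numbers and the Java release that produces them.
MAJOR_TO_JAVA = {45: "1.1", 46: "1.2", 47: "1.3", 48: "1.4", 49: "5", 50: "6", 51: "7", 52: "8"}


def class_file_java_version(header: bytes) -> str:
    """Return the Java release for the first 8 bytes of a .class file."""
    # Big-endian: 4-byte magic, 2-byte minor version, 2-byte major version.
    magic, _minor, major = struct.unpack(">IHH", header[:8])
    if magic != 0xCAFEBABE:
        raise ValueError("not a Java class file")
    return MAJOR_TO_JAVA.get(major, f"unknown (major {major})")


# Example headers: major version 0x34 (52) and 0x32 (50).
print(class_file_java_version(bytes.fromhex("cafebabe00000034")))
print(class_file_java_version(bytes.fromhex("cafebabe00000032")))
```

For a real Job, read the first eight bytes of a class extracted from the exported archive, for example with open(path, "rb").read(8), and compare the result with the output of java -version on the target machine.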


MDM migration from 6.2.1 to 6.4.1 is slow


Talend Version	6.4.1
Summary	MDM migration from 6.2.1 to 6.4.1 is slow
Additional Versions
Product	Talend MDM
Component	Migration
Problem Description	Migration to 6.4.1 is taking considerable time.
Problem root cause	amaltoOBJECTSCompletedRoutingOrderV2 and UpdateReport are taking the majority of the time for migration, as seen in the logs.
Solution or Workaround	To improve migration speed, perform the following:

Disable Event Manager when migrating UpdateReport data. On the target server:
Stop the MDM service if it is running.
Edit the mdm.conf file and set subscription.engine.autostart=false.
Edit the log4j.xml file and add the following log category:
<category name="com.amalto.core.server.routing">
    <priority value="FATAL" />
</category>
Restart the MDM server.

To improve the performance for system containers, use the interactive mode to skip migration of unwanted system containers, such as amaltoOBJECTSCompletedRoutingOrderV2.

For more information on migration methodology, see the two-part series:
MDM Version Upgrade Methodology Part 1
MDM Version Upgrade Methodology Part 2
JIRA ticket number


Which encoding does tMomOutput use?


Question

If not specified, which encoding/code-page does tMomOutput use?

Answer

tMomOutput will use the default encoding of the machine/locale where the Job is running.

If you don't want to use the default encoding, then you can overwrite it as follows:

Select the Advanced settings tab of tMomOutput.
Select the Set MQMD Fields checkbox.
Click the green + sign to add a field.
Click the Field Name column to select characterSet.
Set the Field value to the encoding you want.
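The reason the default matters is that the same text produces different bytes under different character sets, so a consumer that assumes another encoding will misread non-ASCII characters. The following Python sketch illustrates this with a hypothetical message; it demonstrates the general encoding behavior only and does not call any tMomOutput or MQ API.

```python
def encode_message(text: str, charset: str) -> bytes:
    """Encode a message body with the given character set."""
    return text.encode(charset)


message = "caf\u00e9"  # hypothetical payload containing one non-ASCII character
for charset in ("utf-8", "iso-8859-1"):
    print(charset, encode_message(message, charset).hex())
```

The two encodings agree on the ASCII letters and differ only on the accented character, which is why such problems often go unnoticed until non-ASCII data appears.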


Can I install only the new components rather than the entire release?


Answer

While it is possible for you to download only specific components in a new release from the SVN, Talend strongly recommends that you install the complete release in order to take advantage of new functionality and application improvements. Once installed, you can import your existing projects.

If you do decide to install only specific components, you are likely to miss architectural dependencies that would cause issues with your application.

Again, Talend strongly advises you to install the entire new version of an application rather than only part of it.


What is the difference between Built-In and Repository?


Answer

Built-in: all information is stored locally in the Job. You can enter and edit all information manually.

Repository: all information is stored in the repository.

You can import read-only information into the Job from the repository. If you want to modify the information, you must take one of the following actions:

Convert the information from Repository to Built-in and then edit the built-in information.
Modify the information in the Repository. Once you have made the changes, you are prompted to update the changes into the Job.

Which is better?

It depends on the way you use the information.

Use Built-In for information that you use only once or very rarely.
Use Repository for information that you want to use repeatedly in multiple components or Jobs, such as a database connection.


Tracing records with breakpoints


Overview

Talend Studio is an IDE based on Eclipse RCP. It provides a proprietary record trace debugger and allows you to run Talend Jobs in Trace mode and in Debug mode, to set a breakpoint on a data flow, and to trace records.

Environment

This procedure is compatible with all versions of Talend Data Integration (subscription only).

Procedure

Create an example Job

Create an example Job called TraceRecordsWithBreakpoint. Use a tFixedFlowInput to generate some source data such as:

1;Shong
2;Elise
3;Dave
4;Mike
5;Pedro

The detailed Job settings are shown in the following figure:

Set a breakpoint

To set a breakpoint on the data flow, proceed as follows:

Right-click the connector between two components and select Show Breakpoint Setup.

Note: This feature is available only in Talend Data Integration (on subscription only).
In the Breakpoint tab, check Active conditional breakpoint and/or Use advanced mode to set a breakpoint. In this example, if you check Active conditional breakpoint and set a breakpoint in the Condition table, the Job will pause when the value of the Name column equals Mike.

Trace records with breakpoint

Now run the Job in Traces Debug mode and trace records. Follow these steps:

In the Run view, click the Debug Run tab, select Traces Debug from the Debug list, and click Traces Debug to run the Job.
As the figure shows, the Job pauses when the value of the Name column equals Mike, which matches the breakpoint condition.
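Conceptually, a conditional breakpoint evaluates a condition on each record as it flows through the connection and pauses on the first match. The Python sketch below models that behavior with the sample data from this article; the function name `run_until_breakpoint` is hypothetical and this is not how Studio implements the feature internally.

```python
# Sample records from the tFixedFlowInput in this article: (id, name).
records = [(1, "Shong"), (2, "Elise"), (3, "Dave"), (4, "Mike"), (5, "Pedro")]


def run_until_breakpoint(rows, condition):
    """Process rows in order and stop at the first row that satisfies the condition."""
    processed = []
    for row in rows:
        processed.append(row)
        if condition(row):
            return processed, row  # paused on this row
    return processed, None  # no row matched; the flow ran to the end


processed, paused_on = run_until_breakpoint(records, lambda row: row[1] == "Mike")
print("rows seen before pausing:", len(processed))
print("paused on:", paused_on)
```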

Result

Now you can trace the records by clicking Previous, Next, and Breakpoint.

Previous: return to the previous record.
Next: go to the next record.
Breakpoint: continue running and pause at the next breakpoint.
Basic Run: continue running until the Job ends.
Kill: kill the Job.


How can I display Karaf wrapper logs in milliseconds?


Talend Version	6.1.1, 6.2.1, 6.3.1, 6.4.1
Summary	How can I customize my Karaf wrapper log format?
Additional Versions
Product	Talend ESB
Component	Runtime
Problem Description	I need Karaf wrapper log timestamps to include milliseconds.
Problem root cause
Solution or Workaround	In Runtime/container/etc/karaf-wrapper.conf, change the value of the wrapper.logfile.format property from LPTM to LPZM:

wrapper.logfile.format=LPZM

See Wrapper Logging Properties for an explanation of what each token represents.
JIRA ticket number


A Job using a tSqoopImport component hangs during the Job run


Problem Description

A Job is designed with a tSqoopImport component to import data from Oracle to HDFS. The Job hangs without errors or warnings.

Root Cause

Running the jstack utility on the Job process to collect the stack trace shows that the Oracle read call is active all the time:

"main" prio=6 tid=0x0000000001375000 nid=0x9ac4 runnable [0x00000000012ce000]
java.lang.Thread.State: RUNNABLE
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:152)
at java.net.SocketInputStream.read(SocketInputStream.java:122)
at oracle.net.ns.Packet.receive(Packet.java:311)
at oracle.net.ns.DataPacket.receive(DataPacket.java:105)
at oracle.net.ns.NetInputStream.getNextPacket(NetInputStream.java:305)
at oracle.net.ns.NetInputStream.read(NetInputStream.java:249)
at oracle.net.ns.NetInputStream.read(NetInputStream.java:171)
at oracle.net.ns.NetInputStream.read(NetInputStream.java:89)
at oracle.jdbc.driver.T4CSocketInputStreamWrapper.readNextPacket(T4CSocketInputStreamWrapper.java:123)
at oracle.jdbc.driver.T4CSocketInputStreamWrapper.read(T4CSocketInputStreamWrapper.java:79)
at oracle.jdbc.driver.T4CMAREngineStream.unmarshalUB1(T4CMAREngineStream.java:426)
at oracle.jdbc.driver.T4CTTIfun.receive(T4CTTIfun.java:390)
at oracle.jdbc.driver.T4CTTIfun.doRPC(T4CTTIfun.java:249)
at oracle.jdbc.driver.T4C8Oall.doOALL(T4C8Oall.java:566)
at oracle.jdbc.driver.T4CStatement.doOall8(T4CStatement.java:202)
at oracle.jdbc.driver.T4CStatement.doOall8(T4CStatement.java:45)
at oracle.jdbc.driver.T4CStatement.executeForDescribe(T4CStatement.java:766)
at oracle.jdbc.driver.OracleStatement.executeMaybeDescribe(OracleStatement.java:897)
at oracle.jdbc.driver.OracleStatement.doExecuteWithTimeout(OracleStatement.java:1034)
at oracle.jdbc.driver.OracleStatement.executeQuery(OracleStatement.java:1244)
- locked <0x00000000ef836ca0> (a oracle.jdbc.driver.T4CConnection)
at oracle.jdbc.driver.OracleStatementWrapper.executeQuery(OracleStatementWrapper.java:420)

Solution

The Job is waiting on an Oracle read call. This is an Oracle database performance issue and is not related to Talend. Contact your Database Administrator for assistance.

Note:

To further isolate the issue between the database and the Talend server components, perform the following tests:

Use the telnet, ping, and traceroute commands from the Talend Job server to the database server to verify that communication with the database is healthy and to rule out any latency issues.
Verify that there are no firewall issues. Idle sockets left over from JDBC connections to the database can prevent the socket used by the JDBC driver from closing.
Check for blocking sessions at the database level by querying the v$session view. The following query returns a list of active blocking sessions and the sessions that they are blocking:
```
select blocking_session,sid,serial#, wait_class,seconds_in_wait from v$session where blocking_session is not NULL order by blocking_session;
```
Reproduce the issue outside of Talend by running the same SQL queries from the Job server machine, for example with the Oracle sqlplus utility.
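
The connectivity test in the list above can also be scripted when telnet is not available on the Job server. The following is a minimal Python sketch, not part of the original article, that opens a TCP connection to the database listener several times and reports failures and the slowest connection time. The function names and the default of five attempts are illustrative assumptions. It only checks that the port is reachable and does not exercise the JDBC driver or the database itself.

```python
import socket
import time


def tcp_connect_seconds(host, port, timeout=5.0):
    """Return the seconds taken to open a TCP connection, or None if it fails or times out."""
    start = time.perf_counter()
    try:
        # create_connection resolves the host and connects; the socket is
        # closed again as soon as the with-block exits.
        with socket.create_connection((host, port), timeout=timeout):
            return time.perf_counter() - start
    except OSError:
        # Covers refused connections, timeouts, and name resolution errors.
        return None


def summarize(host, port, attempts=5):
    """Try several connections and summarize failures and the worst connect time."""
    samples = [tcp_connect_seconds(host, port) for _ in range(attempts)]
    succeeded = [s for s in samples if s is not None]
    return {
        "attempts": attempts,
        "failures": attempts - len(succeeded),
        "max_seconds": max(succeeded) if succeeded else None,
    }
```

For example, calling summarize("db.example.com", 1521) from the Job server returns a small dictionary, where db.example.com is a placeholder host name and 1521 is the default Oracle listener port. Your listener may use a different port. Any failures, or a max_seconds value far above the others, suggest a network or firewall problem rather than a slow query.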

