Where are the tHashInput and tHashOutput components?

Talend Version 7.2.1, 7.3.1, 8.0.1 Summary Can't find the tHashInput and tHashOutput components in Studio's Palette.Additional Versions... Show More

Talend Version	7.2.1, 7.3.1, 8.0.1
Summary	Can't find the tHashInput and tHashOutput components in Studio's Palette.
Additional Versions
Key words	tHashInput tHashOutput Studio Palette
Product	Talend Data Integration
Component
Article Type
Problem Description	tHashInput and tHashOutput components (and some other components) are not listed in Studio's Palette.
Problem root cause	By default, the technical components tHashInput and tHashOutput (and some other components) are not visible in the Palette. This is expected and it is a design behavior as described in the Talend Studio Component Reference Guide, Technical Components section.
Solution or Workaround	Here is how you can add them to the palette: From the Studio File menu, select Edit Project properties. Expand Designer, then click Palette Settings. Expand Technical then select both of the components in question. Click the arrow icon , then click OK. For more information, see Where can I find the tHashInput tHashOutput components
JIRA ticket number

Show Less

0

683

Official Support Articles

Exporting a Job script and executing it outside of Talend Studio

Overview Talend Jobs support cross-platform execution. You can develop your Job on one machine, export the Job script, and then move it to another m... Show More

Overview

Talend Jobs support cross-platform execution. You can develop your Job on one machine, export the Job script, and then move it to another machine to execute without any additional configuration except the JDK installation. This article explains how to export the Job script and execute it outside of Talend Studio.

Environment

This procedure was written with:

Talend Open Studio for Data Integration 8.0.1.20211109_1610
JDK version: Oracle JDK build 1.8.0_333
Operating system: Windows 10

Talend verified this procedure to be compatible with Talend Open Studio for Data Integration starting from version 4.2.3.

Starting from version 6.0, Talend Studio requires a JDK installation to build jobs completely. For more information, refer to Requiring a JDK installation to build jobs starting from version 6.0.

Procedure

Create an example Job

Create a Job called ExportDemo. This Job generates the current timestamp and appends it to a file (for example, D:/file/out.txt). The detailed Job design is as follows:
In the tFileOutputDelimited component, check the Append box to append the current timestamp to an existing file whenever the job is executed.
Execute the Job to ensure it works in Talend Studio. Then open the file D:/file/out.txt and verify that the current timestamp was written to the file. For example, the file has a new record as follows:
```
10/5/2023  2:32:53 PM
```

Export the Job script

To export the Job script follow these steps:

Right-click the Job name in the Repository view. Select Build Job (or Export Job prior to version 5.4.0) to export the Job script.
Browse to the location where you exported the Job script. Select the Standalone Job item in the Build type list, then click Finish.

Execute the Job

Copy the zip file to another machine if necessary. Unzip the zip file.
Open the folder where the executable files (jobName_run.bat/jobName_run .sh) are located.

For example: D:\file\ExportDemo_0.1\ExportDemo.
Execute the Job: in this example, by clicking the ExportDemo_run.bat file on a Windows system, or by executing the ExportDemo_run.sh file on a Unix/Linux system.
Open the file D:/file/out.txt and verify that the current timestamp was appended.

Show Less

0

998

Official Support Articles

How to generate a trace for the HTTP requests executed by Studio

Question How do you generate a trace of the HTTP requests executed by the Talend Studio without using a third-party tool? Answer 1. Create a lo... Show More

Question

How do you generate a trace of the HTTP requests executed by the Talend Studio without using a third-party tool?

Answer

1. Create a logging.properties file (in the c:\temp folder for example) containing the following lines:

.level=FINEST
handlers = java.util.logging.FileHandler
java.util.logging.FileHandler.pattern = c:/temp/debug.txt
java.util.logging.FileHandler.limit = 20480KB
java.util.logging.FileHandler.count = 10
java.util.logging.FileHandler.append = true
java.util.logging.FileHandler.formatter = java.util.logging.SimpleFormatter

2. Create a log4j.properties file (in the c:\temp folder for example) containing the following lines:

log4j.rootLogger=INFO, stdout
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss} %-5p %c{1}:%L - %m%n
log4j.logger.org.apache.http=DEBUG
log4j.logger.org.apache.http.wire=DEBUG

Note: the trace generated does not contain all the body of the HTTP requests. If you want to log the body of the HTTP requests and responses, use a HTTP proxy, such as Fiddler.

3. Start Talend Studio by using the following commands:

On Windows : from a CMD window , execute the following 2 commands :

set _JAVA_OPTIONS=-Dlog4j.debug -Dlog4j.configuration=file:"c:\temp\log4j.properties" -Djava.util.logging.config.file=c:\temp\logging.properties
Talend-Studio-win-x86_64.exe --talendDebug > studio_debug.txt 2>&1

On Linux : from a shell terminal , execute the following 2 commands :

export _JAVA_OPTIONS='-Dlog4j.debug -Dlog4j.configuration=file:"/tmp/log4j.properties" -Djava.util.logging.config.file=/tmp/logging.properties'
Talend-Studio-linux-gtk-x86_64 --talendDebug > studio_debug.txt 2>&1

NOTE : On Linux , "adapt" the file logging.properties :

java.util.logging.FileHandler.pattern = /tmp/debug.txt

Collect the 2 files : studio_debug.txt and debug.txt

Show Less

0

513

Official Support Articles

How to integrate Talend Jobs containing dynamic queries with Cloudera Navigator ...

Introduction Talend Jobs that are developed using context variable and dynamic SQL queries are not supported; therefore, Talend Data Catalog (TDC)... Show More

Introduction

Talend Jobs that are developed using context variable and dynamic SQL queries are not supported; therefore, Talend Data Catalog (TDC) is unable to harvest metadata and trace data lineage from a Talend dynamic integration Job using the Talend ETL bridge.

This article shows you how to work around these limitations in Talend Jobs that use resources from a Cloudera Cluster using Cloudera Navigator and Talend Data Catalog bridge.

Sources for the project are attached to this article.

Prerequisites

Cloudera Cluster CDH 5.10 and above
Cloudera Navigator 2.15.1 and above
MySQL server to store metadata table of the dynamic integration framework
Talend Big Data Platform 7.1.1 and above
Talend Data Catalog 7.1 Advanced (or Plus) Edition and above, with latest cumulative patches

Setting up Talend Studio

Open Talend Studio and create a new project.
In the Repository, expand Metadata, right-click Hadoop Cluster, then select Create Hadoop Cluster.
Using the Hadoop Cluster Connection wizard, create a connection to your Cloudera Cluster, and make sure that you select the Use cloudera navigator check box.
Click the ellipsis to the right of Use Cloudera Navigator, then set up your connection to Cloudera Navigator, as shown below:

For more information on leveraging Cloudera Navigator in Talend Jobs, see the How to set up data lineage with Cloudera Navigator page of the Talend Big Data Studio User Guide available in the Talend Help Center.

Building the dynamic integration Job

This use case uses MySQL to store metadata such as table source/target, queries, and filters, then stores these values in context variables that are used to build integration Jobs at runtime. The dynamic Job reads data from source tables in Hive and writes data to target tables in Hive.

Upload the metadata for the dynamic integration Job to the MySQL server (or any other DB of your choice), using the Metadata_Demo_forMySql.xlsx file attached to this article.
Upload the source data, located in the employees.csv and salaries.csv files attached in this article, to Hive.
Create a standard Job, then add a tDBConnection component to connect to the metadata database. Note: The complete preparation Job, located in the prepare_load_dwh_Hive.zip file, is attached to this article.
Replicate all of the fields in the metadata table by creating the following Context variables:
Add a tDBInput component.
Connect the tDBConnection component to the tDBInput component using the OnSubjobOk trigger.
Double-click the tDBInput component to open the Basic settings view. Click the [...] button next to the Table name text box, then select the table name where you've uploaded the metadata, in this case, meta_tables, apply the appropriate schema, and use the following query:
```
"SELECT 
  `meta_tables`.`Job`, 
  `meta_tables`.`business_name`, 
  `meta_tables`.`db_in`, 
  `meta_tables`.`tabel_in`, 
  `meta_tables`.`db_out`, 
  `meta_tables`.`table_out`, 
  `meta_tables`.`select_args`, 
  `meta_tables`.`query`, 
  `meta_tables`.`conditions`, 
  `meta_tables`.`db_lookup`, 
  `meta_tables`.`table_lookup`
FROM `meta_tables`
WHERE  `meta_tables`.`business_name`='Agg'"
```
Notice that the value for bussiness_name is hardcoded with Agg. Depending on the type of dynamic query you want to run, you could use a context variable so that at runtime, the Job uses the context variable value and filters the metadata table on the business_name (in this case, Agg or Dwh).
Add a tMap component after the tDBInput component, then connect it using a Main row. The tMap component acts as a pass-through and creates output that contains all the input fields.
Connect the tMap to a tFlowToIterate component, then create a key-value pair for each of the fields in the metadata table.
Add a tRunJob component. Connect the tFlowToIterate component to the tRunJob component using Row > Iterate.
Set up the tRunJob component to transmit the whole context to the child Job for each iteration, as shown below:

Building the data integration Job

In this section, you build a Job that is triggered by the tRunJob component from the previous Job.

Note: The complete integration Job, located in the load_dwh_Hive.zip file, is attached to this article.

Create a new standard Job, then add a tPreJob component and a tHiveConnection component.
Connect tPreJob to tHiveConnection using the OnComponentOK trigger.
Add a tHiveRow component below the tPreJob component.
Configure the tHiveRow component, as shown below:
Use the context parameter transmitted by the parent Job by entering the following query in the Query text box.
```
"INSERT OVERWRITE TABLE  "+context.BB_W_db_out+"."+context.BB_W_table_out+" "+context.BB_W_query+" "
```
The integration Job (child Job) runs as many times as the number of rows returned by the metadata table filtered by the context business_name in the parent Job.
Run the Job.
Open Cloudera Navigator, then search for Hive Jobs. Locate the Talend Job and review trace the data lineage.

Configuring Talend Data Catalog

Open Talend Data Catalog (TDC).
Create a new configuration.

Cloudera Hive bridge

For the bridge to work, you'll need to the Cloudera JDBC connector for Hive and the JDBC driver of your Hive metastore (in this case, Postgres).
Ensure both of the drivers are accessible by TDC server or an Agent.
Create a new Physical Data Model to harvest the Hive metastore.
On the Properties tab, select the Cloudera Enterprise Hadoop Hive Database.
On the Import Setup tab, in the User and Password fields, enter your Hive metastore credentials (typically set up during the cluster installation). If you are not able to retrieve it, use the StackExchange, connect to PostgreSQL server: FATAL: no pg_hba.conf entry for host tutorial for a Hive metastore with Postgres.
Review your settings, when you're finished, the Import setup tab should look like this:

Click Import to start the metadata import.
After a successful import, navigate to Data Catalog > Metadata Explorer > Metadata Browser, and locate the harvested metadata, in this case, employee_male (Table).

Cloudera Navigator bridge

Create a new model, then in Model Type, select Cloudera Navigator - New Beta Bridge.
On the Import Setup tab, fill in the Navigator URL*, Login, Password, then filter the Operations you want to harvest, in this case, hive.
After a successful import, you'll find metadata harvest for Navigator, the Hive (connection), and the dynamic Data Integration Job (Di Model).

Stitching

Make sure that your harvested models belong to your configuration, by dragging them to the configuration you set up earlier.
On the Manage menu, select Manage Contents, click the Nav model, then select the Connection tab on the right.
Select the Connection Name, in this case, Hive.
Click Edit, then, from the Database pull-down list, select Hive (click Edit Schemas if it wasn't done before), then click OK.
Click Build, then click the Diagram to see the connection between models.
Trace the data lineage of the Talend Job containing dynamic queries executed against Hive.

Conclusion

This article showed you how to handle the Talend Data Catalog Harvesting process of DI Jobs using context variables and dynamic queries on a Cloudera Cluster leveraging Navigator and Hive bridges to trace data lineage.

Show Less

0

492

Official Support Articles

'Problem with connection to Integration Cloud' error when testing the connectio...

Problem Description Testing the Integration Cloud connection in Studio (Preferences > Talend > Integration Cloud) fails with the following error:... Show More

Problem Description

Testing the Integration Cloud connection in Studio (Preferences > Talend > Integration Cloud) fails with the following error:

Root Cause

This is a JDK bug: https://bugs.openjdk.java.net/browse/JDK-8144566

Solution

Use a JDK 1.8 update > 151 in Talend Studio. For more information, see the Where to define JAVA_HOME for Studio / TAC / JobServer / CommandLine article, in the Talend Community Knowledge Base (KB).

Warnings

The JRE installed by default with Talend Studio installer may contain the JDK-8144566 bug. Check the JRE/JDK version and update it if necessary.

Show Less

0

126

Official Support Articles

Talend Studio: Repository does not allow updating assets: releases (400) error w...

Talend is configured with a Nexus installed on a different machine. A Job is published for the first time to Nexus using Studio. When publishing, or ... Show More

Talend is configured with a Nexus installed on a different machine. A Job is published for the first time to Nexus using Studio. When publishing, or redeploying, the same Job to Nexus again, it fails with the following error:

java.io http://java.io) (http://java.io (http://java.io/) /) .IOException: org.eclipse.aether.deployment.DeploymentException: Failed to deploy artifacts: Could not transfer artifact org.example.bt14.job:wj_test:zip:0.2.0 from/to releases (http://absitstalend:8081/repository/releases/😞 Repository does not allow updating assets

Because it is against Maven best practices, Nexus, by default, disables redeployment of release artifacts. A release Group, Artifact, Version coordinate (GAV) should correspond to a unique artifact, (otherwise your builds will be inconsistent).

Resolution

To enable redeployment of a Job, change the Deployment policy field in the Repositories configuration to Allow Redeploy in Nexus.

For step-by-step instructions on how to enable redeployment, see '400 Bad Request' error when publishing a Job from Studio to Nexus.

Show Less

0

2077

Official Support Articles

Causes of the "UnsupportedClassVersionError" exception

Symptoms This error can occur when you execute a Job script outside of Talend Studio, and the JVM used to execute the Job is different from the J... Show More

Symptoms

This error can occur when you execute a Job script outside of Talend Studio, and the JVM used to execute the Job is different from the JVM used to compile the Job.

java.lang.UnsupportedClassVersionError

If you don't know which JVM the machine has, you can execute the following command at the command prompt:

java -version

In the following Windows example, the java version is "1.6.0_11".

C:\Documents and Settings\Administrator>java -version
java version "1.6.0_11"
Java(TM) SE Runtime Environment (build 1.6.0_11-b03)
Java HotSpot(TM) Client VM (build 11.0-b16, mixed mode, sharing)

Procedure

This problem can be fixed by rebuilding the Job with the same JVM version as the one on the machine where the Job is executed. Follow these steps:

Execute the command to check the JDK version in the execution environment:
```
java -version
```
Refer to this page to rebuild the Job with the correct JVM version.
Export the Job script again and move it to the target system or re-generate the Job and deploy it to the Job server in Talend Administrator Center.

Show Less

0

167

Official Support Articles

Which encoding does tMomOutput use?

Question If not specified, which encoding/code-page does tMomOutput use? Answer tMomOutput will use the default encoding of the machin... Show More

Question

If not specified, which encoding/code-page does tMomOutput use?

Answer

tMomOutput will use the default encoding of the machine/locale where the Job is running.

If you don't want to use the default encoding, then you can overwrite it as follows:

Select the Advanced settings tab of tMomOutput.
Select the Set MQMD Fields checkbox.
Click the green + sign to add a field.
Click the Field Name column to select characterSet.
Set the Field value of the encoding value you want.

Show Less

0

113

Official Support Articles

What is the difference between Built-In and Repository?

Answer Built-in: all information is stored locally in the Job. You can enter and edit all information manually. Repository: all information is ... Show More

Answer

Built-in: all information is stored locally in the Job. You can enter and edit all information manually.

Repository: all information is stored in the repository.

You can import read-only information into the Job from the repository. If you want to modify the information, you must take one of the following actions:

Convert the information from Repository to Built-in and then edit the built-in information.
Modify the information in the Repository. Once you have made the changes, you are prompted to update the changes into the Job.

Which is better?

It depends on the way you use the information.

Use Built-In for information that you use only once or very rarely.
Use Repository for information that you want to use repeatedly in multiple components or Jobs, such as a database connection.

Show Less

0

381

Official Support Articles

Tracing records with breakpoints

Overview Talend Studio is an IDE based on Eclipse RCP. It provides a proprietary record trace debugger and allows you to run Talend Jobs in Trace... Show More

Overview

Talend Studio is an IDE based on Eclipse RCP. It provides a proprietary record trace debugger and allows you to run Talend Jobs in Trace mode and in Debug mode, to set a breakpoint on a data flow, and to trace records.

Environment

This procedure is compatible with all versions of Talend Data Integration (subscription only).

Procedure

Create an example Job

Create an example Job called TraceRecordsWithBreakpoint. Use a tFixedFlowInput to generate some source data such as:

1;Shong
2;Elise
3;Dave
4;Mike
5;Pedro

The detailed Job settings are shown in the following figure:

Set a breakpoint

To set a breakpoint on the data flow, proceed as follows:

Right-click the connector between two components and select Show Breakpoint Setup.

Note: This feature is available only in Talend Data Integration (on subscription only).
In the Breakpoint tab, check Active conditional breakpoint and/or Use advanced mode to set a breakpoint. In this example, if you check Active conditional breakpoint and set a breakpoint in the Condition table, the Job will pause when the value of the Name column equals Mike.

Trace records with breakpoint

Now run the Job in Traces Debug mode and trace records. Follow these steps:

In the Run view, click Debug Run tab, select Traces Debug from the Debug list, and click Traces Debug to run the Job.
As the figure shows, the Job pauses when the value of the Name column equals Mike, when it matches the condition of the breakpoint.

Result

Now you can trace the records by clicking Previous, Next, and Breakpoint.

Previous: return back to the previous record.
Next: go to the next record.
Breakpoint: continue to run and pause until next breakpoint.
Basic Run: continue to run until ends.
Kill: kill the Job.

Show Less

0

212

Official Support Articles

"HTTP ERROR: 500" in Studio when selecting "Help Contents" from the Help menu

Symptom When Help Contents is selected from the Studio Help menu, the following error appears: HTTP ERROR: 500 Problem accessing /help/index.jsp. Rea... Show More

Symptom

When Help Contents is selected from the Studio Help menu, the following error appears:

HTTP ERROR: 500
Problem accessing /help/index.jsp. Reason:
      Server Error
_______________________________
Powered by Jetty://

Solution

This issue has been resolved in Talend Studio 6.4.1.

The work-around for 6.3.1 is to use the F1 key to open the help panel instead.

Show Less

0

82

Official Support Articles

Cannot use tHDFSconnection and tHBASEconnection in the same Job

Symptom Cannot use tHDFSconnection and tHBASEconnection in the same Job. Diagnosis java.io.IOException: java.lang.reflect.InvocationT... Show More

Symptom

Cannot use tHDFSconnection and tHBASEconnection in the same Job.

Diagnosis

java.io.IOException: java.lang.reflect.InvocationTargetException 
 at org.apache.hadoop.hbase.client.HConnectionManager.createConnection(HConnectionManager.java:383)
 at org.apache.hadoop.hbase.client.HConnectionManager.createConnection(HConnectionManager.java:360)
 at org.apache.hadoop.hbase.client.HConnectionManager.getConnection(HConnectionManager.java:244)
 at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:187)
 at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:149)
 at talend_project.hbase_connection1_0_1.HBASE_connection1.tHBaseInput_2Process(HBASE_connection1.java:4134)
 at talend_project.hbase_connection1_0_1.HBASE_connection1.tHBaseConnection_1Process(HBASE_connection1.java:3378)
 at talend_project.hbase_connection1_0_1.HBASE_connection1.tHDFSConnection_1Process(HBASE_connection1.java:3022)
 at talend_project.hbase_connection1_0_1.HBASE_connection1.tMysqlConnection_1Process(HBASE_connection1.java:2693)
 at talend_project.hbase_connection1_0_1.HBASE_connection1.tFileInputDelimited_1Process(HBASE_connection1.java:2470)
 at talend_project.hbase_connection1_0_1.HBASE_connection1.runJobInTOS(HBASE_connection1.java:5422)
 at talend_project.hbase_connection1_0_1.HBASE_connection1.main(HBASE_connection1.java:4713)

Caused by: java.lang.reflect.InvocationTargetException
 at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
 at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
 at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
 at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
 at org.apache.hadoop.hbase.client.HConnectionManager.createConnection(HConnectionManager.java:381)
 ... 11 more

Caused by: java.util.ServiceConfigurationError: org.apache.hadoop.fs.FileSystem: Provider org.apache.hadoop.hdfs.DistributedFileSystem could not be instantiated
 at java.util.ServiceLoader.fail(ServiceLoader.java:232)
 at java.util.ServiceLoader.access$100(ServiceLoader.java:185)
 at java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:384)
 at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:404)
 at java.util.ServiceLoader$1.next(ServiceLoader.java:480)
 at org.apache.hadoop.fs.FileSystem.loadFileSystems(FileSystem.java:2400)
 at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2411)
 at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2428)
 at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:88)
 at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2467)
 at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2449)
 at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:367)
 at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:166)
 at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:351)
 at org.apache.hadoop.fs.Path.getFileSystem(Path.java:287)
 at org.apache.hadoop.hbase.util.DynamicClassLoader.<init>(DynamicClassLoader.java:104)
 at org.apache.hadoop.hbase.protobuf.ProtobufUtil.<clinit>(ProtobufUtil.java:163)
 at org.apache.hadoop.hbase.ClusterId.parseFrom(ClusterId.java:64)
 at org.apache.hadoop.hbase.zookeeper.ZKClusterId.readClusterIdZNode(ZKClusterId.java:69)
 at org.apache.hadoop.hbase.client.ZooKeeperRegistry.getClusterId(ZooKeeperRegistry.java:83)
 at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.retrieveClusterId(HConnectionManager.java:794)
 at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.<init>(HConnectionManager.java:627)
 ... 16 more

Caused by: java.lang.NoSuchMethodError: org.apache.hadoop.conf.Configuration.addDeprecations([Lorg/apache/hadoop/conf/Configuration$DeprecationDelta;)V
 at org.apache.hadoop.hdfs.HdfsConfiguration.addDeprecatedKeys(HdfsConfiguration.java:66)
 at org.apache.hadoop.hdfs.HdfsConfiguration.<clinit>(HdfsConfiguration.java:31)
 at org.apache.hadoop.hdfs.DistributedFileSystem.<clinit>(DistributedFileSystem.java:116)
 at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
 at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
 at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
 at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
 at java.lang.Class.newInstance(Class.java:442)
 at java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:380)
 ... 35 more
Job HBASE_connection1 ended at 13:11 27/09/2017. [exit code=1]

Both components were causing a jar conflict.

Solution

Lower the distribution version of the tHDFSconnection, after which the Job will run fine without any conflicts.

Show Less

0

138

Official Support Articles

Executing a Job using a tHiveLoad component, results in a 'SemanticException [Er...

Problem Description When executing a Job with a tHiveLoad component, the following error is observed in the Job log: Error while compiling stat... Show More

Problem Description

When executing a Job with a tHiveLoad component, the following error is observed in the Job log:

Error while compiling statement: FAILED: SemanticException [Error 10028]: Line 1:17 Path is not legal ''/user/talend/loader/X_CUSTOMER_FRMK_POC_1000_INSERT'': Move from: hdfs://HOSTNAME: 8020/user/talend/loader/X_CUSTOMER_FRMK_POC_1000_INSERT to: hdfs://ttspoc/user/hive/warehouse/giw_framework.db/talend_hive_test1 is not valid. Please check that values for params "default.fs.name" and "hive.metastore.warehouse.dir" do not conflict.

Root Cause

The values for the Hadoop properties: default.fs.name and hive.metastore.warehouse.dir in the tHiveConnection component are incorrect.

Solution

In the tHiveConnection component, set the Hadoop properties default.fs.name and hive.metastore.warehouse.dir with the correct values. You can find the correct values in the *-site.xml files of the Hadoop cluster.

Show Less

0

181

Official Support Articles

tPatternCheck component loses custom pattern after migrating from 6.2.1 to 6.4.1

Talend Version v6.4.1 Summary The tPatternCheck component loses a custom pattern after migrating from 6.2.1 to 6.4.1.Additional Versions Produ... Show More

Talend Version	v6.4.1
Summary	The tPatternCheck component loses a custom pattern after migrating from 6.2.1 to 6.4.1.
Additional Versions
Product	Talend Data Integration
Component	tPatterncheck
Problem Description	Current Behavior: When a Job developed in 6.2.1 Studio has a custom pattern, and then is migrated to 6.4.1, the tPatternCheck component loses the custom pattern values. Expected Behavior: The job in 6.4.1 should keep the custom pattern value that was chosen/configured.
Problem root cause
Solution or Workaround	The behavior seems to be a bug in the product, and JIRA #TDQ-14375 has been submitted to address it in a future release. The issue has been fixed in version 6.4.2, 6.5.1, and 7.0.1. Workaround Copy and re-enter the regular expression manually.
JIRA ticket number	TDQ-14375

Show Less

0

122

Official Support Articles

Talend 6.4.1 Platform Installer - Error with 8 or more CPU Windows system

Talend Version (Required) v6.4.1 Summary Issue with running platform installer with 8 CPU or more in Windows systemAdditional Versionsv6.3.1P... Show More

Talend Version (Required)	v6.4.1
Summary	Issue with running platform installer with 8 CPU or more in Windows system
Additional Versions	v6.3.1
Product (Required)	Platform Installer
Component (Required)	Installer
Problem Description	Running platform installer 6.4.1 with 8 or more CPU in Windows system throws an error message: This computer contains 8 logical CPUs. On such configurations, it is necessary to limit the decompression with the command-line option: --parallel-decompression-cores 4 Please re-run the installer with given option. The 6.3.1 installer runs for a very long time and terminates in the error message "Unable to realloc number bytes".
Problem root cause	The problem is related to too many decompression threads. Each thread takes about 350 MB of memory, and with 8 threads it requires about 2.8 GB of memory, which is too big for Windows 32-bit (Talend's Windows installer is 32-bit, even for 64-bit Windows).
Solution or Workaround	The installer should be started with the command-line option, limiting the number of decompression threads: Installer file name --parallel-decompression-cores 4 For example: Talend-Installer-20161216_1026-V6.3.1-windows-installer.exe --parallel-decompression-cores 4
JIRA ticket number	DOCT-8289

Show Less

0

101

Official Support Articles

tESBProviderRequest: 'Port 80 already in use'

Problem Description When running a Job designed with a tEsbProviderRequest component, you may encounter the following exception: Exception in t... Show More

Problem Description

When running a Job designed with a tEsbProviderRequest component, you may encounter the following exception:

Exception in thread "Thread-1" java.lang.IllegalArgumentException: Cannot start provider with uri: /gefService/. Port 80 already in use.
at dsi_test.gef_0_1.Gef$HandlerThread_tESBProviderRequest_1.run(Gef.java:814)

Root Cause

In this case, the Job is corrupted.

Solution

To fix the issue, follow these steps:

Ensure that the WSDL file being used is using port 80, and that the port is available for bidirectional communication.
Then, if you are still encountering the issue; check if the Job is migrated from lower environments to the latest version, if so, perform the following steps:
1. Delete the existing WSDL file under Services.
2. Delete the entire Job.
3. Restart Talend Studio.
4. Create a new WSDL file under Services.
5. Create a new Job.

For more information on how to create a WSDL file, see How to create a data service in the Talend Help Center documentation.

Show Less

0

78

Official Support Articles

Talend Studio: Custom libraries process

This document describes the process for working with the third-party libraries (custom libraries) used within Talend. Custom libraries are needed for... Show More

This document describes the process for working with the third-party libraries (custom libraries) used within Talend.

Custom libraries are needed for Talend components to work properly. These libraries are not delivered by Talend for contractual and licensing reasons, so they must be downloaded by the Studio.

This content is applicable for old 6.4.1 versions of Talend.

Content and process:

This document also describes what you can do if Talend is installed in an environment without an internet connection.

Configure TAC

Among other things, TAC needs the following configuration:

Note: Software Update is not part of the custom library process. It is the mechanism used to download patches and updates.

So you have a configuration for Nexus, the artifact repository, but this configuration is only the place for TAC to look for artifacts for deployment in the Job and ESB conductors. Studio has a separate configuration for the artifact repository to which it publishes, as detailed later.

Set up a user

The user needs to be set up with the appropriate credentials for the source code system.

Set up a project in TAC

Set up a project as shown below:

Assign rights to the appropriate user(s)

Assign rights to users as follows:

Start Studio

Before Studio starts for the first time, the Maven .m2 repository has not been set up:

When Studio starts for the first time, it sets up its local Maven repository. This is where it stores libraries delivered by Eclipse plugins that are not transferred to Nexus with the custom libraries.

At this point, you can see that the custom libraries directory – stored in org.talend.libraries – is still (almost) empty:

and the custom libraries in Nexus are also still empty:

Log in to Studio

This is what happens behind the scenes:

Here you can see why it is a bad idea to leave hostnames as localhost. When the Nexus location is passed to Studio, it would get localhost as the host name, and because Nexus is not local to Studio, Studio is unable to find Nexus to interact with it. As soon as the user has logged on to Studio, you see some libraries appearing in Nexus, but not the third-party libraries:

Now Studio will request that the custom libraries are downloaded, which need to have the licenses accepted. These are the custom libraries:

Clicking Finish starts the download and you must now accept the licenses, which is the crucial part:

Note: the licenses will need to be explicitly accepted whether or not Studio downloads the libraries from the internet or internally from Nexus. Then you see the libraries downloading into the Studio’s Maven repository:

And you see them appearing in the Nexus custom libraries repository:

A note about the command line

Even though the command line is running, the custom libraries do not appear automatically in its Maven repository.

It will only bring these across from Nexus when it needs to build them into an artifact, such as when a Publisher or Job Conductor task is run. As mentioned earlier, try to avoid using these methods to publish or build artifacts and as much as possible, publish artifacts to Nexus and deploy those.

Publish from Studio

Before publishing from Studio, you must configure the Nexus artifact repository in the Preferences.

Installing in an environment without an internet connection

Many companies do not allow software such as Talend to have access to the internet. These are most typically financial institutions, but any company or organization handling sensitive data will probably impose internet access restrictions. This obviously poses a problem for a mechanism that relies on access to the internet to access third-party files. How can you solve this problem?

There are two main ways:

Install at least one Studio into an internet-connected environment.
Upload a backup of the Nexus custom libraries repository.

Install Studio into an internet-connected environment

As you have already seen, once the third-party libraries are uploaded into Nexus, there is no further requirement for access to the internet, even for Studio. So the easiest solution is to install one Studio on a machine and configure it with access to TAC and the Nexus installation. This will enable third-party libraries to be downloaded into Nexus and from there, the libraries are accessible from any Studio that connects to TAC, regardless of whether or not that Studio has internet access.

Upload a backup of the custom libraries

This involves having an “external” installation with access to the internet – often your own laptop - where the third-party libraries have already been downloaded into Nexus. Once the libraries are loaded into Nexus in the “external” installation, a backup of the libraries is made and then restored to the unconnected installation.

The process to make a backup

By default, Nexus is installed with the TAC, in the TAC directory:
Within that, navigate to the storage for the Talend libraries in the custom libs repository:
Make a Zip backup of the entire directory.

The process to “restore” the backup to the target installation

Log in to the Nexus in the target installation as an admin user: the default username is admin, the default password is Talend123.
Click the Repositories menu item on the left:
Right-click the talend-custom-libs-release repository and select Put out of service:
Navigate to the storage for the Talend libraries in the custom libs repository, and unzip the backed up libraries archive:
Right-click the talend-custom-libs-release repository and select Put in service:
Right-click the talend-custom-libs-release repository, select Rebuild Metadata, then right-click and select Repair Index.

The libraries are now available for any Studio that logs in to the TAC.

Show Less

0

793

Official Support Articles

Cannot add a Remote JobServer from Studio

Problem Description The + button is disabled in the Preferences > Talend > Run/Debug > Remote window, and you cannot add a Remote JobServer to ru... Show More

Problem Description

The + button is disabled in the Preferences > Talend > Run/Debug > Remote window, and you cannot add a Remote JobServer to run Jobs from Studio to the Target Exec JobServer.

Root Cause

When Studio is logged into a remote project (using a remote TAC connection), you cannot add a Remote JobServer from Studio because it's controlled by TAC.

Studio won't display any Remote JobServers because there aren't any JobServers listed on the Servers page in TAC.

Solution

In TAC, add the JobServer to the Servers page.

You are now able to see the same JobServer listed in Studio, and you can use it to run the Job using Target Exec.

Show Less

0

157

Official Support Articles

Jobs migrated to Talend 6.4 from Talend 6.3 or earlier throw a compilation error

Symptoms While running a Job in Talend 6.4, a compilation error occurs. The error pop up contains the error message Routines routine_name has com... Show More

Symptoms

While running a Job in Talend 6.4, a compilation error occurs. The error pop up contains the error message Routines routine_name has compile errors.

Diagnosis

Talend Studio versions 6.3 and earlier have Add all user routines to Job dependencies, when creating a new Job selected by default in Window > Talend > Preferences. Thus, when you migrate the Job from Talend 6.3 or earlier to 6.4, the routines are migrated as well and cause compilation errors in Studio.

Solution

Take a backup of the problematic routines by exporting the routines and deleting them from 6.4 Studio. You can find these routines under Code > Routines in the Repository browser. After deleting these routines, the Job should compile with no errors and run successfully.

Show Less

0

63

Official Support Articles

How to read attachments of an Account module in Salesforce

In Salesforce, Attachment and Account are different modules. An attachment of account means the attachment is a child record of the account. Every ... Show More

In Salesforce, Attachment and Account are different modules. An attachment of account means the attachment is a child record of the account. Every attachment object has a ParentID that refers to the Account.

There are two ways of reading attachments of an account:

Use join SOQL in the tSalesforceInput component, as shown in the example below:

SELECT <column list> FROM ATTACHMENT WHERE PARENTID IN (<list of ACCOUNT internal ID)

Read from Attachment and Account with different tSalesforceInput components, and use a tMap component to lookup the ParentID and find the Left Outer Join.

Show Less

0

105

Official Support Articles

Official Support Articles

Recent Documents

Overview

Environment

Procedure

Create an example Job

Export the Job script

Execute the Job

Question

Answer

Introduction

Prerequisites

Setting up Talend Studio

Building the dynamic integration Job

Building the data integration Job

Configuring Talend Data Catalog

Cloudera Hive bridge

Cloudera Navigator bridge

Stitching

Conclusion

Problem Description

Root Cause

Solution

Warnings

Resolution

Symptoms

Procedure

Question

Answer

Answer

Overview

Environment

Procedure

Create an example Job

Set a breakpoint

Trace records with breakpoint

Result

Symptom

Solution

Symptom

Diagnosis

Solution

Problem Description

Root Cause

Solution

Current Behavior:

Expected Behavior:

Workaround

Problem Description

Root Cause

Solution

Configure TAC

Set up a user

Set up a project in TAC

Assign rights to the appropriate user(s)

Start Studio

Log in to Studio

A note about the command line

Publish from Studio

Installing in an environment without an internet connection

Install Studio into an internet-connected environment

Upload a backup of the custom libraries

The process to make a backup

The process to “restore” the backup to the target installation

Problem Description

Root Cause

Solution

Symptoms

Diagnosis

Solution