Explaining Talend data types

TalendSolutionExpert
Contributor II

Last Update: Jan 22, 2024 9:35:30 PM
Updated By: Jamie_Gregory
Created date: Jun 29, 2023 7:55:03 AM

This article explains data types and common issues around them. It also provides ideas to overcome complex issues when working with non-trivial data types.


Regular data types

The following data types can be configured or modified.

Numbers

Java and database vendors offer different numeric implementations. The JDBC driver should convert between them without data loss, but for some types this is not always possible. Float and double floating-point numbers cannot represent most decimal values exactly, so the result of a calculation can depend on the order of the operations. A good practice is to sum the small numbers first, then add the larger ones.
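A minimal plain-Java sketch of the ordering effect (the values 1e16 and 1.0 are illustrative): near 1e16 the gap between adjacent doubles is 2, so adding 1.0 to 1e16 one value at a time is rounded away each time, while adding the small values together first keeps them. When exact decimal results matter, java.math.BigDecimal is the usual alternative.

```java
// Shows that the order of floating-point additions changes the result.
public class SummationOrder {

    // Start from the large value, then add the small values one by one.
    static double bigFirst(double big, double small, int count) {
        double sum = big;
        for (int i = 0; i < count; i++) {
            sum += small; // 1e16 + 1.0 rounds back to 1e16 (round half to even)
        }
        return sum;
    }

    // Add the small values together first, then add the large value once.
    static double smallFirst(double big, double small, int count) {
        double sum = 0.0;
        for (int i = 0; i < count; i++) {
            sum += small;
        }
        return sum + big;
    }

    public static void main(String[] args) {
        double a = bigFirst(1e16, 1.0, 10);
        double b = smallFirst(1e16, 1.0, 10);
        System.out.println(a);     // 1.0E16
        System.out.println(b - a); // 10.0
    }
}
```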

Another common mistake occurs when using globalMap: its values are stored as Objects, so cast them to the wrapper class instead of the primitive type:

  • int > Integer
  • double > Double
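The sketch below models globalMap as a plain Map<String, Object>, which is an assumption made for illustration; the key names and the getCountOrZero helper are hypothetical. Casting to Integer lets a missing key come back as null, whereas unboxing that null into an int throws a NullPointerException.

```java
import java.util.HashMap;
import java.util.Map;

// Stand-in for a Talend job: globalMap is modeled here as a Map<String, Object>.
public class GlobalMapCasting {

    static final Map<String, Object> globalMap = new HashMap<>();

    // Cast to the wrapper class; a missing key simply yields null.
    static Integer getCount(String key) {
        return (Integer) globalMap.get(key);
    }

    // Hypothetical helper: unbox to the primitive only after a null check.
    static int getCountOrZero(String key) {
        Integer value = getCount(key);
        return value == null ? 0 : value;
    }

    public static void main(String[] args) {
        globalMap.put("myRowCount", 42);                 // the int is autoboxed to an Integer
        System.out.println(getCount("myRowCount"));      // 42
        System.out.println(getCount("missingKey"));      // null
        System.out.println(getCountOrZero("missingKey")); // 0
        // int broken = (Integer) globalMap.get("missingKey"); // NullPointerException when unboxing null
    }
}
```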

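To convert a String to a number, you can use the following methods (valueOf returns the wrapper class, such as Integer; parseInt and parseDouble return the primitive type):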

  • Integer.valueOf / Integer.parseInt
  • Double.valueOf / Double.parseDouble
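A short sketch of these conversions in plain Java. The parseIntOrNull helper is hypothetical: it trims the input and returns null instead of throwing a NumberFormatException, which can be convenient for optional fields.

```java
// Converts Strings to numbers; parseXxx returns a primitive, valueOf returns the wrapper class.
public class NumberParsing {

    // Hypothetical helper: returns null instead of throwing for null, empty, or invalid input.
    static Integer parseIntOrNull(String text) {
        if (text == null || text.trim().isEmpty()) {
            return null;
        }
        try {
            return Integer.valueOf(text.trim()); // Integer.parseInt/valueOf do not trim spaces themselves
        } catch (NumberFormatException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        int primitive = Integer.parseInt("42");    // int
        Integer boxed = Integer.valueOf("42");     // Integer
        double d = Double.parseDouble("3.14");     // double
        System.out.println(primitive + boxed);     // 84
        System.out.println(d);                     // 3.14
        System.out.println(parseIntOrNull(" 7 ")); // 7
        System.out.println(parseIntOrNull("abc")); // null
    }
}
```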

Date

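java.util.Date and java.sql.Date hold timestamps, not just dates. The tLogRow component displays these timestamps based on the pattern you configure. If the pattern contains only the year (yyyy), the underlying object still contains the full timestamp. One often overlooked factor is the timezone: a Date stores an instant as milliseconds since 1970-01-01 00:00 UTC, and the timezone used to display or convert it, usually the JVM default, can change the date you see.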
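A minimal JDK-only sketch, using SimpleDateFormat as a stand-in for the tLogRow pattern (the timezone IDs are only examples). The same Date instant produces a different year depending on the timezone used for display. Note the lowercase yyyy: in SimpleDateFormat, uppercase YYYY means week year, which can differ from the calendar year around January 1.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// The same Date instant rendered with different patterns and timezones.
public class DateDisplay {

    static String format(Date date, String pattern, String zoneId) {
        SimpleDateFormat fmt = new SimpleDateFormat(pattern, Locale.ROOT);
        fmt.setTimeZone(TimeZone.getTimeZone(zoneId)); // the zone is applied only when formatting
        return fmt.format(date);
    }

    public static void main(String[] args) {
        Date epoch = new Date(0L); // 1970-01-01T00:00:00Z
        System.out.println(format(epoch, "yyyy", "UTC"));                   // 1970
        System.out.println(format(epoch, "yyyy", "GMT-05:00"));             // 1969
        System.out.println(format(epoch, "yyyy-MM-dd HH:mm", "GMT-05:00")); // 1969-12-31 19:00
    }
}
```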

A hard-to-detect issue that causes dates to shift:

For example, one database returns a date as 2021-03-10 midnight, but the other database treats it as 2021-03-10 midnight in a GMT+2 timezone, converts it to UTC for storage, and stores it in a date type, which results in 2021-03-09 after ingestion. This is often reported as a bug, but there is a logical explanation:

The source database stores the value as:

2021-03-10

This goes to Java, where it becomes:

2021-03-10 00:00:00.000 UTC

2021-03-10 02:00:00.000 GMT+2

(Both lines represent the same instant; which one you see depends on the conversion used.)

The target database then ignores the original timezone and treats the value as:

2021-03-10 00:00:00.000 GMT+2

2021-03-09 22:00:00.000 UTC

Converting this value to UTC moves it back to the previous day, and storing only the date part keeps 2021-03-09. Because of this, when you investigate date-related issues, always analyze the entire value, including the timezone, hours, minutes, and seconds.
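The shift can be reproduced in a few lines of java.time, as a sketch of what the target side does rather than a model of any particular database: the method name storeAsUtcDate is hypothetical, and it assumes the target interprets the date as midnight at a fixed offset and keeps only the UTC date.

```java
import java.time.LocalDate;
import java.time.ZoneOffset;

// Reproduces the shift: a date read as midnight at some offset, then stored as a UTC date.
public class DateShift {

    static LocalDate storeAsUtcDate(LocalDate sourceDate, ZoneOffset assumedOffset) {
        return sourceDate
                .atStartOfDay(assumedOffset)         // 2021-03-10T00:00+02:00
                .withZoneSameInstant(ZoneOffset.UTC) // 2021-03-09T22:00Z
                .toLocalDate();                      // only the date part is kept
    }

    public static void main(String[] args) {
        LocalDate source = LocalDate.parse("2021-03-10");
        System.out.println(storeAsUtcDate(source, ZoneOffset.ofHours(2))); // 2021-03-09
        System.out.println(storeAsUtcDate(source, ZoneOffset.UTC));        // 2021-03-10
    }
}
```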

Precision


Since Talend uses java.util.Date, which stores milliseconds, microsecond and nanosecond precision is not available. However, some components check whether the incoming Object is a java.sql.Timestamp; in that case, nanoseconds are available as well. To use this, select the Object type in the schema definition; both the source and target components need to support it.
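A small JDK-only sketch of the precision loss (the timestamp value is illustrative, and nanosAfterDateRoundTrip is a hypothetical helper name): java.sql.Timestamp keeps all nine fractional digits, but passing the value through java.util.Date keeps only whole milliseconds.

```java
import java.sql.Timestamp;
import java.util.Date;

// Shows that java.sql.Timestamp keeps nanoseconds, while java.util.Date keeps milliseconds.
public class TimestampPrecision {

    static int nanosAfterDateRoundTrip(Timestamp ts) {
        Date asDate = new Date(ts.getTime()); // getTime() returns whole milliseconds
        return new Timestamp(asDate.getTime()).getNanos();
    }

    public static void main(String[] args) {
        Timestamp ts = Timestamp.valueOf("2021-03-10 12:34:56.123456789");
        System.out.println(ts.getNanos());               // 123456789
        System.out.println(nanosAfterDateRoundTrip(ts)); // 123000000
    }
}
```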

Experiments worth running

  • Use a tFileOutput component to write these values to a file and read them back. This way, you can control the timezone better.
  • Use the -Duser.timezone JVM parameter to see whether it changes the behavior. For more information, see the Oracle documentation on time zone settings in the JRE.
  • Consult the documentation of your JDBC driver and the Talend Help Center to learn how these timezones are handled.
  • Use a tLogRow component with the following pattern to display the full timestamp, including milliseconds and the timezone offset:
    Pattern: "yyyy-MM-dd'T'HH:mm:ss.SSSXXX"
    Example output: 2001-07-04T12:08:56.235-07:00

The target database may not be able to store the time part of these values, which, combined with the timezone, can shift the date by a day.
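To see what the tLogRow pattern above reveals, here is a minimal JDK-only sketch that applies the same pattern with SimpleDateFormat. The explicit setTimeZone call stands in for the JVM default timezone that -Duser.timezone would change; formatFull is a hypothetical helper name.

```java
import java.text.SimpleDateFormat;
import java.time.Instant;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Formats an instant with the full pattern so milliseconds and the offset are visible.
public class FullPattern {

    static String formatFull(Date date, TimeZone zone) {
        SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSSXXX", Locale.ROOT);
        fmt.setTimeZone(zone); // XXX prints the offset, or "Z" for UTC
        return fmt.format(date);
    }

    public static void main(String[] args) {
        Date date = Date.from(Instant.parse("2001-07-04T19:08:56.235Z"));
        System.out.println(formatFull(date, TimeZone.getTimeZone("GMT-07:00"))); // 2001-07-04T12:08:56.235-07:00
        System.out.println(formatFull(date, TimeZone.getTimeZone("UTC")));       // 2001-07-04T19:08:56.235Z
    }
}
```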

Possible workarounds

Use Strings and check whether the databases or their drivers can handle the conversion. With Strings, what you see is what you get; there is no hidden information.

Convert between timezones using Java code in a tJavaRow component:

// Log the date as the job formats it
String pattern = "yyyy-MM-dd";
log.info(TalendDate.formatDate(pattern, input_row.startDate));
// Keep the same calendar date, but re-parse it as midnight UTC
output_row.startDateUTC = TalendDate.parseDateInUTC(pattern + " zzz", TalendDate.formatDate(pattern, input_row.startDate) + " UTC");
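If you want to experiment outside a Talend job, the idea behind the tJavaRow code above can be sketched with the JDK alone: format the date in the source timezone, then parse that same calendar date as midnight UTC. This is not what TalendDate does internally; toUtcMidnight is a hypothetical helper, and the GMT+2 source timezone is an assumption taken from the earlier example.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.time.Instant;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// JDK-only version of the workaround: keep the calendar date, re-anchor it at midnight UTC.
public class UtcMidnight {

    static Date toUtcMidnight(Date date, TimeZone sourceZone) {
        SimpleDateFormat local = new SimpleDateFormat("yyyy-MM-dd", Locale.ROOT);
        local.setTimeZone(sourceZone);                // read the date as the source sees it
        String day = local.format(date);

        SimpleDateFormat utc = new SimpleDateFormat("yyyy-MM-dd", Locale.ROOT);
        utc.setTimeZone(TimeZone.getTimeZone("UTC")); // parse the same day as midnight UTC
        try {
            return utc.parse(day);
        } catch (ParseException e) {
            throw new IllegalStateException("Unexpected format: " + day, e); // cannot happen for our own output
        }
    }

    public static void main(String[] args) {
        TimeZone gmtPlus2 = TimeZone.getTimeZone("GMT+02:00");
        // 2021-03-10 00:00 in GMT+2 is 2021-03-09 22:00 UTC
        Date localMidnight = Date.from(Instant.parse("2021-03-09T22:00:00Z"));
        System.out.println(toUtcMidnight(localMidnight, gmtPlus2).toInstant()); // 2021-03-10T00:00:00Z
    }
}
```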

For more information on common assumptions about dates and times, see the Your Calendrical Fallacy Is... web page.

byte[] / binary

This type is most commonly used for images, BLOBs, and other binary data. However, some components handle binary data as Base64-encoded Strings. The most common mistake with this type is using Object instead, which displays something like [B@aaa123 in the logs.
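A short sketch using java.util.Base64 (the sample text "Talend" is arbitrary, and toBase64/fromBase64 are hypothetical helper names). It shows the Base64 String form some components expect, and why a raw byte[] printed as an Object shows only its type code and hash code.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// byte[] versus its Base64 String representation, and what a raw array looks like in a log.
public class BinaryDemo {

    static String toBase64(byte[] data) {
        return Base64.getEncoder().encodeToString(data);
    }

    static byte[] fromBase64(String text) {
        return Base64.getDecoder().decode(text);
    }

    public static void main(String[] args) {
        byte[] data = "Talend".getBytes(StandardCharsets.UTF_8);
        Object asObject = data;              // a byte[] handled as a generic Object
        System.out.println(asObject);        // something like [B@1b6d3586 (type code + hash code)
        System.out.println(toBase64(data));  // VGFsZW5k
        System.out.println(new String(fromBase64("VGFsZW5k"), StandardCharsets.UTF_8)); // Talend
    }
}
```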

Object

Another name for this type could be "Other". It can hold any Java type, including the ones described above. This is useful for representing special types, for example, java.sql.Timestamp.

However, to access all the available information, you must cast it to its original type.
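A minimal sketch of recovering the original type with instanceof before casting; the describe method and its output strings are hypothetical. Because java.sql.Timestamp extends java.util.Date, the Timestamp check has to come first, otherwise the nanoseconds are never reached.

```java
import java.sql.Timestamp;
import java.util.Date;

// Recovers the original type of a value stored as Object before using type-specific methods.
public class ObjectCasting {

    static String describe(Object value) {
        if (value instanceof Timestamp) { // check first: Timestamp extends java.util.Date
            Timestamp ts = (Timestamp) value;
            return "Timestamp with nanos=" + ts.getNanos();
        } else if (value instanceof Date) {
            return "Date with millis=" + ((Date) value).getTime();
        } else if (value instanceof byte[]) {
            return "byte[] of length " + ((byte[]) value).length;
        }
        return String.valueOf(value);
    }

    public static void main(String[] args) {
        System.out.println(describe(Timestamp.valueOf("2021-03-10 12:34:56.000000789"))); // Timestamp with nanos=789
        System.out.println(describe(new Date(0L)));         // Date with millis=0
        System.out.println(describe(new byte[] {1, 2, 3})); // byte[] of length 3
    }
}
```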

Comparison

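The primitive int type can be compared with ==. Objects, including Integer and String, should be compared with the .equals() or .compareTo() methods: for objects, == compares references, so with Integer it only appears to work for small cached values (by default, -128 to 127).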

Common mistakes:

context.value.equals("foo") throws a NullPointerException if value is null, whereas "foo".equals(context.value) returns false. context.value == "foo" compares references, not content, so it can return false even when the text matches.
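A plain-Java sketch of these rules; contextValue is a hypothetical stand-in for an unset context variable, and isFoo is a hypothetical helper. java.util.Objects.equals is shown as another null-safe option.

```java
import java.util.Objects;

// Null-safe String comparison and the difference between == and equals().
public class ComparisonDemo {

    static boolean isFoo(String value) {
        return "foo".equals(value); // false for null, no NullPointerException
    }

    public static void main(String[] args) {
        String contextValue = null;                              // stand-in for an unset context variable
        System.out.println(isFoo(contextValue));                 // false
        System.out.println(Objects.equals(contextValue, "foo")); // false, also null-safe

        String built = new String("foo");        // same text, different object
        System.out.println(built == "foo");      // false: == compares references
        System.out.println(built.equals("foo")); // true: equals compares content

        Integer a = 1000;
        Integer b = 1000;
        System.out.println(a.equals(b));         // true
        System.out.println(a.compareTo(b) == 0); // true
    }
}
```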

Special data types

When working with special data types, always consult the database documentation.

Snowflake - Geography type

Snowflake supports the GEOGRAPHY data type, but Talend does not support it out of the box. For more information on converting Strings to the geography type, see the Geospatial Data Types page in the Snowflake documentation.

By letting Snowflake apply the TO_GEOGRAPHY function automatically:

You can test this easily by defining a schema and providing test data with a tFixedFlowInput component.


[Image: input schema defined in the tFixedFlowInput component]

Define your input schema as shown above, with the geography data represented by the following String:

POINT(-122.35 37.55)
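If the WKT String has to be built in the job, for example in a tJavaRow component, a small helper like the hypothetical point method below can do it. This sketch assumes longitude comes before latitude, as in the example string above; the range check helps catch swapped coordinates.

```java
// Builds a WKT point String for a GEOGRAPHY column; WKT lists longitude before latitude.
public class WktPoint {

    static String point(double longitude, double latitude) {
        if (longitude < -180 || longitude > 180 || latitude < -90 || latitude > 90) {
            throw new IllegalArgumentException("Coordinates out of range: " + longitude + ", " + latitude);
        }
        // Concatenation uses Double.toString, which always uses '.' as the decimal separator
        return "POINT(" + longitude + " " + latitude + ")";
    }

    public static void main(String[] args) {
        System.out.println(point(-122.35, 37.55)); // POINT(-122.35 37.55)
    }
}
```

Using concatenation rather than String.format keeps the output independent of the JVM locale, which matters when the decimal separator would otherwise be a comma.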

The Snowflake table is defined like this:

create table "SAMPLEGEO" (str1 text, str2 geography, int1 number);


Using the tSnowflakeOutputBulkExec component results in the following:

[Image: result of the load in the Snowflake table]

Notice that the Talend String value was converted to the GEOGRAPHY type in Snowflake.

It is always worth checking whether special or exotic data types can be loaded into the database by using the String or Object data types.
