Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
I was debugging a Talend Data Integration jobs for days searching why sometimes my job faces some FK violations in the database. Today I realized that some Date fields are being stored with the time part of the date as "01:00:00" instead of "00:00:00".
I produced a very small example to illustrate what I think it is a bug:
tMSSQLInput ---> tJavaRow
tMSSQLInput (connected to any database, it doesn't really matter):
select convert(DATETIME, '2015-10-18') mydate union all select convert(DATETIME, '2015-10-19')
union all
select convert(DATETIME, '2015-10-20')
tJavaRow
System.out.println(input_row.mydate);
When I run this small job, I got the following result:
[statistics] connecting to socket on port 4031 [statistics] connected Sun Oct 18 01:00:00 BRST 2015 Mon Oct 19 00:00:00 BRST 2015 Tue Oct 20 00:00:00 BRST 2015 [statistics] disconnected
Note that the 2015-10-18 is with the time part as "01:00:00". Particularly, I can only reproduce this behavior with this magic date. Any other date, except Oct 18, 2015, set the time part as "00:00:00".
Do someone know why this happens? Can someone confirm this weird behavior is really a bug, or am I missing something?
I'm afraid I have a bit of an issue with this sentence " Since my database is not using time zones at all, I got an inconsistent behavior". First of all, databases do use timezones. Most will internally store dates as UTC but they will also either have configurable timezones or will just use the system default.
Secondly, the date you are talking about is impossible in your locale. So if your database is storing it as "2015-10-18 00:00:00", it is configured incorrectly and you are using UTC and adjusting it to BRST. However, I suspect from looking at your example that you are simply expecting the date to be "2015-10-18 00:00:00" in your database because all of your other dates specified without explicitly setting a time, show midnight. Is this case. I notice you are using SQL Server. Try this code and you will see what timezone is default for your SQL Server.....
DECLARE @TimeZone VARCHAR(50) EXEC MASTER.dbo.xp_regread 'HKEY_LOCAL_MACHINE', 'SYSTEM\CurrentControlSet\Control\TimeZoneInformation', 'TimeZoneKeyName',@TimeZone OUT SELECT @TimeZone
If it is Brazil then it should never represent that date as "2015-10-18 00:00:00".
To show that this behaviour is not a Talend bug and is the same for all Java applications, I have quickly knocked up the following code. This can be run in a tJava within a Job or in any Java IDE. It will do exactly the same.
//Setting the default timezone java.util.TimeZone utcTimeZone = java.util.TimeZone.getTimeZone("America/Sao_Paulo"); java.util.TimeZone.setDefault(utcTimeZone); System.out.println(java.util.TimeZone.getDefault().getDisplayName()); //Date set using only yyyy-MM-dd. Normally defaults to 00:00:00 SimpleDateFormat noTimeFormat = new SimpleDateFormat("yyyy-MM-dd"); Date date1 = noTimeFormat.parse("2015-10-18"); //Date set using only yyyy-MM-dd'T'HH:mm:ss. Explicitly setting time to 00:00:00 SimpleDateFormat withTimeFormat = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss"); Date date2 = withTimeFormat.parse("2015-10-18T00:00:00"); //Show Java time in milliseconds since Jan 1 1970 System.out.println("Time in milliseconds after date set using only yyyy-MM-dd = " + date1.getTime()); System.out.println("Time in milliseconds after date set using only yyyy-MM-dd'T'HH:mm:ss = " + date2.getTime()); //Show date and time formatted to yyyy-MM-dd'T'HH:mm:ss System.out.println("Date shown with yyyy-MM-dd'T'HH:mm:ss for date1 = "+withTimeFormat.format(date1)); System.out.println("Date shown with yyyy-MM-dd'T'HH:mm:ss for date2 = "+withTimeFormat.format(date2)); //Set date using milliseconds Date dateFromMilli1 = new Date(date1.getTime()); //Set date using milliseconds minus 1 millisecond Date dateFromMilli2 = new Date(date1.getTime()-1); System.out.println("Date when set by date1's time in milliseconds = "+withTimeFormat.format(dateFromMilli1)); System.out.println("Date when set by date1's time in milliseconds minus 1 millisecond = "+withTimeFormat.format(dateFromMilli2)); System.out.println("Not a Talend bug");
If either of your databases is representing the date with "00:00:00", then it is an issue with your configuration. If you are mixing your timezones, you will get issues like this.
As I said before, easy timezone functionality within Talend might be a nice feature request, but this is not a bug.
You are right, but you are also wrong.
Obviously my database has a timezone. Every system has one, if this is your point.
But I am using a field that do not care to them. We are dealing with data, not with systems. In SQL Server we have DATETIME field (date without timezone) and DATETIMEOFFSET (date with timezone).
So, since the field is not taking the timezone in consideration, "2015-10-18 00:00:00" is a valid value, no matter where you are, or what's your system time zone. How you will use it, it is up to you and your system. So sorry if I was not clear before, but I meant that I was using for dates only datatypes without timezone, not that my system is nowhere in the space.
You made a cool demonstration to show me how Java handles time zones. But databases sometimes don't use them, because of a very common design decision. Talend is enforcing a timezone when it uses Date Class to store the value without considering what Java will do with it. This is wrong.
OK, but Talend is not enforcing a timezone, that is Java. Talend is entirely relying on Java's implementation of Date.
I think it would be nice to be able to select a timezone for a job without having to either run the job on a machine with the correctly set timezone, or set it in code. It would be a nice feature request.
There is no such thing as a date without a timezone though. The closest to what you are describing is UTC. I would argue that your scenario is actually not a common scenario. While I agree that an option might be nice to allow users to tailor this with a setting, there are hundreds of similar quirks of Java that many would argue would be nice to be able to have an easy to set Talend feature. In fact, Talend do provide a mechanism to deal with this. If you go to the "Run" tab and select "Advanced settings" you will see that you can specify JVM arguments. I suspect that the JVM argument you will want will be .....
-Duser.timezone=UTC
Using UTC time is arguably the standard that all systems should use to avoid confusion. This is very simple to set, but maybe requires a little Googling to find.
My point is that you can handle this with what Talend provides. As an example, recursion in SQL Server can be handled, but not without a greater than normal understanding of how to use the product. That is the same here. It can be handled, but you need to know how to handle it. But it is not a bug.
There is no such thing as a date without a timezone though.
Again, you are right and wrong at the same time. I think your sentence may be rewritten as: "Java does not have a class to handle dates without time zones." If it is the case, I agree with you.
I expect you understand the distinction between an "instant in time" from a "position in the calendar" (date), a "position in the clock" (time), and a "position in space" (time zone). You need all these three elements to describe an "instant in time".
The Date Class represent an instant in time, it is not dependent of a timezone at all. But when we print a instant in time on the screen, we like it to be presented as a position in calendar, and a position in the clock, relative to our time zone. This is fine, and it is exactly what Java do. In Java, as in most systems, dates follow system locale when not explicit set.
But my data source is not giving to Talend an "instant in time". It is providing a position in calendar, and a position in the clock. This is my raw data, nothing else. But when Talend stores this information in the Date Class, it is 'converting' my raw data into an "instant in time", which is a representation that my data can't afford.
Particularly, what we found here is that, when the time zone is not explicitly set, Java does not have a compromise of using the input time zone as the output (in the case described here, my input is read as BRT, but the output is BRST, because Java noted that BRT doesn't exists at that specific date/time: who asked for this sanity check?). But Talend is dealing with a DATETIME field that doesn't have a time zone who needs to flow to another database that also doesn't. The information is not being presented to the user, it is not changed or anything. Don't you think it should be consistent?
You are trying to convince me that this is OK, because I am presenting a very specific scenario and that my environment is not set correctly, or anything similar. Bugs usually happens on specific scenarios, don't they? My point is that, in Java, a Date class is NOT suitable to represent a DATETIME field without time zone (at least, not without handling the lack of this information). It is a misconception of who mapped the fields of Talend that the Date Class fits this kind of field.
And again you are pointing hacks I can do. In your last message, you are telling me: "fake your time zone to UTC, because Talend does not care what Java is doing with the dates". If you search Google, you will find several sysadmins that prefer to set their servers to UTC to avoid these problems. I know. This is fine as a project decision, but you only have to do this because your software is bad designed. When this happens, you fix it or you give up?
I know there are thousands of workarounds, this is not what I am looking for and my specific problem is already solved. Now I am trying to find what we can do to fix this behavior, because it is indeed inconsistent.
I can imagine several situations where the results would be inconsistent. What should I do if I have a job that deals with data with and without timezone at the same time? How about if each database has a different time zone? Talend is relying on Java's implementation, great. But how about the adjacent systems? Are they fully compatible with Java's implementation? The solution is to always write custom code to assure the data is not messed, or is Talend able to anticipate this and handle everything nicely?
I can imagine several situations where the results would be inconsistent. What should I do if I have a job that deals with data with and without timezone at the same time? How about if each database has a different time zone? Talend is relying on Java's implementation, great. But how about the adjacent systems? Are they fully compatible with Java's implementation? The solution is to always write custom code to assure the data is not messed, or is Talend able to anticipate this and handle everything nicely?
This is exactly why UTC is used to store dates across systems which may be used in different time zones. This is how Facebook works, how Google works.....you name a major international organisation and this is how it would work. It is not a hack, it is the right way to do it. The data is stored in UTC and then translated to where it is needed. Dates are never independent of time zones in reality. You are using a flawed philosophy when you use an example where you expect to be able to represent a period of time which is impossible in some time zones, in those time zones. To alleviate this you have UTC which allows you to store those times irrespective of regional fluctuations and allows you to cater to them with regional code.
Your database (whether you know it or not) is using a time zone. I have no idea which, but it will be using one. You cannot escape a time zone when it comes to dates because of the issues that regional differences cause and the need to translate between them. We could have a long philosophical debate on whether a time in space requires a timezone when compared to GMT or whether time really exists, but we are getting away from the point.
Talend does not have a bug. It is a development tool. A tool that sits on top of Java (a Java code generator). The flaw you are trying to highlight is actually a massive advantage if you know how to make use of it. No programming language makes everything easy for every permutation of requirements. They tend to be engineered towards the most common requirements. Most people use dates with timezones. In the real world dates are dependent on a zone when used by people, when being stored by a computer they are generally stored independent of fluctuating time zone (daylight savings time, etc) and that is a key use of UTC. If you want to have a date that is not hampered by regional fluctuations (which would be advised in most systems storing data), then you use UTC.
https://en.wikipedia.org/wiki/Coordinated_Universal_Time
This is not a Talend issue, your issue here is with Java. But this isn't an "issue" with Java, it is a feature that is very useful. What you have basically criticised are fundamental standards for dealing with dates. I have shown you how to force your Talend job to use UTC in a couple of ways (which is what you want), but going back to our earlier discussion I agree that adding this in a more user friendly way might be a nice feature.
... The data is stored in UTC and then translated to where it is needed...
From the Java Docs:
How this can be? Aren't they stored in UTC? How it depends of the environment? Oh lord. My head is blowing!
... You cannot escape a time zone when it comes to dates because of the issues that regional differences cause and the need to translate between them...
How about this, this and this?
Never mind. Obviously they don't know what they are talking about. It seems to be a "long philosophical debate".
You won. Forget about it.
This wasn't about "winning" or "losing", it was about your incorrect assertion that the behaviour you experienced was a Talend bug. As I have said from the beginning, it is not a bug. We then moved over to your assertion that Java is seriously flawed in the way that it handles Dates, and that makes it a "Talend bug" because Talend doesn't automatically understand what you want to do and provide a drop down "fix all" to cover every permutation of Date usage. That is not a bug, it is not even a flaw in functionality. Talend is a development tool and while many Talend sales guys may claim their "grandmother can do it", you can only get the most out of a tool like this if you have more than a basic understanding of computer science....and it REALLY helps if you have a working understanding of Java.
The functionality that you are looking for can quite easily be built within Talend, but it is not standard and as such you will not find easy drag and drop solutions to it. I did suggest a relatively easy fix to this (using UTC and setting your JVM argument to ensure that this would be the case), but you have managed to find a flaw in this through Googling.....which will not actually affect you and I think you know that.
By the way, the examples you have given to counter my last post do not really work. You really need to read the whole piece about UTC in the Java documentation and understand it. Regarding the "Sometimes you really just want to represent a "local" (timezone unaware) date and time rather than an instant in time. To be honest it's more often useful to represent just a time - e.g. 'wake me up at 8am, regardless of timezone' - but date and time could be useful too".... comment, you do realise that to enable that functionality on a phone, for example, you would need to have a universal time recorded and maintained (UTC) and an awareness of wherever you are in the world in order to calculate when to "wake me up at 8am". What that is talking about is not fixing a timezone for your application, but being able to get notifications to wake you at 8am wherever in the world you are. Precisely why timezones are important
I have to echo Diego's sentiment with date handling. Searching for a solution to my own frustrating experience has brought me to this thread. In my case, I am downloading dates from Salesforce - which are always expressed in UTC with no time zone offset - and inserting them into a SQLite database via a tMap. Talend will defer to Java's default date handling, which is to assume the input date has a time zone the same as the user's system time zone. My time zone is EST, so the result is that a GMT time zone date retrieved from Salesforce is assumed to be EST. I've attached a job to demonstrate this.
The solution to add a compiler directive
-Duser.timezone=UTC
should work, but really if a date is expressed in UTC frmat then Talend should not defer to Java's default handling of dates - just don't convert it to any time zone... take it as is.
-Dave