KennyLim-SF
Contributor

Timestamp precision problem when ingesting CSV

Hi there,

I am a support engineer from Snowflake. I would like to get in touch with the Talend support team regarding a problem a customer is having when ingesting a CSV file that contains dates (in the format "YYYY-MM-DD") earlier than 1970-01-01 (the Unix epoch) into Snowflake via Talend Studio (7.1.1.20201230_0510-patch).

A few example rows from the CSV file:

   1234567890|1980-02-19

   1234567891|1969-11-23

While the date in the CSV file is a string, it looks like Talend Studio converts the date column into Unix epoch milliseconds before staging and inserting the data into Snowflake. We were able to confirm this by intercepting the temp file generated by Studio (seen in the log output when running a job in Talend Studio). The Unix epoch conversion is the problem, because Snowflake treats integer date/time values that are less than 31536000000 (the number of milliseconds in a year) as a number of seconds, not milliseconds.
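
For illustration, the following java.time sketch reproduces the conversion we believe Studio is performing; this is only my reconstruction based on the intercepted temp file, not Talend's actual code:

    import java.time.LocalDate;
    import java.time.ZoneOffset;

    public class EpochMillisConversionDemo {
        // Hypothetical reproduction of the conversion seen in the intercepted
        // temp file: the ISO date string is rendered as epoch milliseconds.
        static long toEpochMillis(String isoDate) {
            return LocalDate.parse(isoDate)
                            .atStartOfDay(ZoneOffset.UTC)
                            .toInstant()
                            .toEpochMilli();
        }

        public static void main(String[] args) {
            System.out.println(toEpochMillis("1980-02-19")); // 319766400000
            System.out.println(toEpochMillis("1969-11-23")); // -3369600000 (pre-epoch, so negative)
        }
    }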

Using the second row of the example above, Talend Studio converts the date to -3369600000 (millisecond precision). The converted data is then staged into Snowflake and inserted into a target table. However, Snowflake regards any number less than 31536000000 as seconds, so when the row is inserted, it is interpreted as 1863-03-23 instead.
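
The two interpretations can be demonstrated with plain java.time (again just an illustration, not Talend or Snowflake code):

    import java.time.Instant;

    public class SecondsVsMillisDemo {
        public static void main(String[] args) {
            long staged = -3369600000L; // value written by Studio, intended as milliseconds

            // Intended reading (milliseconds) recovers the original date:
            System.out.println(Instant.ofEpochMilli(staged));  // 1969-11-23T00:00:00Z

            // Snowflake treats negative integers as seconds, landing roughly 107 years earlier:
            System.out.println(Instant.ofEpochSecond(staged)); // 1863-03-23T00:00:00Z
        }
    }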

See the documentation here for reference: https://docs.snowflake.com/en/user-guide/date-time-input-output.html#auto-detection-of-integer-stored-date-time-and-timestamp-values

   DATE and TIMESTAMP:

   • After the string is converted to an integer, the integer is treated as a number of seconds, milliseconds, microseconds, or nanoseconds after the start of the Unix epoch (1970-01-01 00:00:00.000000000 UTC).
     • If the integer is less than 31536000000 (the number of milliseconds in a year), then the value is treated as a number of seconds.
       Caution: currently, negative values are always treated as seconds. For example, -31536000000000000000 is treated as a number of seconds before the year 1970, although its scale implies that it is intended to be used as nanoseconds. This behavior might change in the future.
     • If the value is greater than or equal to 31536000000 and less than 31536000000000, then the value is treated as milliseconds.
     • If the value is greater than or equal to 31536000000000 and less than 31536000000000000, then the value is treated as microseconds.
     • If the value is greater than or equal to 31536000000000000, then the value is treated as nanoseconds.
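
To make those thresholds concrete, here is my own rough paraphrase of the documented rule in code; it is not Snowflake's implementation, just how I read the ranges above:

    import java.time.Instant;

    public class AutoDetectionSketch {
        // Rough paraphrase of the documented thresholds, for illustration only.
        static Instant interpret(long v) {
            if (v < 31_536_000_000L) {                  // includes every negative value
                return Instant.ofEpochSecond(v);        // treated as seconds
            } else if (v < 31_536_000_000_000L) {
                return Instant.ofEpochMilli(v);         // treated as milliseconds
            } else if (v < 31_536_000_000_000_000L) {
                // treated as microseconds
                return Instant.ofEpochSecond(v / 1_000_000L, (v % 1_000_000L) * 1_000L);
            } else {
                // treated as nanoseconds
                return Instant.ofEpochSecond(v / 1_000_000_000L, v % 1_000_000_000L);
            }
        }

        public static void main(String[] args) {
            System.out.println(interpret(-3369600000L));   // read as seconds      -> 1863-03-23T00:00:00Z
            System.out.println(interpret(319766400000L));  // read as milliseconds -> 1980-02-19T00:00:00Z
        }
    }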

Note that the problem does not happen consistently, but we have a recording of it happening.

When attempting to ingest the same CSV directly using Snowflake (either via a connector or the SnowSQL client), this issue does not arise (as there is no Unix epoch conversion happening). Could someone please get in touch with me, as I am unable to raise a support case without a product code?

Regards,

Kenny Lim
