Skip to main content
Announcements
SYSTEM MAINTENANCE: Thurs., Sept. 19, 1 AM ET, Platform will be unavailable for approx. 60 minutes.
cancel
Showing results for 
Search instead for 
Did you mean: 
Dobby1
Contributor III
Contributor III

Azure SQL Insert very Slow with Date Columns

I have a simple Talend job (only 3 components tDBInput, tMap and tDBOutput) that reads from  MS SQL Server database and inserts into an Azure SQL Server database that is performing slowly. I takes about 5 minutes to insert each batch of 10,000. The batch size and commit size are both set to 10,000.

 

I have narrowed down the performance issue to two (2) date fields (High_School_To and High_School_From ) in the insert statement. When these fields are included in the output, performance is evidently degraded.

  1. When both fields are included in the output (with the other fields), it takes about 5 minutes to commit the batch of 10,000
  2. When only High_School_From is included in the insert (with the other fields), it takes about 1 minute to commit the batch of 10,000
  3. When only High_School_To is included in the insert (with the other fields), it takes about 2 minutes 51 seconds to commit the batch of 10,000
  4. When both fields are the only fields included in the insert, it takes about 3 minutes to commit the batch of 10,000
  5. When only High_School_From is included in the insert, it takes less than 1 minute to commit the batch of 10,000
  6. When only High_School_To is included in the insert, it takes about 2 minutes 41 seconds to commit the batch of 10,000
  7. Of the 10,000 records, 9,130 (91%) of the High_School_From had null values and 5,893 (58%) had null values for the High_School_To column

 

There are other date columns that didn't impact the performance negatively. Below are the distribution of null values in those other date columns:

 

  1. Birthdate - 97 nulls
  2. [Active_From_Date] - 9,988 nulls
  3. [Active_to_Date] - 9,987 nulls
  4. [Created] - no nulls [Updated_Date] - no nulls

I ran the same process against an on-prem MS SQL server database with both of the date columns and did not experience that performance issue. I do expect that the on-prem server would respond faster but the performance was significantly better. It took 5 seconds to commit the batch of 10,000 vs the 5 minutes.

 

The @@VERSION from the on-prem database returns Microsoft SQL Server 2012 (SP1) - 11.0.3000.0 (X64) Oct 19 2012 13:38:57

I ran the same process against an Oracle database in the Oracle Cloud Infrastructure and it completed the commit in less that 5 seconds.

 

If I convert the date columns to string and insert map them to string fields in the output of if I update the table and set values for High_School_To and High_School_From where they are null, the job performs ok and so I am of the opinion that there is some issue with the Talend/Azure SQL database and how they handle the distribution of null valued is date columns.

 

Azure Support looked at the telemetry for the Azure database and stated that the actual insert and commit of the 10,000 records are happening in about 0.2 ms and they are suggesting that the delay is occurring at Talend's end.

 

I have tried the MS JDBC driver 7.4 and the latest 8.2 but no improvement is observed. I'm running Talend 7.2.

Labels (2)
0 Replies