Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
I have a simple Talend job (only 3 components tDBInput, tMap and tDBOutput) that reads from MS SQL Server database and inserts into an Azure SQL Server database that is performing slowly. I takes about 5 minutes to insert each batch of 10,000. The batch size and commit size are both set to 10,000.
I have narrowed down the performance issue to two (2) date fields (High_School_To and High_School_From ) in the insert statement. When these fields are included in the output, performance is evidently degraded.
There are other date columns that didn't impact the performance negatively. Below are the distribution of null values in those other date columns:
I ran the same process against an on-prem MS SQL server database with both of the date columns and did not experience that performance issue. I do expect that the on-prem server would respond faster but the performance was significantly better. It took 5 seconds to commit the batch of 10,000 vs the 5 minutes.
The @@VERSION from the on-prem database returns Microsoft SQL Server 2012 (SP1) - 11.0.3000.0 (X64) Oct 19 2012 13:38:57
I ran the same process against an Oracle database in the Oracle Cloud Infrastructure and it completed the commit in less that 5 seconds.
If I convert the date columns to string and insert map them to string fields in the output of if I update the table and set values for High_School_To and High_School_From where they are null, the job performs ok and so I am of the opinion that there is some issue with the Talend/Azure SQL database and how they handle the distribution of null valued is date columns.
Azure Support looked at the telemetry for the Azure database and stated that the actual insert and commit of the 10,000 records are happening in about 0.2 ms and they are suggesting that the delay is occurring at Talend's end.
I have tried the MS JDBC driver 7.4 and the latest 8.2 but no improvement is observed. I'm running Talend 7.2.