Do not input private or sensitive data. View Qlik Privacy & Cookie Policy.
Skip to main content

Announcements
See why IDC MarketScape names Qlik a 2025 Leader! Read more
cancel
Showing results for 
Search instead for 
Did you mean: 
Anonymous
Not applicable

[resolved] Errors when loading large data sets to/from Microsoft Azure

Hello,
When trying to use a Microsoft Azure connection in my Talend package (Data Integration v5.6) I get errors like the following:
"I/O Error: Connection reset
Exception in component tMSSqlOutput_1
java.sql.SQLException: Invalid state, the Connection object is closed.
at net.sourceforge.jtds.jdbc.ConnectionJDBC2.checkOpen(ConnectionJDBC2.java:1699)
at net.sourceforge.jtds.jdbc.ConnectionJDBC2.commit(ConnectionJDBC2.java:2085)
at dbservicestest.azure_query_test_0_1.Azure_Query_Test.tMSSqlInput_2Process(Azure_Query_Test.java:1397)
at dbservicestest.azure_query_test_0_1.Azure_Query_Test.tMSSqlConnection_1Process(Azure_Query_Test.java:529)
at dbservicestest.azure_query_test_0_1.Azure_Query_Test.runJobInTOS(Azure_Query_Test.java:1755)
at dbservicestest.azure_query_test_0_1.Azure_Query_Test.main(Azure_Query_Test.java:1540)"
For example, in a job where I send rows from a local SQL database (via tMSSqlInput) to a tMSSqlOutput component connected to Azure, it is successful at loading a small number of rows (1-97), but when loading more rows (more than 98 or so) it fails with an I/O Error. 
I will need to send tens of thousands of rows, so batching things in groups of 50 is not practical.
--Porter
Labels (5)
1 Solution

Accepted Solutions
Anonymous
Not applicable
Author

I figured out how to get this work.
1. Add some keys to the registry, as described in this article: https://msdn.microsoft.com/en-us/library/hh290696(v=sql.110).aspx
2. In the Advanced settings of the component, change the batch size to 10
This runs very slow, and I'm not sure it's the cleanest solution, but it's the only way I was able to get it to work.

View solution in original post

12 Replies
Anonymous
Not applicable
Author

Hi,
Could you please show us the screenshots of job setting and sql query?
Best regards
Sabrina
Anonymous
Not applicable
Author

Here are some screenshots:

The error is in the "tMSSqlOutput_1" component.
Anonymous
Not applicable
Author

Looks like I'm having trouble uploading screenshots... let me try again.

EDIT: Still unable to post screenshots. I'm using "Drop files here to attach them" below, which seems to work until I hit submit, at which point it scrubs the images from the post before submitting it.
Anonymous
Not applicable
Author

I figured out how to get this work.
1. Add some keys to the registry, as described in this article: https://msdn.microsoft.com/en-us/library/hh290696(v=sql.110).aspx
2. In the Advanced settings of the component, change the batch size to 10
This runs very slow, and I'm not sure it's the cleanest solution, but it's the only way I was able to get it to work.
Anonymous
Not applicable
Author

Hi,
Thanks for your solution. Have you tried to change the batch size to 100 to see if it is better?
What's the current rate for your work flow?
Best regards
Sabrina
Anonymous
Not applicable
Author

I had initially tried lowering the batch size to 100 and even to 50, but it was still dropping the connection at times (I'm guessing this is because of a slow or unstable connection to Azure). Only by reducing it to 10 was I able to get it to work--though I may be able to experiment more to find the "sweet spot".
The data flow rate seems to average between 200-400 rows/second. I don't necessarily blame Talend for this--it could be that our connection to Azure is just slow--but it seems that sending things in bigger batches might help reduce overhead, if I could get it to not drop the connection. C'est la vie!
Anonymous
Not applicable
Author

Hi,
Did you ever get a better solution to this?  I am having the same or very similar issue trying to write to Azure.  I get a "I/O Error: Connection reset by peer: socket write error" whenever the batch size is about 10.  I started at 10,000 and kept trying different numbers until it would work.  The problem is this makes the job take 5 hours to complete for 1 of 12 tables.  That isn't going to work for me.
Maybe I should update my MS SQL java driver but I am not really sure how.
Anonymous
Not applicable
Author

Hi,
Did you ever get a better solution to this?  I am having the same or very similar issue trying to write to Azure.  I get a "I/O Error: Connection reset by peer: socket write error" whenever the batch size is about 10.  I started at 10,000 and kept trying different numbers until it would work.  The problem is this makes the job take 5 hours to complete for 1 of 12 tables.  That isn't going to work for me.
Maybe I should update my MS SQL java driver but I am not really sure how.

It looks like our current approach is to uncheck "Die on error", so that the process will silently continue on its way if there is an error writing to the database. We have our batch size set to 10.
Probably a better approach would be to make a custom component (or just ditch Talend and write a custom app) that handles the IO exception and resumes sending data using some sort of exponential backoff. But we didn't have the time to do that, and there were other priorities, so we left it like this for now.
Anonymous
Not applicable
Author

I actually found a solution today.  It seems that Talend's MS components are using an open source jdbc driver that might be out of date. I am not sure if the problem I had is with all MS sql but at least with Azure data sources. To make this work for me I downloaded a jdbc driver directly from Microsoft and set up a Generic JDBC source in Talend.  The JDBC connection string should be like this.
jdbc:sqlserver://<serveraddress>;databaseName=<dbname>;user=<username>;password=<password>
I was able to leave the batch size the same and complete my task in a few minutes instead of the 5 hours it took with built in components.