Anonymous
Not applicable

Salesforce Bulk API java.net.SocketException: Connection reset

Hi all,
I am using Talend 5.6.0_20141024_1545
My organization is trialing Talend for ELT, and we are testing it with Salesforce.
We have a large LEADS table with roughly 3 million rows and 400+ columns.
This means the bulk component is necessary to pull the data.
I have tried many ways to get this data, including splitting the SOQL query into smaller windows with CreatedDate as a filter, along the lines of the sketch below.
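For example, one of the splits looked roughly like this (the field list is abbreviated here, and the date windows are only an illustration, not the exact ranges I used):

SELECT Id, FirstName, LastName, Company, Status, CreatedDate
FROM Lead
WHERE CreatedDate >= 2013-01-01T00:00:00Z AND CreatedDate < 2014-01-01T00:00:00Z

SELECT Id, FirstName, LastName, Company, Status, CreatedDate
FROM Lead
WHERE CreatedDate >= 2014-01-01T00:00:00Z

Each window runs as its own bulk query, but every variation ends the same way.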
I get this error

Exception in component tSalesforceInput_1
java.net.SocketException: Connection reset
at java.net.SocketInputStream.read(Unknown Source)
at java.net.SocketInputStream.read(Unknown Source)
at sun.security.ssl.InputRecord.readFully(Unknown Source)
at sun.security.ssl.InputRecord.read(Unknown Source)
at sun.security.ssl.SSLSocketImpl.readRecord(Unknown Source)
at sun.security.ssl.SSLSocketImpl.readDataRecord(Unknown Source)
at sun.security.ssl.AppInputStream.read(Unknown Source)
at java.io.BufferedInputStream.fill(Unknown Source)
at java.io.BufferedInputStream.read1(Unknown Source)
at java.io.BufferedInputStream.read(Unknown Source)
at sun.net.www.http.ChunkedInputStream.fastRead(Unknown Source)
at sun.net.www.http.ChunkedInputStream.read(Unknown Source)
at java.io.FilterInputStream.read(Unknown Source)
at sun.net.www.protocol.http.HttpURLConnection$HttpInputStream.read(Unknown Source)
at java.util.zip.InflaterInputStream.fill(Unknown Source)
at java.util.zip.InflaterInputStream.read(Unknown Source)
at java.util.zip.GZIPInputStream.read(Unknown Source)
at sun.nio.cs.StreamDecoder.readBytes(Unknown Source)
at sun.nio.cs.StreamDecoder.implRead(Unknown Source)
at sun.nio.cs.StreamDecoder.read(Unknown Source)
at java.io.InputStreamReader.read(Unknown Source)
at java.io.BufferedReader.fill(Unknown Source)
at java.io.BufferedReader.read1(Unknown Source)
at java.io.BufferedReader.read(Unknown Source)
at java.io.BufferedReader.fill(Unknown Source)
at java.io.BufferedReader.read1(Unknown Source)
at java.io.BufferedReader.read(Unknown Source)
at com.talend.csv.CSVReader.fill(CSVReader.java:444)
at com.talend.csv.CSVReader.readNext(CSVReader.java:189)
at org.talend.salesforceBulk.SalesforceBulkAPI.getQueryResult(SalesforceBulkAPI.java:370)
at dw2.bulk_lead_pt1_0_1.bulk_lead_pt1.tSalesforceInput_1Process(bulk_lead_pt1.java:16439)
at dw2.bulk_lead_pt1_0_1.bulk_lead_pt1$6.run(bulk_lead_pt1.java:23414)

This is my batch info from the console:
-------------- waiting ----------,firstDayOfWeek=1,minimalDaysInFirstWeek=1,ERA=1,YEAR=2014,MONTH=10,WEEK_OF_YEAR=46,WEEK_OF_MONTH=3,DAY_OF_MONTH=13,DAY_OF_YEAR=317,DAY_OF_WEEK=5,DAY_OF_WEEK_IN_MONTH=2,AM_PM=0,HOUR=5,HOUR_OF_DAY=5,MINUTE=48,SECOND=50,MILLISECOND=0,ZONE_OFFSET=0,DST_OFFSET=0]'
systemModstamp='java.util.GregorianCalendar,firstDayOfWeek=1,minimalDaysInFirstWeek=1,ERA=1,YEAR=2014,MONTH=10,WEEK_OF_YEAR=46,WEEK_OF_MONTH=3,DAY_OF_MONTH=13,DAY_OF_YEAR=317,DAY_OF_WEEK=5,DAY_OF_WEEK_IN_MONTH=2,AM_PM=0,HOUR=6,HOUR_OF_DAY=6,MINUTE=44,SECOND=23,MILLISECOND=0,ZONE_OFFSET=0,DST_OFFSET=0]'
numberRecordsProcessed='1345006'
numberRecordsFailed='0'
totalProcessingTime='0'
apiActiveProcessingTime='0'
apexProcessingTime='0'
]

As you can see in the screenshot below, the job only pulled 390,949 of the 1,345,006 rows in the batch before failing.
[screenshot attachment: 0683p000009MBYz.jpg]
7 Replies
Anonymous
Not applicable
Author

I have tried a much smaller table with 219 columns and 133,932 rows.
I get the same error.

[screenshot attachment: 0683p000009MBKl.jpg]
Anonymous
Not applicable
Author

I have also tried setting the tMap to store its data on disk and writing the CSV in row mode to see if it would help, but it does not.
Additionally, I tried increasing the TimeOut on the Salesforce connection to 600000 ms, but this does not seem to help either (a note on what I assume that setting maps to is below).
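For reference, my assumption (not verified) is that the component's TimeOut setting maps onto the connection/read timeouts of the underlying Force.com WSC ConnectorConfig, roughly like this:

import com.sforce.ws.ConnectorConfig;

public class TimeoutSketch {
    // Assumption: Talend's "TimeOut" ends up on one or both of these WSC settings.
    static ConnectorConfig withTenMinuteTimeouts() {
        ConnectorConfig config = new ConnectorConfig();
        config.setConnectionTimeout(600000); // ms allowed to establish the connection
        config.setReadTimeout(600000);       // ms a read on the response stream may block
        return config;
    }
}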
Can anyone give some tips on how else to troubleshoot? 
Has anyone else experienced this problem before? I have searched the forums and the rest of the web, but I can't find anything.
Anonymous
Not applicable
Author

Update:
Created a JIRA ticket
https://jira.talendforge.org/browse/TDI-31213
Anonymous
Not applicable
Author

Generally this error is caused by a network problem, in which case no communication with Salesforce has taken place at all, so changing parameters of the query or similar cannot help.
Increasing the timeout will, in that case, probably only mean a longer wait until you get the error.
I would suggest you check whether you need to use a proxy, or whether a firewall rule is blocking TCP traffic to Salesforce.
As a first simple test, you could ping the Salesforce server from the machine where your job runs; a slightly stronger check is sketched below.
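If ping works but you still suspect the network path, you can also try opening a TLS connection to the Salesforce endpoint from the same machine and completing the handshake, for example with a small standalone class like this (the login host is just the standard one; yours may differ):

import javax.net.ssl.SSLSocket;
import javax.net.ssl.SSLSocketFactory;

public class SalesforceTlsCheck {
    public static void main(String[] args) throws Exception {
        // Open a TLS socket to the Salesforce login endpoint and force the handshake.
        // If this fails, a proxy or firewall is likely interfering with the traffic.
        SSLSocketFactory factory = (SSLSocketFactory) SSLSocketFactory.getDefault();
        try (SSLSocket socket = (SSLSocket) factory.createSocket("login.salesforce.com", 443)) {
            socket.setSoTimeout(10000); // fail fast instead of hanging
            socket.startHandshake();
            System.out.println("TLS handshake with login.salesforce.com succeeded");
        }
    }
}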
Anonymous
Not applicable
Author

Thanks for the feedback, Jan.
Unfortunately, I don't think that is what is causing my problem.
For one, I have been able to use the bulk query many times, and it only fails on very large objects (3M+ rows and 300+ columns).
Secondly, you can clearly see in the first screenshot that data has actually been transferred.
Finally, I am able to run the same query as a regular (non-bulk) query, although it takes 20+ hours to complete.
Additionally, I can monitor the inbound traffic and see data coming in from Salesforce.
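In case it helps with reproducing or working around this outside of Talend: since the reset happens partway through the download, I have been looking at the raw Bulk API, where a completed query batch exposes its results as separate result files that can each be fetched (and re-fetched) independently. A rough sketch with the Force.com WSC client follows; the connection setup is omitted, and none of these names reflect what Talend uses internally.

import com.sforce.async.AsyncApiException;
import com.sforce.async.BulkConnection;
import com.sforce.async.QueryResultList;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;

public class BulkResultFetcher {
    // Fetch each result file of a completed bulk query batch separately,
    // so a dropped connection only forces a retry of that one file.
    static void fetchResults(BulkConnection conn, String jobId, String batchId)
            throws AsyncApiException, IOException {
        QueryResultList results = conn.getQueryResultList(jobId, batchId);
        for (String resultId : results.getResult()) {
            try (InputStream in = conn.getQueryResultStream(jobId, batchId, resultId)) {
                // Stream the CSV chunk straight to disk; re-run just this file on failure.
                Files.copy(in, Paths.get("lead_" + resultId + ".csv"));
            }
        }
    }
}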
Thanks again!
[screenshot attachment: 0683p000009MBZ4.jpg]
Anonymous
Not applicable
Author

I have also checked, and I can ping Salesforce. I also do not need to change any firewall rules or use a proxy.
Anonymous
Not applicable
Author

Hi, I have the same problem.
I was using Talend TOS DI 5.3 and upgraded to 5.6, hoping the Salesforce Bulk Query would work, as I have a table with more than 100k rows. Unfortunately, it does not work because of the 10k limit.
Does anyone have suggestions? I really need to speed up the query.
Thanks!