
Contributor
2012-10-08
11:51 AM
[resolved] Salesforce Bulk Queries for Large data sets
I want to query all the contacts in my org using the Salesforce Bulk API. The largest batch size is 10K; I would have expected Talend to send multiple batches. How can I achieve this?
1,405 Views
1 Solution
Accepted Solutions

Anonymous
Not applicable
2014-11-12
01:44 PM
I still have this problem where Talend DI will only send one batch request.
I'm trying to track down where the batching logic should go, but I can only find it for uploads, in SalesforceBulkAPI.java:
private List<BatchInfo> createBatchesFromCSVFile() throws IOException, AsyncApiException {
    List<BatchInfo> batchInfos = new ArrayList<BatchInfo>();
    BufferedReader rdr = new BufferedReader(
            new InputStreamReader(new FileInputStream(bulkFileName), FILE_ENCODING));
    // read the CSV header row
    byte[] headerBytes = (rdr.readLine() + "\n").getBytes("UTF-8");
    int headerBytesLength = headerBytes.length;
    File tmpFile = File.createTempFile("sforceBulkAPI", ".csv");
    // Split the CSV file into multiple batches
    try {
        FileOutputStream tmpOut = new FileOutputStream(tmpFile);
        int currentBytes = 0;
        int currentLines = 0;
        String nextLine;
        boolean needStart = true;
        boolean needEnds = true;
        while ((nextLine = rdr.readLine()) != null) {
            int num = countQuotes(nextLine);
            if (num % 2 == 1) {
                // an odd quote count: this physical line starts or ends
                // a quoted record that spans multiple lines
                if (!needStart) {
                    needEnds = false;
                } else {
                    needStart = false;
                }
            } else {
                // nextLine is a whole record, or the middle of a record
                if (needEnds && needStart) {
                    needEnds = false;
                    needStart = false;
                }
            }
            byte[] bytes = (nextLine + "\n").getBytes("UTF-8");
            // Create a new batch when the batch size limit is reached
            if (currentBytes + bytes.length > maxBytesPerBatch || currentLines > maxRowsPerBatch) {
                createBatch(tmpOut, tmpFile, batchInfos);
                currentBytes = 0;
                currentLines = 0;
            }
            if (currentBytes == 0) {
                // start a fresh batch file, repeating the CSV header
                tmpOut = new FileOutputStream(tmpFile);
                tmpOut.write(headerBytes);
                currentBytes = headerBytesLength;
                currentLines = 1;
            }
            tmpOut.write(bytes);
            currentBytes += bytes.length;
            if (!needStart && !needEnds) {
                currentLines++;
                needStart = true;
                needEnds = true;
            }
        }
        // Finished processing all rows: create a final batch for any remaining data
        if (currentLines > 1) {
            createBatch(tmpOut, tmpFile, batchInfos);
        }
    } finally {
        tmpFile.delete();
    }
    return batchInfos;
}
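The `countQuotes` helper is referenced above but not included in the snippet. A minimal sketch of what it presumably does (this is an assumption, not the actual Talend source): count the double-quote characters on one physical line, so that an odd count signals a quoted field that opens or closes a multi-line record.

```java
// Assumed implementation of the countQuotes helper referenced by
// createBatchesFromCSVFile: counts double-quote characters on one
// physical CSV line. An odd result means a quoted field is still open.
public class CountQuotes {
    static int countQuotes(String line) {
        int count = 0;
        for (int i = 0; i < line.length(); i++) {
            if (line.charAt(i) == '"') {
                count++;
            }
        }
        return count;
    }
}
```

With this, a line like `"Smith, John",Acme` yields an even count (a complete record), while a line that only opens a quoted field yields an odd one, which is what drives the `needStart`/`needEnds` bookkeeping above.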
12 Replies

Contributor
2012-10-09
10:39 AM
Author
I want the same ability you get with the BulkExec import: to query all 7M contacts in one go and have Salesforce output the file back to Talend.

Anonymous
Not applicable
2012-10-11
02:57 AM
Hi,
I couldn't find the 'query' action for the tSalesforceBulkExec component, so I ended up with the same question as you raised.
Did you find any workaround for this?
Thanks

Contributor
2012-10-11
06:54 AM
Author
You have to use the tSalesforceConnection + tSalesforceInput components. Configure the two to work together, making sure you select "For salesforce bulk component" inside the tSalesforceConnection component, then output to CSV as the last step.
The issue I have is that this configuration only lets you export up to the Salesforce maximum batch size of 10,000 records.
I was hoping the component would create (n) batches automatically if required, but it looks like I have to build that logic myself. Waiting on Talend.

Anonymous
Not applicable
2012-10-12
12:16 AM
Hi, thanks a lot for the reply. I just found that my Talend version (5.1) was outdated, as bulk query is only available in 5.2.x.
Referring to this guideline: http://www.salesforce.com/us/developer/docs/api_asynch/index.htm
- A bulk query is limited to a maximum of 10 result files, each up to 1 GB in size.
- If the query needs to return more than 10 files, it should be filtered to return less data. Bulk batch sizes are not used for bulk queries.
So I am a bit confused by the 10,000 limitation. I would think that is the batch size limit for uploading data to Salesforce, not the other way round, as in a bulk query.
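The "filter the query to return less data" advice above can be done by slicing one large query into several date-bounded SOQL queries. A hypothetical sketch (the base query, field names, and slicing strategy are illustrative assumptions, not anything Talend or Salesforce provides out of the box):

```java
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration: split one large bulk query into several
// date-bounded SOQL queries so each result set stays under the limits.
public class QuerySplitter {
    static List<String> splitByCreatedDate(String baseSoql, LocalDate start,
                                           LocalDate end, int slices) {
        List<String> queries = new ArrayList<>();
        long totalDays = ChronoUnit.DAYS.between(start, end);
        long step = Math.max(1, totalDays / slices);
        LocalDate from = start;
        while (from.isBefore(end)) {
            LocalDate to = from.plusDays(step);
            if (to.isAfter(end)) {
                to = end;
            }
            // SOQL datetime literals are ISO-8601 with a Z suffix
            queries.add(baseSoql
                    + " WHERE CreatedDate >= " + from + "T00:00:00Z"
                    + " AND CreatedDate < " + to + "T00:00:00Z");
            from = to;
        }
        return queries;
    }
}
```

Each resulting query is then submitted as its own bulk job, so no single job has to return more than 10 one-gigabyte files.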

Contributor
2012-10-12
10:35 AM
Author
Thanks for replying. I understand the limitations of the Salesforce API, but I'm still struggling to configure Talend to bulk-query large data volumes (millions of records). Non-bulk is not an option.
If I fire the same query using Jitterbit or Workbench, those apps simply create additional batches automatically and return the records from the query. In my case, 7M+ records are returned in a single file within 10-15 minutes.
Similarly, when importing using the Bulk API, Talend just keeps sending batches of 10K records until it hits the limit of 2,000 batches per 24 hours.
Any ideas?

Anonymous
Not applicable
2012-10-15
05:53 AM
Hi,
Thanks a lot for pointing out that the limitation was something implemented by Talend rather than Salesforce; I'm clear on what you mean now.
I think the existing Talend component, with its 10K batches, may never meet your requirement to get 7M+ records.
Is it possible to filter your data by certain criteria, e.g. LastModifiedDate?
I am working on a flow something like this:
get lastSyncDate from the local DB -> add it as a filter criterion in the query condition -> populate the data -> update lastSyncDate in the local DB.
Not too sure if this flow is applicable to your scenario.
Again, thanks a lot for your info.
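The query-building step of the incremental flow described above could be sketched as follows. This is a hypothetical illustration (the object, field names, and helper are assumptions, not part of Talend or the poster's actual job):

```java
import java.time.Instant;

// Hypothetical sketch of the incremental-sync flow described above:
// read the last sync timestamp from local storage, build a filtered
// SOQL query, and (after loading results) store the new timestamp.
public class IncrementalSync {
    static String buildQuery(Instant lastSyncDate) {
        // SOQL datetime literals use ISO-8601 with a Z suffix,
        // which Instant.toString() already produces
        return "SELECT Id, LastModifiedDate FROM Contact"
                + " WHERE LastModifiedDate > " + lastSyncDate;
    }
}
```

Each run then only pulls records modified since the previous run, keeping every individual query well under the bulk result-size limits.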

Contributor
2012-10-15
06:27 AM
Author
That's one solution. Thanks for sharing.
I still feel this should be handled natively by the Talend component. I will raise an enhancement request.
I am planning to build a process that first bulk-exports using Workbench and then uses Talend to process the file. It's not a critical process, only something we run each time we refresh a sandbox, so it doesn't really matter if it has a few more steps.

Contributor
2013-06-10
11:45 AM
Author
I'm now working with another large salesforce.com client and the latest version of Talend, 5.3.0r101800.
Same as before, we are attempting to perform large bulk queries and hitting the 10K record limit. I would expect Talend to be able to spool multiple batches out to CSV. The full data set is being queried (see attachment), but Talend cannot process the output.
Any advice / pointers?

Anonymous
Not applicable
2013-06-17
10:10 AM
Hi agentgill,
I've seen the same issue reported on JIRA:
https://jira.talendforge.org/browse/TDI-7779
It has been fixed since v4.0.4. I don't know whether tSalesforceBulkExec has been updated in the latest version, but I would suggest you download v4.1.0, which is closest to v4.0.4, and test whether it works there. You can download v4.1.0 from this link:
http://sourceforge.net/projects/talend-studio/files/Talend%20Open%20Studio/4.1.0/
Shong
