<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>Re: [resolved] Salesforce Bulk Queries for Large data sets in Talend Studio</title>
    <link>https://community.qlik.com/t5/Talend-Studio/resolved-Salesforce-Bulk-Queries-for-Large-data-sets/m-p/2220625#M14965</link>
    <description>This problem may still appear in the latest version, 5.3.0. I suggest you reopen TDI-7779 or create a new issue in our bug tracker so that our developers from the R&amp;amp;D team can see it.
&lt;BR /&gt;Shong</description>
    <pubDate>Wed, 19 Jun 2013 08:16:50 GMT</pubDate>
    <dc:creator>Anonymous</dc:creator>
    <dc:date>2013-06-19T08:16:50Z</dc:date>
    <item>
      <title>[resolved] Salesforce Bulk Queries for Large data sets</title>
      <link>https://community.qlik.com/t5/Talend-Studio/resolved-Salesforce-Bulk-Queries-for-Large-data-sets/m-p/2220614#M14954</link>
      <description>I want to query all the contacts in my org using the Salesforce Bulk API. The largest batch size is 10K, so I would have expected Talend to send multiple batches. How can I achieve this?</description>
      <pubDate>Sat, 16 Nov 2024 12:10:23 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/resolved-Salesforce-Bulk-Queries-for-Large-data-sets/m-p/2220614#M14954</guid>
      <dc:creator>agentgill</dc:creator>
      <dc:date>2024-11-16T12:10:23Z</dc:date>
    </item>
    <item>
      <title>Re: [resolved] Salesforce Bulk Queries for Large data sets</title>
      <link>https://community.qlik.com/t5/Talend-Studio/resolved-Salesforce-Bulk-Queries-for-Large-data-sets/m-p/2220615#M14955</link>
      <description>I want the same ability you get with the BulkExec import: to query all 7M contacts in one go and have Salesforce output the file back to Talend.</description>
      <pubDate>Tue, 09 Oct 2012 14:39:47 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/resolved-Salesforce-Bulk-Queries-for-Large-data-sets/m-p/2220615#M14955</guid>
      <dc:creator>agentgill</dc:creator>
      <dc:date>2012-10-09T14:39:47Z</dc:date>
    </item>
    <item>
      <title>Re: [resolved] Salesforce Bulk Queries for Large data sets</title>
      <link>https://community.qlik.com/t5/Talend-Studio/resolved-Salesforce-Bulk-Queries-for-Large-data-sets/m-p/2220616#M14956</link>
      <description>Hi,&lt;BR /&gt;I couldn't find the 'query' action for the tSalesforceBulkExec component, so I ended up with the same question you raised.&lt;BR /&gt;Did you find any workaround for this?&lt;BR /&gt;Thanks</description>
      <pubDate>Thu, 11 Oct 2012 06:57:08 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/resolved-Salesforce-Bulk-Queries-for-Large-data-sets/m-p/2220616#M14956</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2012-10-11T06:57:08Z</dc:date>
    </item>
    <item>
      <title>Re: [resolved] Salesforce Bulk Queries for Large data sets</title>
      <link>https://community.qlik.com/t5/Talend-Studio/resolved-Salesforce-Bulk-Queries-for-Large-data-sets/m-p/2220617#M14957</link>
      <description>You have to use the tSalesforceConnection + tSalesforceInput components. Configure these two to work together, making sure you select "For salesforce bulk component" inside the tSalesforceConnection component; then output to CSV as the last step. &lt;BR /&gt;The issue I have is that this configuration only allows you to export up to the Salesforce maximum batch size of 10,000 records. &lt;BR /&gt;I was hoping the component would automatically create (n) batches as required, but it looks like I have to build that logic myself. Waiting on Talend.</description>
      <pubDate>Thu, 11 Oct 2012 10:54:43 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/resolved-Salesforce-Bulk-Queries-for-Large-data-sets/m-p/2220617#M14957</guid>
      <dc:creator>agentgill</dc:creator>
      <dc:date>2012-10-11T10:54:43Z</dc:date>
    </item>
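<![CDATA[
Until the component splits batches itself, the workaround sketched later in this thread is to partition the extract yourself with filtered queries and run them one at a time through tSalesforceInput. A minimal sketch of that idea (my own code with hypothetical names, not a Talend component): carve the extract into non-overlapping CreatedDate windows, each of which should stay under the per-batch cap.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits one large extract into date-windowed SOQL
// queries so that each window stays under the per-batch record cap.
public class QueryWindowPlanner {

    // Builds one SOQL statement per window of `days` days between start and end.
    public static List<String> windowedQueries(String baseSoql, LocalDate start,
                                               LocalDate end, int days) {
        List<String> queries = new ArrayList<>();
        LocalDate cursor = start;
        while (cursor.isBefore(end)) {
            LocalDate next = cursor.plusDays(days);
            if (next.isAfter(end)) {
                next = end;
            }
            // inclusive lower bound, exclusive upper bound: windows never overlap
            queries.add(baseSoql + " WHERE CreatedDate >= " + cursor
                    + "T00:00:00Z AND CreatedDate < " + next + "T00:00:00Z");
            cursor = next;
        }
        return queries;
    }

    public static void main(String[] args) {
        // Q1 2012 in roughly month-sized windows -> three filtered queries
        for (String q : windowedQueries("SELECT Id, Email FROM Contact",
                LocalDate.of(2012, 1, 1), LocalDate.of(2012, 4, 1), 31)) {
            System.out.println(q);
        }
    }
}
```

Each generated query can then be fed to tSalesforceInput in an iterate loop, appending to one CSV.
]]>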
    <item>
      <title>Re: [resolved] Salesforce Bulk Queries for Large data sets</title>
      <link>https://community.qlik.com/t5/Talend-Studio/resolved-Salesforce-Bulk-Queries-for-Large-data-sets/m-p/2220618#M14958</link>
      <description>Hi, thanks a lot for the reply. I just found that my Talend version, 5.1, was outdated, as bulk query is only available from 5.2.x.
&lt;BR /&gt;Referring to this guideline: 
&lt;A href="http://www.salesforce.com/us/developer/docs/api_asynch/index.htm" rel="nofollow noopener noreferrer"&gt;http://www.salesforce.com/us/developer/docs/api_asynch/index.htm&lt;/A&gt;
&lt;BR /&gt;- A bulk query is limited to a maximum of 10 result files, each up to 1 GB in size.
&lt;BR /&gt;- If the query needs to return more than 10 files, it should be filtered to return less data. Bulk batch sizes are not used for bulk queries.
&lt;BR /&gt;So I am a bit confused by the 10,000 limitation. I would think this is the batch size limit for uploading data to Salesforce, not for the other direction, as in a bulk query.</description>
      <pubDate>Fri, 12 Oct 2012 04:16:11 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/resolved-Salesforce-Bulk-Queries-for-Large-data-sets/m-p/2220618#M14958</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2012-10-12T04:16:11Z</dc:date>
    </item>
    <item>
      <title>Re: [resolved] Salesforce Bulk Queries for Large data sets</title>
      <link>https://community.qlik.com/t5/Talend-Studio/resolved-Salesforce-Bulk-Queries-for-Large-data-sets/m-p/2220619#M14959</link>
      <description>Thanks for replying. I understand the limitations of the Salesforce API, but I am still struggling to configure Talend to query large data volumes (millions of records) using Bulk. Non-bulk is not an option.
&lt;BR /&gt;If I fire the same query using Jitterbit or Workbench, those apps simply create additional batches automatically and return all the records in the query. In my case 7M+ records are returned in a single file within 10-15 minutes.
&lt;BR /&gt;Similarly, if I am importing using the Bulk API, Talend just keeps sending 10K-record batches until it hits the limit of 2,000 batches in 24 hours. 
&lt;BR /&gt;Any ideas?</description>
      <pubDate>Fri, 12 Oct 2012 14:35:35 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/resolved-Salesforce-Bulk-Queries-for-Large-data-sets/m-p/2220619#M14959</guid>
      <dc:creator>agentgill</dc:creator>
      <dc:date>2012-10-12T14:35:35Z</dc:date>
    </item>
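<![CDATA[
Quick arithmetic on the volumes quoted above (the thread's figures, not official limits): a 7M-record extract at 10K records per batch needs 700 batches, so a 2,000-batches-per-24h cap leaves room for only two full runs a day with upload-style batching.

```java
// Back-of-envelope check of the batch budget implied by the numbers above.
public class BatchBudget {

    // ceiling division: a partial final batch still counts as one batch
    public static int batchesNeeded(int records, int batchSize) {
        return (records + batchSize - 1) / batchSize;
    }

    public static int fullRunsPerDay(int records, int batchSize, int dailyBatchCap) {
        return dailyBatchCap / batchesNeeded(records, batchSize);
    }

    public static void main(String[] args) {
        System.out.println(batchesNeeded(7_000_000, 10_000));      // 700
        System.out.println(fullRunsPerDay(7_000_000, 10_000, 2_000)); // 2
    }
}
```
]]>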
    <item>
      <title>Re: [resolved] Salesforce Bulk Queries for Large data sets</title>
      <link>https://community.qlik.com/t5/Talend-Studio/resolved-Salesforce-Bulk-Queries-for-Large-data-sets/m-p/2220620#M14960</link>
      <description>Hi, 
&lt;BR /&gt;Thanks a lot for pointing out that the limitation is something 'implemented' by Talend rather than Salesforce; I am clear on what you mean now. 
&lt;BR /&gt;I think the existing Talend behavior, with 10K batches, may never meet your requirement of retrieving 7M+ records. 
&lt;BR /&gt;Is it possible to filter your data by certain criteria, e.g. LastModifiedDate? 
&lt;BR /&gt;I am working on a flow something like this: 
&lt;BR /&gt;get lastSynDate from local DB -&amp;gt; add it as a filter criterion in the query condition -&amp;gt; populate data -&amp;gt; update lastSynDate in local DB. 
&lt;BR /&gt;Not too sure if this flow is applicable to your scenario ... 
&lt;BR /&gt;Again, thanks a lot for your info.</description>
      <pubDate>Mon, 15 Oct 2012 09:53:15 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/resolved-Salesforce-Bulk-Queries-for-Large-data-sets/m-p/2220620#M14960</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2012-10-15T09:53:15Z</dc:date>
    </item>
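<![CDATA[
The incremental-sync flow sketched above can be reduced to two small steps. A sketch under assumed names (in the real job, the watermark would be read from and written back to the local DB, and the query fed to tSalesforceInput):

```java
import java.time.Instant;

// Minimal sketch of the lastSynDate flow above (hypothetical names).
public class IncrementalSync {

    // Steps 1+2: turn the stored watermark into a filtered SOQL query.
    public static String buildDeltaQuery(String baseSoql, Instant lastSyncDate) {
        return baseSoql + " WHERE LastModifiedDate > " + lastSyncDate;
    }

    // Step 4: after a successful load, advance the watermark.
    public static Instant advanceWatermark(Instant runStartedAt) {
        // use the moment the extract started, not "now", so records modified
        // during the run are picked up again next time
        return runStartedAt;
    }

    public static void main(String[] args) {
        Instant last = Instant.parse("2012-10-01T00:00:00Z");
        System.out.println(buildDeltaQuery("SELECT Id, Email FROM Contact", last));
    }
}
```
]]>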
    <item>
      <title>Re: [resolved] Salesforce Bulk Queries for Large data sets</title>
      <link>https://community.qlik.com/t5/Talend-Studio/resolved-Salesforce-Bulk-Queries-for-Large-data-sets/m-p/2220621#M14961</link>
      <description>That's one solution. Thanks for sharing. 
&lt;BR /&gt;I still feel this should be handled natively by the Talend component. I will raise an enhancement request. 
&lt;BR /&gt;I am planning to build a process that first bulk-exports using Workbench and then uses Talend to process the file. It's not a critical process, only something we run each time we refresh a sandbox, so it doesn't really matter if it has a few more steps.</description>
      <pubDate>Mon, 15 Oct 2012 10:27:58 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/resolved-Salesforce-Bulk-Queries-for-Large-data-sets/m-p/2220621#M14961</guid>
      <dc:creator>agentgill</dc:creator>
      <dc:date>2012-10-15T10:27:58Z</dc:date>
    </item>
    <item>
      <title>Re: [resolved] Salesforce Bulk Queries for Large data sets</title>
      <link>https://community.qlik.com/t5/Talend-Studio/resolved-Salesforce-Bulk-Queries-for-Large-data-sets/m-p/2220622#M14962</link>
      <description>I'm now working with another large salesforce.com client and the latest version of Talend, 5.3.0r101800. 
&lt;BR /&gt;As before, we are attempting to perform large bulk queries and hitting the 10K record limit. I would expect Talend to be able to spool multiple batches out to CSV. The full data set is being queried (see attachment), but Talend cannot process the output. 
&lt;BR /&gt;Any advice / pointers? 
&lt;BR /&gt; 
&lt;span class="lia-inline-image-display-wrapper" image-alt="0683p000009MCuq.jpg"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/149405i524ADBEDE147DB4F/image-size/large?v=v2&amp;amp;px=999" role="button" title="0683p000009MCuq.jpg" alt="0683p000009MCuq.jpg" /&gt;&lt;/span&gt; 
&lt;span class="lia-inline-image-display-wrapper" image-alt="0683p000009MCuv.jpg"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/129216i2C6D5816DE89A32F/image-size/large?v=v2&amp;amp;px=999" role="button" title="0683p000009MCuv.jpg" alt="0683p000009MCuv.jpg" /&gt;&lt;/span&gt;</description>
      <pubDate>Mon, 10 Jun 2013 15:45:00 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/resolved-Salesforce-Bulk-Queries-for-Large-data-sets/m-p/2220622#M14962</guid>
      <dc:creator>agentgill</dc:creator>
      <dc:date>2013-06-10T15:45:00Z</dc:date>
    </item>
    <item>
      <title>Re: [resolved] Salesforce Bulk Queries for Large data sets</title>
      <link>https://community.qlik.com/t5/Talend-Studio/resolved-Salesforce-Bulk-Queries-for-Large-data-sets/m-p/2220623#M14963</link>
      <description>Hi agentgill, 
&lt;BR /&gt;I have seen the same issue reported on JIRA: 
&lt;BR /&gt; 
&lt;A href="https://jira.talendforge.org/browse/TDI-7779" rel="nofollow noopener noreferrer"&gt;https://jira.talendforge.org/browse/TDI-7779&lt;/A&gt; 
&lt;BR /&gt;It has been fixed since v4.0.4. I don't know whether tSalesforceBulkExec has been updated in the latest version, but I would suggest downloading v4.1.0, which is closest to v4.0.4, and testing whether it works there. You can download v4.1.0 from this link: 
&lt;BR /&gt; 
&lt;A href="http://sourceforge.net/projects/talend-studio/files/Talend%20Open%20Studio/4.1.0/" rel="nofollow noopener noreferrer"&gt;http://sourceforge.net/projects/talend-studio/files/Talend%20Open%20Studio/4.1.0/&lt;/A&gt; 
&lt;BR /&gt;Shong</description>
      <pubDate>Mon, 17 Jun 2013 14:10:54 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/resolved-Salesforce-Bulk-Queries-for-Large-data-sets/m-p/2220623#M14963</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2013-06-17T14:10:54Z</dc:date>
    </item>
    <item>
      <title>Re: [resolved] Salesforce Bulk Queries for Large data sets</title>
      <link>https://community.qlik.com/t5/Talend-Studio/resolved-Salesforce-Bulk-Queries-for-Large-data-sets/m-p/2220624#M14964</link>
      <description>Thanks for the response, but that is a really old version. I'm using version 5.3.0, and uploading large batches works fine. My issue is with querying large recordsets in bulk. Any ideas on that?</description>
      <pubDate>Mon, 17 Jun 2013 14:37:19 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/resolved-Salesforce-Bulk-Queries-for-Large-data-sets/m-p/2220624#M14964</guid>
      <dc:creator>agentgill</dc:creator>
      <dc:date>2013-06-17T14:37:19Z</dc:date>
    </item>
    <item>
      <title>Re: [resolved] Salesforce Bulk Queries for Large data sets</title>
      <link>https://community.qlik.com/t5/Talend-Studio/resolved-Salesforce-Bulk-Queries-for-Large-data-sets/m-p/2220625#M14965</link>
      <description>This problem may still appear in the latest version, 5.3.0. I suggest you reopen TDI-7779 or create a new issue in our bug tracker so that our developers from the R&amp;amp;D team can see it.
&lt;BR /&gt;Shong</description>
      <pubDate>Wed, 19 Jun 2013 08:16:50 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/resolved-Salesforce-Bulk-Queries-for-Large-data-sets/m-p/2220625#M14965</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2013-06-19T08:16:50Z</dc:date>
    </item>
    <item>
      <title>Re: [resolved] Salesforce Bulk Queries for Large data sets</title>
      <link>https://community.qlik.com/t5/Talend-Studio/resolved-Salesforce-Bulk-Queries-for-Large-data-sets/m-p/2220626#M14966</link>
      <description>I still have this problem: Talend DI will only send one batch request. 
&lt;BR /&gt;I'm trying to track down where the batching logic should live for queries, but I can only find it for the upload path, in SalesforceBulkAPI.java: 
&lt;BR /&gt; 
&lt;PRE&gt;private List&amp;lt;BatchInfo&amp;gt; createBatchesFromCSVFile() throws IOException, AsyncApiException {
    List&amp;lt;BatchInfo&amp;gt; batchInfos = new ArrayList&amp;lt;BatchInfo&amp;gt;();
    BufferedReader rdr = new BufferedReader(new InputStreamReader(new FileInputStream(bulkFileName), FILE_ENCODING));
    // read the CSV header row
    byte[] headerBytes = (rdr.readLine() + "\n").getBytes("UTF-8");
    int headerBytesLength = headerBytes.length;
    File tmpFile = File.createTempFile("sforceBulkAPI", ".csv");
    // Split the CSV file into multiple batches
    try {
        FileOutputStream tmpOut = new FileOutputStream(tmpFile);
        int currentBytes = 0;
        int currentLines = 0;
        String nextLine;
        boolean needStart = true;
        boolean needEnds = true;
        while ((nextLine = rdr.readLine()) != null) {
            int num = countQuotes(nextLine);
            // nextLine is header or footer of the record
            if (num % 2 == 1) {
                if (!needStart) {
                    needEnds = false;
                } else {
                    needStart = false;
                }
            } else {
                // nextLine is a whole record or middle of the record
                if (needEnds &amp;amp;&amp;amp; needStart) {
                    needEnds = false;
                    needStart = false;
                }
            }
            byte[] bytes = (nextLine + "\n").getBytes("UTF-8");
            // Create a new batch when our batch size limit is reached
            if (currentBytes + bytes.length &amp;gt; maxBytesPerBatch || currentLines &amp;gt; maxRowsPerBatch) {
                createBatch(tmpOut, tmpFile, batchInfos);
                currentBytes = 0;
                currentLines = 0;
            }
            if (currentBytes == 0) {
                tmpOut = new FileOutputStream(tmpFile);
                tmpOut.write(headerBytes);
                currentBytes = headerBytesLength;
                currentLines = 1;
            }
            tmpOut.write(bytes);
            currentBytes += bytes.length;
            if (!needStart &amp;amp;&amp;amp; !needEnds) {
                currentLines++;
                needStart = true;
                needEnds = true;
            }
        }
        // Finished processing all rows
        // Create a final batch for any remaining data
        if (currentLines &amp;gt; 1) {
            createBatch(tmpOut, tmpFile, batchInfos);
        }
    } finally {
        tmpFile.delete();
    }
    return batchInfos;
}&lt;/PRE&gt;</description>
      <pubDate>Wed, 12 Nov 2014 18:44:53 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/resolved-Salesforce-Bulk-Queries-for-Large-data-sets/m-p/2220626#M14966</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2014-11-12T18:44:53Z</dc:date>
    </item>
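<![CDATA[
For reference, the splitting rule in the quoted createBatchesFromCSVFile boils down to: accumulate rows until either a byte budget or a row cap would be exceeded, then start a new batch with the header repeated. A simplified, self-contained version of that rule (my own sketch, not Talend's code, and without the quote-counting for multi-line records):

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Simplified version of the batch-splitting rule in the quoted method:
// each batch repeats the header, and a new batch starts when the byte
// budget or the row cap would otherwise be exceeded.
public class CsvBatchSplitter {

    public static List<List<String>> split(String header, List<String> rows,
                                           int maxBytesPerBatch, int maxRowsPerBatch) {
        List<List<String>> batches = new ArrayList<>();
        List<String> current = null;
        int currentBytes = 0;
        for (String row : rows) {
            int rowBytes = (row + "\n").getBytes(StandardCharsets.UTF_8).length;
            boolean full = current != null
                    && (currentBytes + rowBytes > maxBytesPerBatch
                        || current.size() - 1 >= maxRowsPerBatch);
            if (current == null || full) {
                current = new ArrayList<>();
                current.add(header); // every batch repeats the CSV header
                currentBytes = (header + "\n").getBytes(StandardCharsets.UTF_8).length;
                batches.add(current);
            }
            current.add(row);
            currentBytes += rowBytes;
        }
        return batches;
    }

    public static void main(String[] args) {
        // five rows with a two-row cap -> three batches
        System.out.println(split("Id,Email",
                List.of("a", "b", "c", "d", "e"), 1_000_000, 2));
    }
}
```

The query side has no equivalent in the component, which is why only a single batch request is ever sent.
]]>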
  </channel>
</rss>

