We are extracting data from Salesforce using Talend. While extracting the Body field (which is binary) of the Attachment object using tSalesforceInput, the extraction process is very slow. We also tried setting the query mode to Bulk in tSalesforceInput, but it does not work because binary fields are not supported in bulk mode.
Is there any other way to speed up the extraction of the binary field?
Can you give us more information on this? For example, do you know how big the attachment is? What is the speed of your internet? How many attachments are you downloading? How many records other than the binary representation of the attachment are you downloading? Is this from a live Salesforce environment (production) or a sandbox?
Currently we are extracting around 10,479 records in our development environment. With the Body field (binary) removed from the tSalesforceInput component, the data is extracted in just 4 seconds, but with the Body field included it takes a long time. In production (the live Salesforce environment) the data will be around 2.8 million (28 lakh) records. There are seven other columns apart from Body that we are extracting.
The Body field is essentially a file, so you are downloading 10,479 files. That is going to take a fair amount of time and will depend heavily on the size of those files. Even though you are retrieving binary data through a query, it is still the same amount of actual data you would transfer if you downloaded the files directly. There is no getting around this, and it isn't a flaw in Salesforce or Talend; it is just the amount of data you are returning, essentially via web services.
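To put a rough number on "a fair amount of time", you can estimate the pure transfer time from record count, average file size, and bandwidth. The figures below (200 KB average attachment, 50 Mbit/s link) are illustrative assumptions, not values from this thread; the 4/3 factor reflects the base64 encoding the SOAP API uses to return binary fields as text.

```python
# Back-of-envelope estimate of wire time for downloading attachment bodies.
# Average file size and bandwidth are placeholder assumptions -- substitute
# your own measurements.

def estimated_download_seconds(num_records, avg_file_bytes, bandwidth_bytes_per_sec):
    """Estimate transfer time. The SOAP API returns binary Body data as
    base64 text, which inflates the payload by roughly a factor of 4/3."""
    payload_bytes = num_records * avg_file_bytes * 4 / 3  # base64 inflation
    return payload_bytes / bandwidth_bytes_per_sec

# Example: 10,479 attachments averaging 200 KB over a 50 Mbit/s link.
seconds = estimated_download_seconds(10_479, 200 * 1024, 50_000_000 / 8)
print(f"{seconds / 60:.1f} minutes of pure transfer time")
```

If the estimate is close to what the job actually takes, the bottleneck is bandwidth, not Talend.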
Hi @Richard Hall ,
I am using tSalesforceInput to read about 2-5 million records. The Query Mode options Bulk and BulkV2 with TOS-BD v8 are not running properly.
Below are the issues I am encountering with the bulk option:
Will you please help me fix this issue? I want to accelerate reading data from tSalesforceInput.
Also, for the above scenario, when I am using Query Mode, jobs take 4-6 hours to process with a batch size of 800-1200. My Windows VM has 16 GB of memory.
The Salesforce components make use of the Salesforce SOAP API. The more data being retrieved per record, the slower it will be. Do you know how many bytes each record is? You need to calculate how many bytes in total you are downloading and then factor in your bandwidth, etc. My advice would be to retrieve only what you need. If you need everything, can you retrieve the data over a longer period of time or in an incremental way? Unfortunately, no application can overcome the bottleneck of bandwidth when downloading a large amount of data.
Salesforce data is loaded in append mode. My use case needs an incremental load on my side while loading into my target DB. I fear that doing an incremental load will add processing overhead and increase the total job run time.
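One way to sketch the incremental approach suggested above: filter each run on a high-water mark so the job only fetches records changed since the previous run, rather than all 2.8 million rows. SystemModstamp is a standard Salesforce audit field; how you persist the high-water mark (a context variable, a control table, etc.) is up to your job design, and the helper below is just an illustration of the SOQL condition you would paste into tSalesforceInput.

```python
# Minimal sketch of building an incremental SOQL query from a stored
# high-water mark. The timestamp passed in is an assumed value; SOQL
# datetime literals are written without quotes.

def incremental_soql(last_run_iso):
    """Return a SOQL query that fetches only Attachment records modified
    since the previous run, so each job processes a small delta."""
    return (
        "SELECT Id, Name, Body FROM Attachment "
        f"WHERE SystemModstamp > {last_run_iso} "
        "ORDER BY SystemModstamp"
    )

print(incremental_soql("2024-01-01T00:00:00Z"))
```

The per-run overhead of an indexed date filter is tiny compared with re-downloading unchanged binary bodies, so an incremental load normally shortens total run time rather than lengthening it.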
Have you calculated the size of the data you are downloading? SOAP services are not as quick as local-DB-to-local-DB transfers. This has to be considered before you can say that the throughput is slow.