Hello All -
We have a cloud application that contains history data going back to 2011, roughly 11 lakh (1.1 million) records. Can we pull all the records using the tRESTClient component? Can Talend handle data of this volume?
Your help is much appreciated!
Hi,
Before attempting to pull 1.1 million records with a single tRESTClient call, I would suggest checking the timeout of the cloud application, the maximum throughput available from the source, etc. I would extract the data in multiple parallel chunks through Talend rather than running a single tRESTClient request in bulk mode.
Please also verify whether any bulk-unloading APIs are available for this cloud application. Ideally, data extractions of this type should happen through a bulk data-extract mode.
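To make the chunking idea concrete, here is a minimal Java sketch (Talend Jobs compile to Java) of how a 1.1-million-record extraction could be planned as offset/limit pages, each of which would map to one tRESTClient call. The page size of 200 and the offset/limit pagination style are assumptions; use whatever paging scheme the cloud API actually supports.

```java
import java.util.ArrayList;
import java.util.List;

public class ChunkPlanner {
    // Split a total record count into (offset, limit) pages; each page
    // would become one REST call in the Job (pagination style assumed).
    static List<int[]> plan(int totalRecords, int pageSize) {
        List<int[]> pages = new ArrayList<>();
        for (int offset = 0; offset < totalRecords; offset += pageSize) {
            int limit = Math.min(pageSize, totalRecords - offset);
            pages.add(new int[]{offset, limit});
        }
        return pages;
    }

    public static void main(String[] args) {
        // 1,100,000 records in pages of 200 -> 5,500 calls
        List<int[]> pages = plan(1_100_000, 200);
        System.out.println(pages.size()); // prints 5500
    }
}
```

Planning the pages up front also lets you restart a failed extraction from the last successful page instead of re-pulling everything.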
Warm Regards,
Nikhil Thampi
Please appreciate our Talend community members by giving Kudos for sharing their time for your query. If your query is answered, please mark the topic as resolved 🙂
Thanks for the reply @nthampi
As per your reply, suppose we do have bulk APIs:
1) To handle a bulk API, what kind of component do we use to pull the data?
2) Can Talend Open Studio handle bulk data, whether 1.1M or 5M records?
3) On another note, you mentioned "I would extract the data in multiple parallel chunks through Talend rather than running a single tRest to extract data in bulk mode". Can you tell me how to do this? Or, if you could share a blog post, that would be great!
Thanks in Advance.
Hi,
Below are my thoughts for your queries.
1) To handle a bulk API, what kind of component do we use to pull the data?
It depends on the format the Zoho Bulk API produces. If the output is a CSV file, you can read it with tFileInputDelimited; if it is JSON, use the corresponding file-reading component, and so on.
Please refer to the items below:
https://www.zoho.com/crm/help/developer/api/overview.html
https://www.zoho.com/crm/help/developer/api/bulk-write/overview.html
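For illustration only, here is a tiny Java sketch of the kind of parsing tFileInputDelimited performs on such a CSV download. The column names and values are made up, not actual Zoho fields; inside a Job you would simply configure the component's schema instead of writing this by hand.

```java
public class BulkCsvSketch {
    // Naive split of one CSV line; tFileInputDelimited handles quoting,
    // encodings, and schema mapping for you inside a Job.
    static String[] splitCsvLine(String line) {
        return line.split(",", -1);
    }

    public static void main(String[] args) {
        // Hypothetical first lines of a bulk-read CSV export.
        String header = "Id,Last_Name,Email";
        String row = "3652397000000649013,Smith,smith@example.com";
        String[] names = splitCsvLine(header);
        String[] values = splitCsvLine(row);
        for (int i = 0; i < names.length; i++) {
            System.out.println(names[i] + " = " + values[i]);
        }
    }
}
```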
2) Can Talend Open Studio handle bulk data, whether 1.1M or 5M records?
Talend Open Studio generates Java code to process the data. If you have an environment with enough system resources for this workload (adequate CPU, memory, network bandwidth, etc.), you can use Talend Open Studio itself to process data of this volume.
3) On another note, you mentioned "I would extract the data in multiple parallel chunks through Talend rather than running a single tRest to extract data in bulk mode". Can you tell me how to do this?
It means you will have to do the filtering on the source side itself (here, Zoho) and extract the data based on specified filter conditions. You will have to check the capabilities of the APIs provided by the source, since Talend has no control at that step; Talend only passes the parameters. You can use high-performance components such as tParallelize to increase parallelism, but they are available only in the Enterprise edition of Talend.
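Since tParallelize is Enterprise-only, a plain-Java sketch of the same fan-out idea (for example inside a tJavaFlex or a custom routine) might look like the following. `fetchChunk` is a hypothetical stand-in for a real REST call with its own offset/limit; the pool size of 4 is an assumption to be tuned against the source's rate limits.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelChunkFetch {
    // Placeholder for one chunked extraction; in a real Job this would
    // be a REST call using the given offset/limit (names illustrative).
    static int fetchChunk(int offset, int limit) {
        return limit; // pretend the API returned `limit` records
    }

    // Fan the chunks out over a fixed-size thread pool and sum the counts.
    static int fetchAll(int total, int pageSize, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        List<Future<Integer>> futures = new ArrayList<>();
        for (int off = 0; off < total; off += pageSize) {
            final int offset = off;
            final int limit = Math.min(pageSize, total - off);
            futures.add(pool.submit(() -> fetchChunk(offset, limit)));
        }
        int received = 0;
        for (Future<Integer> f : futures) {
            received += f.get();
        }
        pool.shutdown();
        return received;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(fetchAll(1_000, 100, 4)); // prints 1000
    }
}
```

Whatever the mechanism, keep the source's rate limits in mind: too many parallel requests can get you throttled faster than a single sequential stream.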
On a different note, if you would like to see how to integrate Java SDKs with Talend, please refer to the example below, where I used the AWS SDK for the Translate service. The approach stays the same in your case, apart from the change in SDK and functions.
Warm Regards,
Nikhil Thampi
Please appreciate our Talend community members by giving Kudos for sharing their time for your query. If your query is answered, please mark the topic as resolved 🙂