We have Talend Cloud Data Preparation, and I created a recipe based on a dataset that I loaded as a test file. The recipe basically does a regex replace in one column.
Then I have a standard job that runs on an on-prem remote engine; it reads several files and applies the Data Preparation recipe using a tDataPrepRun component.
I compared the performance of the tDataPrepRun component against a standard regex tReplace component, and the latter is about 4 times faster than tDataPrepRun.
So I'm wondering how the Data Preparation recipe is applied. Does it download the recipe from the cloud every time and apply it locally in the runtime engine? Or does it upload the data to Talend Cloud, apply the recipe, and then download the results?
If anyone knows the details, please let me know.
Thanks!
Hi,
When it comes to tDataPrepRun, the following documentation page should answer your question: https://help.talend.com/reader/rGfDn9c_Qjv5~4P5XcYKbw/tClZKcGIQ9tfYAAOSeeg7w. Short version:
We plan to align the DI behavior to the Big Data one, but there is no confirmed ETA yet. There is no difference between on-prem and Cloud, btw. Same principles apply.
As a side note, the runtime used when running a preparation directly from the UI is described here: https://help.talend.com/reader/94sQcluQTA3Bds1QWGANTw/49m3unRzMnXJnX7tIv79mA
Cheers,
Gwendal
Hi,
I have raised the query with the Product Team in a JIRA ticket; the link is below for your reference.
https://jira.talendforge.org/browse/TDP-6730
Regarding your last query, the data is never loaded to Talend Cloud, as Talend Cloud fetches only metadata to control the job. Everything else is processed directly on your remote engine.
Warm Regards,
Nikhil Thampi
Thanks for your reply! Just to be sure I got it right:
If I use a tDataPrepRun component to apply a recipe built in Data Preparation Cloud, and the job using that component runs on an on-prem remote engine, then every flow of data will be uploaded to Talend Cloud, run through the recipe, and the results downloaded.
Is that right?
If so, that would explain the performance difference compared with the local regex.
Thanks,
Damian
Yes, that is correct. And yes, that fully explains the performance discrepancy ... and why we want to review the way DI jobs work with tDataPrepRun to mimic Big Data jobs.
Regards,
Gwendal
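For anyone reading this later, here is a rough sketch of why a round trip to a remote service can dominate the run time compared with an in-process regex replace. It is not Talend code: `local_replace` and `estimate_seconds` are hypothetical helpers, and the batch size, per-row cost, and round-trip latency are made-up numbers used only for illustration. How tDataPrepRun actually batches or transfers data is not described in this thread.

```python
import re


def local_replace(rows, pattern, replacement):
    """Apply a regex replace to every value, entirely in-process."""
    compiled = re.compile(pattern)
    return [compiled.sub(replacement, value) for value in rows]


def estimate_seconds(n_rows, per_row_cpu_s, batch_size=None, round_trip_s=0.0):
    """Rough cost model (all parameters are hypothetical assumptions).

    With batch_size=None the work is purely local. Otherwise each batch
    additionally pays one network round trip (upload, process, download).
    """
    cpu = n_rows * per_row_cpu_s
    if batch_size is None:
        return cpu
    batches = -(-n_rows // batch_size)  # ceiling division
    return cpu + batches * round_trip_s


if __name__ == "__main__":
    # Local replace: strip a numeric suffix from each value.
    print(local_replace(["AB-123", "CD-456"], r"-\d+", ""))
    # Same CPU cost per row, with and without per-batch round trips.
    print(estimate_seconds(100_000, 1e-5))
    print(estimate_seconds(100_000, 1e-5, batch_size=10_000, round_trip_s=0.3))
```

In this model the per-row work is identical in both cases, so any gap comes from the round trips alone. Plugging in latency figures measured on your own network would give a rough idea of whether they account for a slowdown like the 4x reported above.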