dbeltritti
Contributor III

How does tDataPrepRun in a Standard job work?

We have Talend Cloud Data Preparation, and I created a recipe based on a dataset that I loaded as a test file. The recipe basically does a regex replace on a column.

I also have a Standard job, running on an on-prem Remote Engine, that reads several files and applies the Data Preparation recipe using a tDataPrepRun component.

I compared performance between the tDataPrepRun component and a standard tReplace component doing the same regex replace, and the latter is roughly four times faster than tDataPrepRun.
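For context, the local path (tReplace) amounts to compiling one regex and applying it in-process, row by row. A minimal Java sketch of that kind of transform (the pattern below is hypothetical, not the actual recipe; Talend Standard jobs generate Java code, but this is not the generated code itself):

```java
import java.util.regex.Pattern;

public class LocalRegexReplace {
    // Compile the pattern once and reuse it for every row; the whole
    // transform stays inside the local JVM, with no per-record I/O.
    private static final Pattern NON_DIGITS = Pattern.compile("[^0-9]");

    public static String clean(String value) {
        return value == null ? null : NON_DIGITS.matcher(value).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(clean("Tel: +54 (11) 1234-5678")); // prints 541112345678
    }
}
```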

 

So I'm wondering how the Data Preparation recipe is actually applied. Does it download the recipe from the cloud every time and apply it locally on the Remote Engine? Or does it upload the data to Talend Cloud, apply the recipe there, and then download the results back?

 

If anyone knows the details, please let me know.

 

 

Thanks!

1 Solution

Accepted Solutions
Anonymous
Not applicable

Hi,

 

When it comes to tDataPrepRun, the following documentation page should answer your question: https://help.talend.com/reader/rGfDn9c_Qjv5~4P5XcYKbw/tClZKcGIQ9tfYAAOSeeg7w. Short version:

  • For DI jobs, the processing is performed on the Data Prep server
  • For Big Data jobs, the processing is performed on the cluster

We plan to align the DI behavior with the Big Data one, but there is no confirmed ETA yet. There is no difference between on-prem and Cloud, by the way; the same principles apply.
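The roughly 4x gap reported above is consistent with per-record overhead added by shipping data to the Data Prep server and back. A back-of-the-envelope cost model, with purely illustrative numbers (none of these figures come from Talend):

```java
public class RoundTripCostModel {
    public static void main(String[] args) {
        long records = 1_000_000;
        // All figures below are assumptions for illustration only.
        double transformNsPerRecord = 500;   // the regex replace itself
        double shippingNsPerRecord  = 1_500; // serialization + network, amortized per record

        double localMs  = records * transformNsPerRecord / 1e6;
        double remoteMs = records * (transformNsPerRecord + shippingNsPerRecord) / 1e6;
        System.out.printf("local: %.0f ms, via server: %.0f ms (%.1fx slower)%n",
                localMs, remoteMs, remoteMs / localMs);
        // prints: local: 500 ms, via server: 2000 ms (4.0x slower)
    }
}
```

The point is not the exact numbers, but that once a per-record shipping cost is added, even a cheap transform can slow down by a small constant factor, which is why pushing the processing to where the data lives (as Big Data jobs do) helps.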

 

As a side note, the runtime used when running a preparation directly from the UI is described here: https://help.talend.com/reader/94sQcluQTA3Bds1QWGANTw/49m3unRzMnXJnX7tIv79mA

 

Cheers,

 

Gwendal


5 Replies
Anonymous
Not applicable

Hi,

 

I have raised the query with the Product team via a JIRA ticket; here is the link for your reference.

 

https://jira.talendforge.org/browse/TDP-6730

 

Regarding your last query: the data is never uploaded to Talend Cloud. Talend Cloud fetches only metadata in order to control the job; everything else is processed directly on your Remote Engine.


Warm Regards,
Nikhil Thampi

Please appreciate our Talend community members by giving Kudos for sharing their time for your query. If your query is answered, please mark the topic as resolved 🙂

 


dbeltritti
Contributor III
Author

Thanks for your reply! Just to be sure I got it right:

If I use a tDataPrepRun component with a recipe built in Data Preparation Cloud, and the job using that component runs on an on-prem Remote Engine, then every flow of data is uploaded to Talend Cloud, the recipe is applied there, and the results are downloaded back.

Is that right?

 

If that's right, it would explain the performance difference versus the local regex replace.

 

 

Thanks,

Damian

dbeltritti
Contributor III
Author

Thanks for creating the ticket! I really appreciate it.
Anonymous
Not applicable

Yes, that is correct. And yes, that fully explains the performance discrepancy, and why we want to review the way DI jobs work with tDataPrepRun so that they mimic Big Data jobs.

 

Regards,

 

Gwendal