dbeltritti
Contributor III

How does tDataPrepRun in a Standard job work?

We have Talend Cloud Data Preparation, and I created a recipe based on a dataset that I loaded as a test file. The recipe basically does a regex replace on a column.

I also have a Standard job, running on an on-prem Remote Engine, that reads several files and applies the Data Preparation recipe using a tDataPrepRun component.

I compared performance between the tDataPrepRun component and a standard tReplace component doing the same regex replace, and the latter is roughly four times faster than tDataPrepRun.
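For context, the local path (tReplace) amounts to compiling one regex and applying it in-process, row by row. A minimal Java sketch of that kind of transform (the pattern below is hypothetical, not the actual recipe; Talend Standard jobs generate Java code, but this is not the generated code itself):

```java
import java.util.regex.Pattern;

public class LocalRegexReplace {
    // Compile the pattern once and reuse it for every row; the whole
    // transform stays inside the local JVM, with no per-record I/O.
    private static final Pattern NON_DIGITS = Pattern.compile("[^0-9]");

    public static String clean(String value) {
        return value == null ? null : NON_DIGITS.matcher(value).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(clean("Tel: +54 (11) 1234-5678")); // prints 541112345678
    }
}
```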

 

So I'm wondering how the Data Preparation recipe is actually applied. Does it download the recipe from the cloud every time and apply it locally on the Remote Engine? Or does it upload the data to Talend Cloud, apply the recipe there, and then download the results back?

 

If anyone knows the details, please let me know.

 

 

Thanks!

1 Solution

Accepted Solutions
Anonymous
Not applicable

Hi,

 

When it comes to tDataPrepRun, the following documentation page should answer your question: https://help.talend.com/reader/rGfDn9c_Qjv5~4P5XcYKbw/tClZKcGIQ9tfYAAOSeeg7w. Short version:

  • For DI jobs, the processing is performed on the Data Prep server
  • For Big Data jobs, the processing is performed on the cluster

We plan to align the DI behavior with the Big Data one, but there is no confirmed ETA yet. There is no difference between on-prem and Cloud, by the way; the same principles apply.
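The roughly 4x gap reported above is consistent with per-record overhead added by shipping data to the Data Prep server and back. A back-of-the-envelope cost model, with purely illustrative numbers (none of these figures come from Talend):

```java
public class RoundTripCostModel {
    public static void main(String[] args) {
        long records = 1_000_000;
        // All figures below are assumptions for illustration only.
        double transformNsPerRecord = 500;   // the regex replace itself
        double shippingNsPerRecord  = 1_500; // serialization + network, amortized per record

        double localMs  = records * transformNsPerRecord / 1e6;
        double remoteMs = records * (transformNsPerRecord + shippingNsPerRecord) / 1e6;
        System.out.printf("local: %.0f ms, via server: %.0f ms (%.1fx slower)%n",
                localMs, remoteMs, remoteMs / localMs);
        // prints: local: 500 ms, via server: 2000 ms (4.0x slower)
    }
}
```

The point is not the exact numbers, but that once a per-record shipping cost is added, even a cheap transform can slow down by a small constant factor, which is why pushing the processing to where the data lives (as Big Data jobs do) helps.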

 

As a side note, the runtime used when running a preparation directly from the UI is described here: https://help.talend.com/reader/94sQcluQTA3Bds1QWGANTw/49m3unRzMnXJnX7tIv79mA

 

Cheers,

 

Gwendal


5 Replies
Anonymous
Not applicable

Hi,

 

I have raised the query with the Product team via a JIRA ticket; here is the link for your reference.

 

https://jira.talendforge.org/browse/TDP-6730

 

Regarding your last query: the data is never uploaded to Talend Cloud. Talend Cloud fetches only metadata in order to control the job; everything else is processed directly on your Remote Engine.


Warm Regards,
Nikhil Thampi

Please appreciate our Talend community members by giving Kudos for sharing their time for your query. If your query is answered, please mark the topic as resolved 🙂

 


dbeltritti
Contributor III
Author

Thanks for your reply! Just to be sure I got it right:

If I use a tDataPrepRun component with a recipe built in Data Preparation Cloud, and the job using that component runs on an on-prem Remote Engine, then every flow of data is uploaded to Talend Cloud, the recipe is applied there, and the results are downloaded back.

Is that right?

 

If that's right, it would explain the performance difference versus the local regex replace.

 

 

Thanks,

Damian

dbeltritti
Contributor III
Author

Thanks for creating the ticket! I really appreciate it.
Anonymous
Not applicable

Yes, that is correct. And yes, that fully explains the performance discrepancy, and why we want to review the way DI jobs work with tDataPrepRun so that they mimic Big Data jobs.

 

Regards,

 

Gwendal