Do not input private or sensitive data. View Qlik Privacy & Cookie Policy.
Skip to main content

Announcements
ALERT: The support homepage carousel is not displaying. We are working toward a resolution.

Using Data Preparation as part of a Data Ingestion Pipeline

100% helpful (1/1)
cancel
Showing results for 
Search instead for 
Did you mean: 
TalendSolutionExpert
Contributor II
Contributor II

Using Data Preparation as part of a Data Ingestion Pipeline

Last Update:

Jan 22, 2024 9:35:30 PM

Updated By:

Jamie_Gregory

Created date:

Apr 1, 2021 6:01:00 AM

Introduction

This article describes how to use Talend Data Preparation (Data Prep) as part of a data ingestion pipeline. This is a common business use case in which data from various disparate systems is required to be translated to a common format, standardized, reformatted, validated, and then passed in that standard format to other systems.

 

This example shows how you can build a Talend Job to ingest data from a database, convert it to a common format, run a Talend Data preparation against that data, and then pass it on to other systems, in this case exporting as both a Comma Separated Value (CSV) and an MS Excel file. Any of these systems can, of course, be replaced by any number of things, but what will be common will be translating the data to a common format and running a data preparation.

 

Building the Talend Job

The Job you will build is shown below:

0693p000008u7eiAAA.png

 

Each component will be dealt with separately.

 

Input the Data

In this example, you will input data from a MySQL database. You set up a database table in MySQL using some example data; a section of the data is shown below. This shows the first part of your use case, getting data from a system in a specific format. You could use any number of different systems, but this example uses a database.

 

0693p000008u7enAAA.png

 

To represent that data in Talend you need to define a schema, as shown below:

0693p000008u7esAAA.png

 

You will then use the tMysqlInput component to get the data from this table. The component configuration is shown below:

0693p000008u7e0AAA.png

 

Transform the Data

You now need to transform the data into the common format you are using in your use case. This common format is shown below, and contains the following columns:

0693p000008u7exAAA.png

 

This is your common format. In this use case, you want to translate ALL incoming data into this format. To do so, use a tMap component and map the fields as shown below:

0693p000008u7dgAAA.png

 

Run a Data Preparation

The next step is to use a Data Preparation to run against the data. To do so, use the tDataprepRun component. On the component configuration tab, you can either use an existing preparation, or create a new one. In this example, you will create a new one.

  1. On the Component tab, click Create a new one, then select the Edit preparation in your browser icon.

    0693p000008u7f2AAA.png

     

  2. The Data Preparation home page window opens in your web browser and the Data Preparation you just created is displayed:

     

    0693p000008u7bzAAA.png

     

  3. Select the Data Preparation to edit it. You can choose any number of things to do to the data; this example's use case is shown below:

    0693p000008u7f7AAA.png

     

    This example shows selecting a few simple functions for a small number of columns in the data. You may choose to follow this example or choose your own.

 

Manage the Output

In this example, you will send the data to two different places and formats. You will use the tReplicate component to send multiple copies of the output to one comma separated value file and one MS Excel spreadsheet. This is shown below:

0693p000008u7fCAAQ.png

 

Configure the components like this:

  1. Replicate the input/output schema in the tReplicate Component:

    0693p000008u7fHAAQ.png

     

  2. Configure the final output components like this:

    • For the tFileOutputExcel component:

      0693p000008u7fMAAQ.png

       

    • For the tFileOutputDelimited component:

      0693p000008u7fRAAQ.png

       

 

Run the Job

The Job is now built and configured. The next step is to run it, as shown below:

0693p000008u7c0AAA.png

 

Once run, you should get output similar to this:

0693p000008u7fWAAQ.png

 

Examine the Output

In this example, you have a dataset containing 6040 records. You can now examine the output files, either by opening them or using the Talend Data Viewer:

0693p000008u7fbAAA.png

 

Comparing this to the input data, you can see that the data has indeed been transformed by running through a Data Preparation.

0693p000008u7fgAAA.png

 

Labels (1)
Comments
MoeE
Partner - Specialist
Partner - Specialist

Hi,

Can this also be used in a Talend Cloud Pipeline?

Regards,

Mohammed

Xiaodi_Shi
Employee
Employee

Hello @MoeE 

As tDataprepRun component in Talend studio, is talend studio in your use case?

Here are data prep export options we have, so far.

https://help.qlik.com/talend/en-US/data-preparation-user-guide/Cloud/list-of-supported-connectors

In addition to that, we have "cross-dataset" function today and it is the lookup function, which matches data from the current preparation with its counterpart in a reference dataset, allowing you to add this reference data to your preparation.

https://help.qlik.com/talend/en-US/data-preparation-user-guide/Cloud/use-lookup

Best regards

Sabrina

 

Version history
Last update:
‎2024-01-22 09:35 PM
Updated by: