Anonymous
Not applicable

Multiple Inserts from different tables to one table (No Joins)

Hello,

 

Planning to migrate from Kettle to Talend Open Studio, so I'm trying to do a few POCs.

 

Use case:

I have 5 source tables in a Greenplum database, each with a different structure, and one denormalized target table containing all the columns of the 5 source tables. What would be the best approach in Talend to load data from the 5 source tables into the one target table with no JOINs (just inserting data)?

 

Does tMap work without a join?

 

As per my research:

tDBInput1 ----\
tDBInput2 -----\
tDBInput3 ------> tMap ------> tDBOutputBulkExec
tDBInput4 -----/
tDBInput5 ----/

5 Replies
Anonymous
Not applicable
Author

Hi,

 

If your source tables have different schemas, first bring them to a unified schema using an individual tMap after each tDBInput. Once every flow is in the unified format, you can use a tUnite to merge all 5 source flows and send the result to the output.
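In SQL terms, the per-source tMaps plus tUnite behave roughly like NULL-padding each source to the unified column list and then doing a UNION ALL. A minimal sketch, assuming made-up tables src1/src2 and a made-up unified schema (id, name, price):

-- src1 and src2 stand in for two of the five real source tables;
-- the unified column list (id, name, price) is hypothetical.
SELECT id, name, NULL::numeric AS price   -- src1 has no price column
FROM src1
UNION ALL
SELECT id, NULL::text AS name, price      -- src2 has no name column
FROM src2;
-- Each SELECT plays the role of one tDBInput + tMap;
-- the UNION ALL plays the role of tUnite.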

 

Warm Regards,
Nikhil Thampi

Please appreciate our Talend community members by giving Kudos for sharing their time for your query. If your query is answered, please mark the topic as resolved 🙂

nfz11
Creator III

How do you know which row to stitch together from each of the 5 data sources?  Are you depending on sorting?  Do all of the 5 data sources have the same number of rows?

 

One way you could do it is to include a row number in your query for each of the 5 inputs (in Greenplum, row_number() OVER (ORDER BY ...)) and join on that row number in the tMap. Then you can map the output to a row containing the values of all 5 inputs. You can use tMap without join conditions, but it will output the Cartesian product of the data, the same way a SQL join would if you specify no join key in the WHERE clause.
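A minimal sketch of what the row-number query for each tDBInput could look like in Greenplum; src1 and the ORDER BY column are placeholder names:

-- One such query per tDBInput (src1 ... src5 are placeholders).
SELECT row_number() OVER (ORDER BY id) AS rn, t.*
FROM src1 t;
-- In the tMap, join the five flows on rn. This only stitches rows
-- sensibly if every source has the same row count and a stable order.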

Anonymous
Not applicable
Author

Thanks, nfz11.

 

Do all of the 5 data sources have the same number of rows? 

No. Each source table has a different set of columns; only a few are common.

 

Thanks for your inputs. Currently, I'm trying to use a tMap for each source and a tUnite; I'll see how it goes.

nfz11
Creator III


@KarthikR wrote:

Thanks, nfz11.

Do all of the 5 data sources have the same number of rows?

No. Each source table has a different set of columns; only a few are common.

Thanks for your inputs. Currently, I'm trying to use a tMap for each source and a tUnite; I'll see how it goes.


I was asking about the number of rows in each data source, not the number of columns.

 

tUnite will not do what you want per the use case in the OP. tUnite is like doing a UNION in SQL and requires the schema of every input to be the same. If you are OK with having blanks in most of the columns and with all the rows from each input source grouped together, then a tUnite will work.

 

For the use case in the OP, you could also use custom Java to store all the data in data structures in the globalMap and then manipulate it from there. If you have a large data set, this may not be possible due to memory constraints.

Anonymous
Not applicable
Author

Oops, my bad.

The 1st table contains around 80M records, the 2nd table around 25M, and the remaining tables are around 3-6M records each.

 

tUnite will not do what you want per the use case in the OP. tUnite is like doing a UNION in SQL and requires the schema of every input to be the same. If you are OK with having blanks in most of the columns and with all the rows from each input source grouped together, then a tUnite will work.

This might work, as per the above statement. And my bad if my initial question was confusing.

The denormalized table contains around 200 columns. So if I'm loading the 1st table, which contains 20 columns, only those particular columns will get populated; all the rest will be NULL in the denormalized table. Eventually, I will use that denormalized table to index data into Solr Cloud.
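In other words, each source load only lists its own columns and lets the rest default to NULL. A minimal sketch of the effective load for the 1st table; denorm_target, src1, and the column names are all placeholders:

-- Only the 20 columns coming from the 1st source are listed;
-- the remaining ~180 columns of denorm_target default to NULL.
INSERT INTO denorm_target (col_01, col_02, col_03)  -- ... through col_20
SELECT col_01, col_02, col_03                       -- ... through col_20
FROM src1;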