Skip to main content
Announcements
Introducing Qlik Answers: A plug-and-play, Generative AI powered RAG solution. READ ALL ABOUT IT!
cancel
Showing results for 
Search instead for 
Did you mean: 
Anonymous
Not applicable

Unite two datasets Automatically by name

Hello all,

In Talend we use a tool called Tunite to union two or more data-sets. The problem with it is the schema of all the data-sets should be same, where the order of the columns and the number of columns should always be same. When handling a larger data-set, ordering the data-sets or creating dummy columns becomes difficult as there are many number of columns and many data-sets. 

 

Is there any way where i can just unite them automatically based on the column names? and if the column is missing in other data-set it should have nulls instead.

 

If it i a custom component , can anyone guide me towards the existing component or help me build one/

Labels (3)
1 Solution

Accepted Solutions
David_Beaty
Specialist
Specialist

1/ create table in DB that's the super-set of all columns in all sources (final table).

2/ create table in DB that's got X columns, where X is a number which is the largest number of columns you'd expect in any incoming data-set, each column will be varchar(8000), just call the columns col1, col2, col3.....colX (work table)

3/ load incoming data into DB work table

4/ parse the header row of the incoming data, to ascertain the position of all columns of interest - this will be critical that the same column data is called the same name in each of the incoming data-sets.

5/ construct SQL to INSERT INTO/SELECT from the work table into the final table.

6/ execute the SQL

7/clear down work table.

View solution in original post

5 Replies
manodwhb
Champion II
Champion II

@Aishwarya,since you need to do manually,there we do not have automatically arrange. 

David_Beaty
Specialist
Specialist

Hi,

 

A way I've tackled this in the past is load the unknown schema data-set up into a table of a suitably large enough number of string columns. Then using the header line of the incoming data, dynamically construct some SQL to select back out all columns, replacing them with nulls if they weren't in the source file.

 

It's not pretty, but it does work.

 

Thanks

 

David

Anonymous
Not applicable
Author

Can you please help me follow these steps? I am pretty new to Talend
David_Beaty
Specialist
Specialist

1/ create table in DB that's the super-set of all columns in all sources (final table).

2/ create table in DB that's got X columns, where X is a number which is the largest number of columns you'd expect in any incoming data-set, each column will be varchar(8000), just call the columns col1, col2, col3.....colX (work table)

3/ load incoming data into DB work table

4/ parse the header row of the incoming data, to ascertain the position of all columns of interest - this will be critical that the same column data is called the same name in each of the incoming data-sets.

5/ construct SQL to INSERT INTO/SELECT from the work table into the final table.

6/ execute the SQL

7/clear down work table.

Anonymous
Not applicable
Author

Thanks for the instruction. I tried do this but I did not succeed. Most likely, I just do not understand how to do it. For me, it's as hard as writing a term paper. I could not write a term paper because I do not know how to do it. I found writing help a term paper on the internet and helped me. Probably I will have to look for a programmer who will help me to do everything according to your instructions, so that the information is displayed correctly through the database.