Tina3
Contributor

Identify new columns in an Excel file & load into a Redshift DB

Hi all,

I have a scenario where I am trying to load all the columns from an Excel file into Redshift and delete the Excel file at the end of the job.

The Excel file gets generated every day, and sometimes new columns get added in the middle of the file.

For instance: the file has three columns, and a fourth column gets added between Id and address.

 

name Id address

moderna 6 UK

 

How do I identify these new columns and either load all the columns (including the new ones) into Redshift or send a notification email?

My job flow: tFileExcelInput -> tMap -> tDBOutput

9 Replies
Prakhar1
Creator III

You can use a "Dynamic" schema if you are using the licensed version of Talend.

Tina3
Contributor
Author

Thank you, Prakhar.

We use a licensed version. Can you guide me through the job flow, or point me to any documentation?

Prakhar1
Creator III

https://community.talend.com/s/article/Which-components-provide-the-Dynamic-Schema-feature-TBe8P#:~:text=Dynamic%20Schema%20is%20an%20advanced,the%20data%20to%20be%20handled

 

You can refer to the link above.

Let me know if you need further support.

Tina3
Contributor
Author

Is tFileExcelInput supported by dynamic schema? I don't see it in the list.

JohnRMK
Creator II

Hello,

The dynamic schema is a very powerful feature which lets you pick up column names on the fly, without defining them in advance.

 

But, if I understood your use case correctly, it cannot be used directly: the data is inserted into an existing table, so the table would have to be recreated at each execution and you would lose the data already in it. If that is not a concern for you, you can stop reading here.

 

If you already have a table in your database and you receive a new file with new columns, here is an approach:

- Retrieve the columns of your target table (from information_schema or a similar catalog) and store them in memory with a tHashOutput.

- Retrieve the columns of your Excel or CSV file, either from external metadata or by reading the file with a row limit of 1 and header set to 0, so that you get a single row containing the column names.

You can also use a tExtractDynamicFields component with some Java code to extract the column names.

 

Once you have both sets, upper-case everything, join the two tHashInput flows in a tMap, and take the join reject: those are the extra columns in your file. Collect these columns, concatenate each one into a statement like "ALTER TABLE " + var_global_table_name + " ADD COLUMN " + columnName, and execute the statements in a tDBRow. After that you can read the data and insert it.
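The diff-and-ALTER step above can be sketched in plain Java (the kind of logic you might put in a Talend routine or a tJavaRow). This is only a rough sketch, not the exact component code; the table name, column names, and the VARCHAR(256) default type are hypothetical placeholders.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class SchemaDiff {

    // Build ALTER TABLE statements for every column that exists in the
    // file but not in the target table. Names are upper-cased before
    // comparing, so the match is case-insensitive, as suggested above.
    public static List<String> alterStatements(String table,
                                               List<String> fileCols,
                                               List<String> tableCols) {
        Set<String> existing = new LinkedHashSet<>();
        for (String c : tableCols) {
            existing.add(c.toUpperCase());
        }
        List<String> ddl = new ArrayList<>();
        for (String c : fileCols) {
            if (!existing.contains(c.toUpperCase())) {
                // VARCHAR(256) is just a safe placeholder type; pick
                // whatever type fits your data.
                ddl.add("ALTER TABLE " + table + " ADD COLUMN " + c + " VARCHAR(256)");
            }
        }
        return ddl;
    }

    public static void main(String[] args) {
        // File now carries a new "phone" column between Id and address.
        List<String> fileCols = List.of("name", "Id", "phone", "address");
        List<String> tableCols = List.of("NAME", "ID", "ADDRESS");
        // Each generated statement would be executed in a tDBRow.
        for (String stmt : alterStatements("my_table", fileCols, tableCols)) {
            System.out.println(stmt);
        }
    }
}
```

In a Talend job the two lists would come from the tHashInput flows described above, and the resulting statements would be pushed one by one into a tDBRow.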

Prakhar1
Creator III

Sure it does, I use it in my jobs.

Tina3
Contributor
Author

Thank you, John, for the response.

 

I was able to successfully retrieve the dynamic schema from the Excel file using tFileExcelInput and load it into a tFileOutputDelimited file.

 

I have to load these dynamic columns into Redshift, and I don't need to keep the previous data or the existing Redshift table schema; I can drop and recreate the table if it exists.

 

How do I map the dynamically generated tFileOutputDelimited file through tMap and tDBOutput so that they pick up the latest metadata? Below is my workflow:

 

tFileExcelInput (load only dynamic columns from an Excel sheet) -> tFileOutputDelimited -> OnComponentOk -> tFileInputDelimited -> tMap -> tDBOutput

Tina3
Contributor
Author

Thank you, Prakhar.

 

You are right. Currently I am reading all the columns as a single dynamic column and loading them into a CSV output.

 

How do I feed this newly changed schema into tMap and load it into the Redshift table?

 

I can always drop and recreate the table on redshift

Prakhar1
Creator III
Creator III


If you are not doing any transformation, then don't use tMap.

Insert directly into the Redshift output component, and set its "Action on table" to "Drop and create table".

It will drop the old table and insert the new values, and this works with a dynamic schema as well.
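Since the previous data does not need to survive, the drop-and-recreate action effectively boils down to two DDL statements built from whatever columns the dynamic schema carries on that run. Here is a minimal sketch of that idea in plain Java; the table name, column names, and VARCHAR(256) type are hypothetical, and the real component generates its own DDL internally.

```java
import java.util.List;
import java.util.stream.Collectors;

public class DropAndCreate {

    // Roughly what "Drop and create table" amounts to when the schema
    // is dynamic: drop the old table, then recreate it from the file's
    // current column list. VARCHAR(256) is a placeholder type.
    public static List<String> ddl(String table, List<String> cols) {
        String create = cols.stream()
                .map(c -> c + " VARCHAR(256)")
                .collect(Collectors.joining(", ",
                        "CREATE TABLE " + table + " (", ")"));
        return List.of("DROP TABLE IF EXISTS " + table, create);
    }

    public static void main(String[] args) {
        // The new "phone" column simply becomes part of the new table.
        ddl("my_table", List.of("name", "Id", "phone", "address"))
                .forEach(System.out::println);
    }
}
```

Because the table is rebuilt from the file's current header on every run, new columns are picked up automatically, at the cost of losing whatever was in the table before.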