Hi all,
I have a scenario where I am trying to load all the columns from an Excel file into Redshift, and delete the Excel file at the end of the job.
The Excel file gets generated every day, and sometimes new columns get added in the middle of the file.
For instance: the Excel file has three columns, and a fourth column gets added in between Id and address.

name     Id   address
moderna  6    UK

How do I identify these new columns and either load all the columns (including the new ones) into Redshift, or send a notification email?
My job flow: tFileExcelInput -> tMap -> tDBOutput
You can use a "Dynamic" schema if you are using a licensed version of Talend.
Thank you, Prakhar.
We use a licensed version. Can you guide me through the job flow or point me to any documentation?
https://community.talend.com/s/article/Which-components-provide-the-Dynamic-Schema-feature-TBe8P#:~:text=Dynamic%20Schema%20is%20an%20advanced,the%20data%20to%20be%20handled
You can refer to the article above.
Let me know if you need further support.
Is tFileExcelInput supported by dynamic schema? I don't see it in the list.
Hello,
The dynamic schema is a very powerful feature which lets you pick up column names on the fly, without defining them beforehand.
But if I understood your use case correctly, it cannot be used directly here: the data is inserted into an existing table, so the table would have to be recreated at each execution and you would lose the data it already contains. (If losing that data is not a concern for you, you can stop reading here.)
If you already have a table in your database and you receive a new file with new columns, here is an approach:
- Retrieve the columns of your target table (from information_schema or a similar system view) and store them in memory with a tHashOutput.
- Retrieve the columns of your Excel or CSV file, either from external metadata or by reading the file with a limit of 1 and a header of 0, so you get a single row containing the column names. You can also use a tExtractDynamicFields component with a bit of Java code to extract the column names.
- Once you have both lists, uppercase everything, join the two tHashInput flows in a tMap, and take the join reject: those are the extra columns in your file. Collect these columns and concatenate them into statements like "ALTER TABLE " + var_global_table_name + " ADD COLUMN " + columnName, execute those in a tDBRow, and then you can load the data.
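The column-diff and ALTER TABLE generation from the steps above can be sketched in plain Java. This is a minimal illustration, not the Talend components themselves: the table name, the two column sets, and the VARCHAR(256) default type are assumptions — in a real job the lists would come from the tHashInput flows and the types from your data.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class SchemaDiff {
    // Return one ALTER TABLE ... ADD COLUMN statement per column that is
    // present in the incoming file but absent from the target table.
    // Names are upper-cased before comparing, mirroring the "uppercase
    // everything" step above; the original casing is kept in the DDL.
    static List<String> alterStatements(String tableName,
                                        Set<String> tableColumns,
                                        Set<String> fileColumns) {
        Set<String> existing = new LinkedHashSet<>();
        for (String c : tableColumns) {
            existing.add(c.toUpperCase());
        }
        List<String> stmts = new ArrayList<>();
        for (String c : fileColumns) {
            if (!existing.contains(c.toUpperCase())) {
                // VARCHAR(256) is a placeholder type; choose one that
                // fits your data.
                stmts.add("ALTER TABLE " + tableName
                        + " ADD COLUMN " + c + " VARCHAR(256)");
            }
        }
        return stmts;
    }
}
```

With the example file from this thread (a new phone column appearing between Id and address), the join reject would contain only phone, and the single generated statement could then be executed in a tDBRow.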
Sure it does, I use it in my jobs.
Thank you, John, for the response.
I was able to successfully retrieve the dynamic schema from the Excel file using tFileExcelInput and write it out with tFileOutputDelimited.
Now I have to load these dynamic columns into Redshift. I don't need to keep the previous data or the existing Redshift table schema; I can drop and recreate the table if it exists.
How do I map this dynamically generated delimited file through tMap and tDBOutput so they pick up the latest metadata? Below is my workflow:
tFileExcelInput (load only the dynamic columns from the Excel sheet) -> tFileOutputDelimited -> OnComponentOk -> tFileInputDelimited -> tMap -> tDBOutput
Thank you, Prakhar.
You are right. Currently I am reading all the columns as a single dynamic column and writing them to a CSV output.
How do I feed this newly changed schema to tMap and load it into the Redshift table?
I can always drop and recreate the table on Redshift.
If you are not doing any transformation, then don't use tMap.
Connect the flow directly to the Redshift output component, and in that component set "Action on table" to "Drop and create table".
It will drop the old table, recreate it from the dynamic schema (including any new columns), and insert the new values.
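To make the "Drop and create table" behaviour concrete, here is a sketch of the kind of DDL that gets issued before the insert when the table is rebuilt from the dynamic schema. This is an illustration only — tDBOutput generates the real statements itself and derives column types from the dynamic column metadata; the table name, column list, and VARCHAR(256) type here are assumptions.

```java
import java.util.List;

public class DropAndCreate {
    // Build the drop-and-recreate DDL for a table from a list of column
    // names carried by the dynamic schema on this run. Every column gets
    // a placeholder VARCHAR(256) type for the sake of the example.
    static String createDdl(String table, List<String> columns) {
        StringBuilder sb = new StringBuilder();
        sb.append("DROP TABLE IF EXISTS ").append(table).append("; ");
        sb.append("CREATE TABLE ").append(table).append(" (");
        for (int i = 0; i < columns.size(); i++) {
            if (i > 0) {
                sb.append(", ");
            }
            sb.append(columns.get(i)).append(" VARCHAR(256)");
        }
        sb.append(")");
        return sb.toString();
    }
}
```

Because the table is recreated on every run, any new column that appears in the Excel file simply becomes part of the new table — which is why this option fits the "I can always drop and recreate" case, and why the earlier ALTER TABLE approach is only needed when existing data must be preserved.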