Hi! I am new to the Talend tool. Please let me know how schema drift can be handled in Talend.
We are migrating 50 tables from Oracle to a cloud-based data warehouse. The schema of a source table could change, or new column(s) could be added at the source. Is there any way to handle data/schema drift in Talend? If yes, what components should I look into?
Thanks for the reply! I tried the dynamic schema option. The job ran fine when there were no schema changes, but when I added a new column to the source table, the change was not reflected in my target table. Please advise.
If your source table changes, your target table needs to change to accommodate that. If you are using the dynamic schema functionality, it will pick up your new columns (provided your query against the source table allows for it), but it won't automatically update your target tables.
There should be a change process in place where you work so that schema changes are known about. You may want to investigate Data Inventory (https://www.talend.com/products/data-inventory/).
The target Snowflake table was able to handle the change. Would I have to drop and recreate my target table every time? Is it possible to trigger just an ALTER TABLE command to handle source table column additions without dropping the table?
I want my target table data to remain intact; I only want the new columns to be added, without dropping and recreating the table.
There is no component that will automatically do this for you; it's a bit niche for that. However, it can be achieved with some code. I have written a blog about using the dynamic schema with a bit of Java (https://www.talend.com/blog/2019/11/11/migrate-data-between-databases-one-job-using-dynamic-schema/). It isn't aimed at your exact requirement, but it shows you how to get information about your source table from the dynamic schema. Once you know the structure has changed, you can write your Snowflake DDL script to amend your target tables, then run your jobs.
All of this can be built into a Talend job. It is one of the benefits of using a tool where you can enhance the functionality simply by using Java.
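As a starting point, the dynamic schema's metadata can be read in a tJavaRow. A minimal sketch, assuming the Dynamic column on the incoming row is named dyn (a name I've made up here; use whatever your schema calls it):

```java
// tJavaRow sketch: capture the column names and Talend types carried by the dynamic schema.
// "dyn" is an assumed name for the Dynamic column on input_row/output_row.
routines.system.Dynamic dyn = input_row.dyn;

java.util.List<String> sourceColumns = new java.util.ArrayList<String>();
for (int i = 0; i < dyn.getColumnCount(); i++) {
    routines.system.DynamicMetadata meta = dyn.getColumnMetadata(i);
    // meta.getName() is the column name, meta.getType() the Talend type (e.g. id_String)
    sourceColumns.add(meta.getName() + ":" + meta.getType());
}

// Keep the structure available for a later subjob that compares it with the target
// table's metadata and builds the ALTER TABLE statement.
globalMap.put("sourceColumns", sourceColumns);

// Pass the data through unchanged.
output_row.dyn = input_row.dyn;
```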
Hello @vvazza10 ,
Working out the delta between the data structures is not very complicated, but you will have to code a little.
First, before each data load (with the dynamic schema), you compare the source and the target in order to update the target.
The idea is to retrieve the table and column names on both sides and apply the missing changes.
The different steps:
- Retrieve the source tables as well as their columns (you can use tDBList and tDBColumnList in order to iterate and retrieve all the column parameters). You can also execute a query against information_schema and filter on the table type: https://dataedo.com/kb/query/sql-server/list-table-columns-in-database
- Do the same with the target (if it is Snowflake, you can run a query against the Snowflake database; it contains all the metadata): https://dataedo.com/kb/query/snowflake/list-table-columns-in-database (queries for all DB types: https://dataedo.com/kb/query)
- Store both result sets in tHashOutput components.
- Perform an inner join on the table names and column names (this gives you the columns that exist in both the source and the target).
- Take the join reject (the columns that are in the source but not in the target).
- Use a tAggregateRow on that output (group by table name, with the list function on the columns) ==> table_name; col1, col2, col3
- Use a tMap to wrap the output in the DDL keywords ==> query = "ALTER TABLE " + input.table_name + " ADD COLUMN (" + input.columns + ");" (mind the spaces; see the sketch after this list).
- Finally, use tFlowToIterate -> tJava (blank) -> tDBRow (Snowflake) in order to execute your DDL query.
- Then, with the dynamic schema, you can load the data.
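To make the tMap step above concrete, here is the equivalent Java as it could appear in a tJavaRow placed instead of the tMap. It only assumes the aggregated row carries a table_name and a columns field (for example "COL4 VARCHAR, COL5 NUMBER"):

```java
// tJavaRow sketch building the DDL statement from the aggregated join rejects.
// Mind the spaces around the keywords; whether ADD COLUMN takes parentheses or
// several columns in one statement depends on your target database's dialect.
String query = "ALTER TABLE " + input_row.table_name
             + " ADD COLUMN " + input_row.columns + ";";

output_row.table_name = input_row.table_name;
output_row.query = query;
```

With the tFlowToIterate in front of the tDBRow, the statement is then available through globalMap, for example ((String) globalMap.get("row3.query")) if the iterated flow is called row3 (adjust the name to your job).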
If there is a part you have not understood, do not hesitate to ask; I can pass you similar jobs to give you an idea.
In this picture, I store the table and column names in a database, but it is the same approach:
Get the table names --> get the column properties --> check their existence with a join --> build the query --> update the target.
PS: Don't forget the mapping and type transformation! string => varchar or something like that. You can do it before the aggregation and use a tMap variable.
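For that type mapping, a small helper routine can translate the Talend types reported by the dynamic schema into warehouse column types before the aggregation. This is a sketch only; the routine name and the mappings are examples to adapt to your own types:

```java
public class TypeMapping {

    /**
     * Maps a Talend type (as returned by DynamicMetadata.getType()) to a Snowflake
     * column type. Extend the cases for the types present in your source tables.
     */
    public static String toSnowflakeType(String talendType) {
        switch (talendType) {
            case "id_String":
                return "VARCHAR";
            case "id_Integer":
            case "id_Long":
                return "NUMBER";
            case "id_Double":
            case "id_Float":
            case "id_BigDecimal":
                return "NUMBER(38,10)"; // assumed precision/scale, adjust as needed
            case "id_Date":
                return "TIMESTAMP_NTZ";
            case "id_Boolean":
                return "BOOLEAN";
            default:
                return "VARCHAR";       // fallback, review unmapped types manually
        }
    }
}
```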
You can use the job attached.