
- Move Document
- Delete Document
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Tracing data lineage for SnowSQL in Talend Data Catalog
This article shows you how to configure two Talend Data Catalog bridges to harvest metadata from Snowflake and trace data lineage. It covers the Snowflake Database (via JDBC) and Snowflake Database SQL DML (DI/ETL) Script (SnowSQL) Talend Data Catalog bridges.
Configuring the Snowflake database bridge import setup
Configure the import setup parameters for harvesting metadata from a Snowflake database in Talend Data Catalog:
-
Define the following parameters on the Import Setup tab: Host, Warehouse, Databases, and the User and Password for authentication.
-
Select the schemas to be harvested in the Schema field. This example uses the TDC_TEST, TESTSCHEMA1, and TESTSCHEMA2 schemas.
-
Go to the Overview tab to view an overview of the objects harvested from the Snowflake database.
Configuring the SnowSQL bridge import setup
This example SnowSQL script contains a create view Data Manipulation Language (DML) statement:
create view IF NOT EXISTS testschema2.v_employees ( employee_id, employee_name ) as select rvalue:employee_id::number, rvalue:employee_name::string from testschema1.employees;
You will harvest metadata from this example SnowSQL script in the steps below.
To configure the parameters for the Snowflake Database SQL DML (DI/ETL) Script (SnowSQL) bridge in Talend Data Catalog:
-
Define the Directory where the SnowSQL file is located on the Import Setup tab.
-
Add any necessary Include and Exclude filters to select the files to be harvested.
-
Go to the Overview tab to view the imported SnowSQL script.
- View the connection defined for the SnowSQL model in the Configuration Manager.
Tracing data lineage for a field/column in the SnowSQL script
-
Notice that after harvesting metadata (database objects and SnowSQL) using both bridges and building the configuration, the Architecture Diagram displays a bi-directional arrow, because the same Snowflake database model is the source and destination connection to the database.
-
Go to the Data Flow tab to view the data flow diagram for the SnowSQL script. Referring to the SnowSQL script, Talend Data Catalog creates a view on one schema (TESTSCHEMA2) with a select statement on another schema (TESTSCHEMA1).
-
You can also view the data flow diagram for the database object (schema level), which displays the three schemas.
-
Finally, you can display the data lineage for a column (EMPLOYEE_NAME).
Environment
- Talend Data Integration 8.0.1