Dynamically selecting schema in DQ Column Analysis
We have several database schemas that contain tables with the exact same columns, data types, etc. in each schema. Let's say I create a Column Analysis for the CUSTOMER table in SchemaA. I want to create a Talend job that runs the same column analyses on SchemaB, SchemaC, and SchemaD. How do I do this with context variables? When I run the analysis, the SQL looks something like this: select count(*) from SchemaA.Customer where ColumnA is null It looks like it's hardcoding the schema name in the generated SQL. How do we work around that to make it dynamic?
You could put the schema into a context variable and call the same job with handover the schema as context variable.
Let say you define in your job a context variable called database_schema.
Your query should looks like:
At the beginning of the job or for the context parameters in a tRunJob you set the context variable with the correct schema.
select count(*) from " + context.database_schema + ".Customer where ColumnA is null
Actually this way you could also change column names.
Makes sense, but will this approach work when we create a job that uses tDQReportRun component to call a DQ Column Analysis? When we create a Column Analysis, we must select a DB to choose the columns to analyze. If we select SchemaA as our database, will this same job work if we have a context variable for schema and we run it in other schemas besides SchemaA?
I do not know, I have not used the component tDQReportRun so far. But the typical configuration of a database is via context variables and I would be surprised if the data quality stuff use other methods to configure the database.