Excel-based Data Lineage for Compose for Data Warehouse
Consumers of our warehouse are asking for an Excel-based data lineage. They would like to see full end to end linage from source (landing) to warehouse to mart tables including exposing any expressions, look-ups, and descriptions.
Justification for this includes:
Excel is the application almost all business users will leverage when documenting data lineage, mappings, and data dictionaries
In some cases the consumer is external to our company. It is not feasible to give them access to Compose or send them the HTML version of the Compose on-line documents
Users feel that Compose on-online documents are ok, but not great. Issues they face include:
Availability across the entire organization. There is no single web location all users can access.
Does not solve use cases where information must be shared with external clients
Data lineage is only at the entity level and not attribute level like is supported in the Compose application
Not all internal users will be given read-only access to the Compose application
Navigation and search is not user friendly. Search is not across the entire project. There is more hunting and pecking for information via drill-down
Similar to the above point, there is no ability to trend across all fields, mappings, expressions, etc. Much easier to do in Excel.
We have had mixed results trying to maintain it in Excel on our own. It's very manual, time consuming, and prone to errors / omissions. In order to be current and accurate, Compose needs to generate it.
Tim G wrote a query against the project metadata sqllite db which produces exactly this. there is no reason for this not to be a simple button that exports it into a CSV format, and as part of the Project Documentation. its a query. and its already done. every customer of Compose would benefit from this simple thing.
WITH Mapping as (
select map.ID as MappingID, json_extract(db.json, '$.database_name') as LandingDB, json_extract(db.json, '$.original_sourcedb_name') as SourceDB, json_extract(db.json, '$.server_name' ) as SourceServer, CASE WHEN json_extract(map.json, '$.mapping_type') = 'QUERY' THEN 'QUERY based mapping' ELSE json_extract(map.json, '$.schema') || '.' || json_extract(map.json,'$.table_name') END as LandingTable, json_extract(map.json, '$.mapping_query') as QuerySource, map.name as MappingName,json_extract(colmap.value, '$.staging_col_name_int') STAGING_COL_NAME_INT, json_extract(colmap.value, '$.mapping_type') as ColMapType, json_extract(colmap.value, '$.source_col_name') as SourceColumnName, IFNULL( json_extract(colmap.value, '$.lookup.lookup_result_expression.expression_statement'), json_extract(colmap.value, '$.expression.expression_statement') ) as Expression, json_extract(map.json, '$.entity_id') as ETY_ID, IFNULL(json_extract(dbLKU.json, '$.database_name')||'.','') || json_extract(colmap.value, '$.lookup.lookup_table_schema') || '.' || json_extract(colmap.value,'$.lookup.lookup_table_name') as LKUSourceTable, json_extract(colmap.value, '$.lookup.lookup_expression.expression_statement') as LookupCondition
FROM objects map , json_each(map.json, '$.mapping_fields') colmap JOIN objects db on db.id = json_extract(map.json,'$.database_id') LEFT JOIN objects dbLKU on dbLKU.ID = json_extract(colmap.value, '$.lookup.lookup_table_database_id') ), TABLEINFO as (SELECT ETY_ID, COL_ID, ETY_NAME as LogicalEntity, CASE WHEN json_extract(COLUMNS.json, '$.history_type') = 'S' THEN tblPrefix || ETY_NAME || sat_suffix || json_extract(COLUMNS.json, '$.block') ELSE tblPrefix || ETY_NAME || hub_suffix END as PhysicalVaultTable, CASE WHEN DA.type = 'Entity' then 'Relationship' else 'Column' end as AttType, CASE WHEN IFNULL(json_extract(COLUMNS.json, '$.prefix'),'') || ifnull(COL_NAME, json_extract(DA.json, '$.name') ) ='' THEN IFNULL(json_extract(COLUMNS.json, '$.prefix'),'') || COL_NAME ELSE IFNULL(json_extract(COLUMNS.json, '$.prefix'),'') || ifnull(COL_NAME, json_extract(DA.json, '$.name') ) END as ColumnName ,json_extract(COLUMNS.json,'$.ordinal') /100 as ColumnOrdinal , json_extract(DA.json,'$.type') as DataType, 'COLID_'|| COL_ID || '_' || CASE WHEN DA.Type = 'Entity' THEN ETY_PK.id ||'_' || json_extract(ETY_PK.json, '$.pbe') ELSE COL_DOMAIN_ID END as RelatedETYID
FROM (SELECT id as ETY_ID, name as ETY_NAME FROM objects ETY WHERE ETY.type ='Entity' ) ENTITY JOIN ( select json, json_extract(COL.json, '$.poe') COL_PARENT, json_extract(COL.json, '$.pbe') COL_DOMAIN_ID, id as COL_ID, json_extract(COL.json, '$.is_primary') as IS_PK, json_extract(COL.json, '$.ordinal') as COL_ORDINAL , json_extract(COL.json, '$.attribute_name') as COL_NAME
FROM objects COL WHERE COL.type ='TableColumn' ) COLUMNS on COLUMNS.COL_PARENT = ETY_ID LEFT JOIN objects DA on DA.id = COL_DOMAIN_ID LEFT JOIN objects ETY_PK on DA.type = 'Entity' AND json_extract(ETY_PK.json, '$.poe') = DA.ID AND json_extract(ETY_PK.json, '$.is_primary') = 1 JOIN (select json_extract(json, '$.naming_conv.dwh_prefix') as tblPrefix, json_extract(json, '$.naming_conv.hub_suffix') as hub_suffix, json_extract(json, '$.naming_conv.sat_suffix') as sat_suffix from objects where name = 'settings') on 1=1 ) , FactDims as ( select fd.json, json_extract(fd.json, '$.type') as DataMartObject , CASE json_extract(fd.json, '$.history_type') WHEN 'N' THEN 'Type 1' WHEN 'S' THEN 'Type 2' END as DimType , json_extract(fd.json, '$.root_entity_id') as Root_ETY_ID , json_extract(fd.json, '$.display_name') as DMObjectDisplayName , replace( json_extract(fd.json, '$.name'), 'DataMart!!','') as DMPhysicalTableName , json_extract(colmap.value, '$.attribute_name') as DMColumnName , json_extract(colmap.value, '$.data_type') as DMColumnDataType , CASE json_extract(colmap.value, '$.aggregate_function') WHEN 'GROUPBY' THEN json_extract(colmap.value, '$.expression.expression_statement') ELSE json_extract(colmap.value, '$.aggregate_function') ||'( ' || json_extract(colmap.value, '$.expression.expression_statement') || ' )' END as DMSourceExpression , reverse(substr( reverse( json_extract(epr.value, '$.alias_parameter')) , 0, instr(reverse(json_extract(epr.value, '$.alias_parameter')), ' '))) as DV_COL_ID, dm.* , dm.name as DataMartName
from objects fd, json_each(fd.json, '$.fact_dim_columns') colmap, json_each(colmap.value, '$.expression.alias') epr JOIN objects dm on dm.id=json_extract(fd.json,'$.datamart_id') WHERE fd.type = 'FactDim'
)
SELECT T.LogicalEntity , IFNULL( M.MappingName, '** NOT MAPPED **') as MappingName , T.PhysicalVaultTable, T.ColumnName, T.AttType, T.ColumnOrdinal, M.LandingDB, M.LandingTable, M.ColMapType, IFNULL(M.Expression, M.SourceColumnName) as SourceExpression , M.LKUSourceTable, M.LookupCondition, DataMartName, FD.DataMartObject as DataMartObjectType, FD.DimType, FD.DMObjectDisplayName, FD.DMPhysicalTableName , FD.DMColumnName, FD.DMSourceExpression as DM_Expression, M.MappingID,
replace(REPLACE(QuerySource, CHAR(13),' '), CHAR(10), ' ') as QuerySource
FROM TABLEINFO T LEFT JOIN FactDims FD on FD.DV_COL_ID = T.COL_ID LEFT JOIN Mapping M on M.ETY_ID = T.ETY_ID AND M.STAGING_COL_NAME_INT = T.RelatedETYID
--and DMColumnName = 'TestAtt' ORDER BY LogicalEntity, MappingName, PhysicalVaultTable, ColumnOrdinal, DataMartName, DataMartObjectType, DMObjectDisplayName, DMColumnName ;
NOTE: Upon clicking this link 2 tabs may open - please feel free to close the one with a login page. If you only see 1 tab with the login page, please try clicking this link first: Authenticate me! then try the link above again. Ensure pop-up blocker is off.