Skip to main content
Announcements
NEW: Seamless Public Data Sharing with Qlik's New Anonymous Access Capability: TELL ME MORE!
cancel
Showing results for 
Search instead for 
Did you mean: 
Not applicable

Enriching structural metadata in QV/extract data lineage from QV

Hi

I am new to QlikView and its architecture. As a senior developer I got a task to solve data lineage issue in our company.

I know that QV Governance Dashboard offers a way to scan and present data lineage. My business department want to see not only what sources (in this case data warehouse is always QV source) and columns different QV apps are using, but also sources populating the data warehouse itself (databases, Excel sheets, files). Data warehouse sources are stored in a repository which I can read with ease. My first question is if I can enrich structural metadata created by the Governance Dashboard in order to present total data lineage, ie. Database/Excel/Files->Data Warehouse->QV?

Another question is the opposite of the above. Is there an API or a way to extract data lineage (sources and fields) from Governance Dashboard or .QVW? Does Governance Dashboard produce some kind of output as MetaScanner did (is MetaScanner deprecated?)?

Kind regards

A.H.

9 Replies
Tyler_Waterfall
Employee
Employee

A.H.

I am not aware of any way to see the full lineage picture at this point.

Concerning the data in the Governance Dashboard, the reload does produce several QVXs and QVDs that could be used in other applications.  These files are stored in the profile folder you set in the configuration page of the Governance Dashboard.

Some files to consider:

TableLineage.QVD

QVWTables.QVD

QVWFields.QVD

QvdTableHeader.QVD

QvdFieldHeader.QVD

You might just do a CTRL+T in the Governance Dashboard to see how these tables are associated to get a sense of how you could use them in other applications.

Not applicable
Author

So pls help me understand one thing...

I just examined the files you recommended and I can see that QVD-files are used as input to the QVW. QVD on the other hand are generated by a LOAD with a SQL-statements. In order to find what database tables and database table fields are used by QVD I need to parse LoadStatement-column?

Kind regards

A.H.

Not applicable
Author

Hi Tyler,

Can you please share some knowledge regarding my last question (data lineage from a database)?

Kind regards

A.H.

Tyler_Waterfall
Employee
Employee

Sorry for the delay.

And, I think I was not quite understanding your initial request.

To be sure I understand, you have an "ETL" layer that takes data out of the disparate sources (XL, csv, database) to a Data warehouse. Then you use QlikView to extract data from the data warehouse to QVDs.

For QVDs generated in this process, the Governance Dashboard shows the qvw that created them.

The GovDB also shows which sources were used during its process to create internal tables (which, in your case were stored as QVDs).  For database sources, you can use the "Source" field, either at the top of the Lineage page or in the "Load / Select Connections..." table at the bottom. (screen shot below).

Note, however, that the lineage information stored in a QVW is based on the tables that are stored with the QVW.  So, if you load 5 tables in the QVW (and you look at the data model, you see 5 tables), you would see all sources that were used to generate those five tables.

If, however, you drop all 5 tables and save the QVW (using a drop table statement), you will no longer see that lineage information.  This makes lineage problematic when you have several QVWs that just generate one QVD and then drop the table before saving (the QVW).

Consider this comment on this post: QVW LineageInfo missing

Not applicable
Author

Hi Tyler,

Sorry for harassing you but by reading your comments on other posts I would say you seem to be a go-to guy when it comes at Gov Dashboard.

Firstly, thx for the answer, the tip about deleted tables is highly appreciated.

However I agree that sometimes Sources may show database table name clearly (like in your case), other times not.

Consider this Load-statement

D_Acknowledgement_Result:

LOAD

  "ACKNOWLEDGRESID" AS AckResID,

    IF("CURRENTID"=1,'Yes','No') AS AckResultCurr,

    "CURRENTID" AS AckResultCurrIndicator,

    "VALIDFROM" AS AckResultValidFrom,

    "VALIDUNTIL" AS AckResultValidUntil,

    IF("DELETED"='Y','Yes','No') AS AckResultDel,

    "DELETED" AS AckResultDelIndicator,

    "RUNID" AS AckResultRunID,

    "ACKNOWLEDGEMENTRESULTCD" AS AckResultCode,

    DESCRIPTION AS AckResult,

    CATEGORY AS AckResultCategory;SQL SELECT *

FROM DW."ACKNOWLEDGEMENT"

Sources-column show following value:

" AS AckResultValidFrom,

    "VALIDUNTIL" AS AckResultValidUntil,

    IF("DELETED"='Y','Yes','No') AS AckResultDel,

    "DELETED" AS AckResultDelIndicator,

    "RUNID" AS AckResultRunID,

    "ACKNOWLEDGEMENTRESULTCD" AS AckResultCode,

    DESCRIPTION AS AckResult,

    CATEGORY AS AckResultCategory;SQL SELECT *

FROM DW."ACKNOWLEDGEMENT"

Does this mean Sources-column cannot give ok values if LOAD-statement is a bit big?

Can you also confirm that there is no way of finding out database table column names unless we parse the value of Select/Load-column (LOADStatement in the QVD/QVX files created by Gov Dashboard)?

Kind regards

A.H.

Tyler_Waterfall
Employee
Employee

Yes, I think parsing the load statement is your best (yet far from ideal) option - with using the Governance Dashboard.

Concerning the sources, the "scan" logic looks for the <lineageInfo> information in the xml footer of the QVW file.  If you open a qvw in a text editor, at the bottom you will find this xml.  Just search for "lineage".

The "scan" logic looks in the <Statement> entity for "from" (case insensitive). And - aha - I see that your load script includes "from" in the the line starting with "VALIDFROM".  I would say the scan is "working as designed" but not as intended and has this limitation -- If your load script includes the text "from" anywhere outside of the final "FROM" statement, the "Source" field will include extra stuff.

Before I noticed this, I started to share an example.  Here it is, for the benefit of the community (and for me - when I get this question again.)

In one example, the QVW load script looks like this:

OLEDB CONNECT TO [Provider=Microsoft.ACE.OLEDB.12.0;User ID=Admin;Data Source=C:\Users\twa\playground\Data\sales_fiction.accdb;Mode=Share Deny None;Extended Properties="";Jet OLEDB:System database="";Jet OLEDB:Registry Path="";Jet OLEDB:Database Password="";Jet OLEDB:Engine Type=6;Jet OLEDB:Database Locking Mode=1;Jet OLEDB:Global Partial Bulk Ops=2;Jet OLEDB:Global Bulk Transactions=1;Jet OLEDB:New Database Password="";Jet OLEDB:Create System Database=False;Jet OLEDB:Encrypt Database=False;Jet OLEDB:Don't Copy Locale on Compact=False;Jet OLEDB:Compact Without Replica Repair=False;Jet OLEDB:SFP=False;Jet OLEDB:Support Complex Data=False;Jet OLEDB:Bypass UserInfo Validation=False];

sales_header:

SQL SELECT year(dt) as Yr,

    month(dt) as Mnth,

    ID,

    product,

    `product code`,

    sales as sales_header

FROM Table1;

directLoad:

DIRECT

SELECT //ID,

//    product,

    `product code`

   

    IMPLICIT

    sales,

    units

FROM Sheet1;

The xml in the qvw looks like this:

<LineageInfo>

  <Discriminator>INLINE</Discriminator>

  <Statement></Statement>

</LineageInfo>

<LineageInfo>

  <Discriminator>Provider=Microsoft.ACE.OLEDB.12.0;Data Source=C:\Users\twa\playground\Data\sales_fiction.accdb;Mode=Share Deny None;Extended Properties=&quot;&quot;;Jet OLEDB:System database=&quot;&quot;;Jet OLEDB:Registry Path=&quot;&quot;;Jet OLEDB:Database Jet OLEDB:Engine Type=6;Jet OLEDB:Database Locking Mode=1;Jet OLEDB:Global Partial Bulk Ops=2;Jet OLEDB:Global Bulk Transactions=1;Jet OLEDB:New Database Jet OLEDB:Create System Database=False;Jet OLEDB:Encrypt Database=False;Jet OLEDB:Don't Copy Locale on Compact=False;Jet OLEDB:Compact Without Replica Repair=False;Jet OLEDB:SFP=False;Jet OLEDB:Support Complex Data=False;Jet OLEDB:Bypass UserInfo Validation=False</Discriminator>

  <Statement>directLoad:

DIRECT

SELECT

    `product code`

  

    IMPLICIT

    sales,

    units

FROM Sheet1</Statement>

</LineageInfo>

<LineageInfo>

  <Discriminator>Provider=Microsoft.ACE.OLEDB.12.0;Data Source=C:\Users\twa\playground\Data\sales_fiction.accdb;Mode=Share Deny None;Extended Properties=&quot;&quot;;Jet OLEDB:System database=&quot;&quot;;Jet OLEDB:Registry Path=&quot;&quot;;Jet OLEDB:Database Jet OLEDB:Engine Type=6;Jet OLEDB:Database Locking Mode=1;Jet OLEDB:Global Partial Bulk Ops=2;Jet OLEDB:Global Bulk Transactions=1;Jet OLEDB:New Database Jet OLEDB:Create System Database=False;Jet OLEDB:Encrypt Database=False;Jet OLEDB:Don't Copy Locale on Compact=False;Jet OLEDB:Compact Without Replica Repair=False;Jet OLEDB:SFP=False;Jet OLEDB:Support Complex Data=False;Jet OLEDB:Bypass UserInfo Validation=False</Discriminator>

  <Statement>sales_header:

SQL SELECT year(dt) as Yr,

    month(dt) as Mnth,

    ID,

    product,

    `product code`,

    sales as sales_header

FROM Table1</Statement>

</LineageInfo>

And the Source and Select/Load statement look like this in Governance Dashboard:

Source:

INLINE

Sheet1

Table1

Select/Load:

directLoad:

DIRECT

SELECT

    `product code`

   

    IMPLICIT

    sales,

    units

FROM Sheet1

sales_header:

SQL SELECT year(dt) as Yr,

    month(dt) as Mnth,

    ID,

    product,

    `product code`,

    sales as sales_header

FROM Table1

Not applicable
Author

Hi Tyler,

Thx for the answers and useful input. Sry for late delay, I was away for couple of days studying Gov DB output.

What I can conclude is that Sources-column is not much useful when source is database as our SELECT-statements in 90% of cases use WHERE clause. It seems that Gov DB cannot extract table name properly when there is text after FROM + single table name (Sources will then contain table name + anything after that). Therefore same logic applies for table joins, subqueries, etc. So our data lineage stops when we see that a QV app is using database as a source (around 90% of the time) as we cannot get table name(s) or table columns. Parsing load statements is just too much work for us, we could probably never create a good solution anyway.

It would be very handy having data lineage for database objects in the next version of Gov DB as I believe databases are too often used as sources, especially in big companies that sorely need a better track of their data flow. I don't know where this kind of request would be appropriate to make thou

Kind regards

A.H.

Tyler_Waterfall
Employee
Employee

It is true that the current implementation of lineage in QlikView Governance Dashboard is basic and will indeed include more information in the "Sources" field if that exists (like WHERE statements and comments).

I will add these notes to this improvement request.

v_petrenko
Partner - Contributor III
Partner - Contributor III

Though more than a year has passed after topic started, I think my reply may be useful.

I recently published a special QV app specifically for the purpose of Lineage analysis: Lineage / Dependency Analyzer for QlikView