Discussion Board for collaboration around Catalog and Lineage.
Hello everyone,
in the following I will ask a couple of questions to get a better understanding of what is possible with Data Catalyst in combination with the traditional Qlik products (QlikView/Qlik Sense).
My interpretation of the functionality of Data Catalyst is that this tool gives customers a better organisation and understanding of their data while also providing analytics tooling for that data. I can bring in different data sources and combine them via keys.
Traditionally, the Qlik applications (QlikView/Qlik Sense) do not need an extra data warehouse to organise the data sources. During the ETL process, the load script generates QVD files, which are then used within the applications. I can use a QVD file in more than one application at the same time.
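For illustration, that extract-and-store step in a load script looks roughly like this (the connection and file names below are placeholders, not from a real setup; QlikView would use a plain file path instead of a lib:// connection):

```
// Extract: load raw rows from a source file
Orders:
LOAD
    OrderID,
    CustomerID,
    OrderDate,
    Amount
FROM [lib://DataFiles/orders.csv]
(txt, utf8, embedded labels, delimiter is ',');

// Publish as a QVD; any number of QlikView or Qlik Sense apps
// can then reuse it with: LOAD * FROM [...orders.qvd] (qvd);
STORE Orders INTO [lib://DataFiles/orders.qvd] (qvd);
```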
My first question is about the correctness of my assumptions above. Am I right so far?
Focusing on Data Catalyst now, I would like to know...
1) if Data Catalyst can be seen as a "Qlik-native data warehouse" where I can organise all my input sources?
2) if I can extract and transform the input sources the way I can with QlikView scripts?
3) if I can connect Data Catalyst automatically to third-party systems, e.g. SAP BW, via connectors?
My questions are heading in this direction because I am thinking about using a well-organised "data warehouse" with a user interface instead of all the individual scripts and files. My goal would be a central system in which all input files are transformed and updated, and which provides the QVD files for all the different applications (Sense AND View), something like the generator sketch below.
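What I have in mind is what people often build today as a plain QlikView/Qlik Sense "QVD generator" app: one central script that extracts every source and publishes the QVDs for all consuming apps. A rough sketch, with made-up source names:

```
// Hypothetical central QVD generator: one app extracts all
// sources and publishes one QVD per source for consuming apps
FOR EACH vSource IN 'orders', 'customers', 'products'

    [$(vSource)]:
    LOAD * FROM [lib://DataFiles/$(vSource).csv]
    (txt, utf8, embedded labels, delimiter is ',');

    STORE [$(vSource)] INTO [lib://DataFiles/$(vSource).qvd] (qvd);
    DROP TABLE [$(vSource)];

NEXT vSource
```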
Is that a realistic option, or have I misunderstood the purpose of Data Catalyst?
Thanks in advance!
Jonas
I will do my best to answer your questions.
First, Qlik Data Catalyst should not be thought of as a data warehouse or an OLAP tool of any kind. The goal of QDC is to provide a way to access data of all types (raw, conformed, and anywhere in between). It provides a data marketplace UI to shop for data so that it can be brought to the OLAP tool of your choice. In the case of Qlik Sense, a person might go to this marketplace rather than using a connector to establish a connection to a new source.
The work that QDC is doing is converting data in many forms (mainframe, JSON, XML, delimited) into one standard format; many people select Parquet. These Parquet files are profiled, organized, tagged, and secured. The result in QDC is a well-organized catalog covering data in a data lake as well as data federated inside and outside the company.
In many cases, this data can be made available to an ETL tool to create a data warehouse, to Qlik Sense directly for self-service BI, or to a data science/predictive modeling tool set.
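To make that last point concrete: once the catalog has landed a source as Parquet, a consumer simply reads the standard format. As an illustration only (this is not a QDC requirement, and only recent Qlik Sense releases can load Parquet natively; the connection and file names here are made up):

```
// Load a Parquet file produced by the catalog pipeline
Customers:
LOAD
    CustomerID,
    CustomerName,
    Country
FROM [lib://DataLake/customers.parquet]
(parquet);
```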
The main goal is secure data availability so that as analytics tools change (SQL becomes R becomes TensorFlow), you always have a good pool of data at the ready to enable these processes.
Hi Joe_DosSantos,
thanks for your detailed reply. Good answers to my questions 🙂