Qlik Community


Qlik Compose for Data Lakes

Discussion board for collaboration on Qlik Compose for Data Lakes.


What is the software difference between Compose for Data Warehouse vs for Data Lakes?


What is the software difference between Compose for Data Warehouse and Compose for Data Lakes? We are on version 6.5.

From a design point of view, a data warehouse is for structured data, while data lakes are for structured and unstructured data, but I don't understand what the difference is between the two Attunity software options.

4 Replies

A data lake is a storage repository that holds raw data in structured, semi-structured, and unstructured formats. It is mostly used by data scientists and machine learning engineers, as it helps them answer questions that have not yet been answered, or even pose questions that are not yet known. It contains a vast pool of data of different types that can be integrated.


To add to Trafoosss answer -

For Compose for Data Lakes, consider the architecture of a data lake. It is often designed with multiple "zones", for example the medallion method, where you have Bronze [data that looks just like the source] --> Silver [some transformations / data quality (DQ) rules applied, etc.] --> Gold [fully curated, transformed datasets].

Compose for Data Lakes operates very much in the bronze layer of the lake. It understands how Qlik Replicate delivers change data to the lake and automates the process of transactional data management in the lake. It generates Spark / Hive / Spark SQL code (depending on the compute environment) to "stitch" the transactions together, while also enabling certain schema evolution features. Further transformation / curation / DQ of the data would be performed downstream, on the datasets that Compose for Data Lakes generates. Compose for Data Lakes is built to operate in a "traditional" lake environment: think S3, ADLS Gen2, Google Cloud Storage, or HDFS, with compute layers such as EMR / Databricks / HDInsight / Dataproc.
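To make the "stitching" idea concrete, here is a minimal sketch in plain Python. It is not the code Compose generates (that is Spark / Hive / Spark SQL and is not shown in this thread). The record layout (`seq`, `op`, `key`, `data`) and the function name `stitch` are assumptions made up for illustration, not the Qlik Replicate change format.

```python
from typing import Any, Dict, List

# Hypothetical change-record layout, invented for this example:
#   seq  - ordering of the change
#   op   - "INSERT", "UPDATE" or "DELETE"
#   key  - primary key of the affected row
#   data - column values carried by the change
ChangeRecord = Dict[str, Any]


def stitch(changes: List[ChangeRecord]) -> Dict[Any, Dict[str, Any]]:
    """Apply change records in sequence order and return the current row per key."""
    state: Dict[Any, Dict[str, Any]] = {}
    for change in sorted(changes, key=lambda c: c["seq"]):
        key = change["key"]
        if change["op"] == "DELETE":
            state.pop(key, None)
        else:
            # INSERT or UPDATE: merge the new values into the existing row.
            # Columns that first appear in a later change are simply added,
            # which is a crude stand-in for schema evolution.
            row = dict(state.get(key, {}))
            row.update(change["data"])
            state[key] = row
    return state


changes = [
    {"seq": 1, "op": "INSERT", "key": 1, "data": {"name": "Ann"}},
    {"seq": 2, "op": "INSERT", "key": 2, "data": {"name": "Bob"}},
    {"seq": 3, "op": "UPDATE", "key": 1, "data": {"name": "Anna", "city": "Oslo"}},
    {"seq": 4, "op": "DELETE", "key": 2, "data": {}},
]

current = stitch(changes)
print(current)  # {1: {'name': 'Anna', 'city': 'Oslo'}}
```

In a real lake the same logic would run as a distributed job over files rather than over an in-memory list; the sketch only shows the ordering and merge rules.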


Compose for Data Warehouse provides end-to-end data warehouse life-cycle management against a relational data warehouse platform (think Snowflake, Redshift, Azure Synapse, Oracle, SQL Server). It provides features to help manage and automate the entire DW lifecycle: modeling, ELT mappings with automated code generation, data marts, documentation, etc. Compose for Data Warehouse provides more transformation / quality / data validation features than Compose for Data Lakes, because of the arena in which it operates.


Having said that, Compose for Data Warehouse can also be used to build "relational" lakes (Snowflake, for example, is becoming a very popular platform to support both lake and warehouse workloads).


Hope this helps!


Thanks for the information. Keep sharing topics like this.


Hi Mwallman,

Compose for Data Lakes is also for structured data files that are ingested into cloud storage, such as S3, ADLS Gen2, GCS, or HDFS. Compose for Data Lakes creates a full, standardized history of the data using a compute engine such as Spark or Hive. The standardized history data is stored as Snappy-compressed Parquet within a data lake storage zone. Depending on the compute engine vendor (for example EMR, HDInsight, Databricks, etc.), provisioned data sets can be created in the data lake storage bucket and target folder of choice. Provisioned data sets are historical data sets, operational data sets, and snapshots that are generated from the standardized data in the storage zone. (Depending on the compute vendor, some provisioning types may not be available.)
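As a rough illustration of how an operational data set and a snapshot can both be derived from a standardized history, here is a small Python sketch. The row layout (`key`, `valid_from`, `valid_to`, `data`) and the function names are hypothetical and chosen only for this example; they are not the column names or code that Compose for Data Lakes uses.

```python
from datetime import date
from typing import Any, Dict, List

# Hypothetical history layout: one row per version of a record.
# valid_to is None for the version that is still current.
history: List[Dict[str, Any]] = [
    {"key": 1, "valid_from": date(2021, 1, 1), "valid_to": date(2021, 6, 1), "data": {"city": "Oslo"}},
    {"key": 1, "valid_from": date(2021, 6, 1), "valid_to": None, "data": {"city": "Bergen"}},
    {"key": 2, "valid_from": date(2021, 3, 1), "valid_to": None, "data": {"city": "Rome"}},
]


def operational(rows: List[Dict[str, Any]]) -> Dict[Any, Dict[str, Any]]:
    """Operational view: only the current version of each record."""
    return {r["key"]: r["data"] for r in rows if r["valid_to"] is None}


def snapshot(rows: List[Dict[str, Any]], as_of: date) -> Dict[Any, Dict[str, Any]]:
    """Snapshot view: the version of each record that was valid on as_of."""
    return {
        r["key"]: r["data"]
        for r in rows
        if r["valid_from"] <= as_of and (r["valid_to"] is None or as_of < r["valid_to"])
    }


print(operational(history))                # {1: {'city': 'Bergen'}, 2: {'city': 'Rome'}}
print(snapshot(history, date(2021, 2, 1)))  # {1: {'city': 'Oslo'}}
```

The historical data set in this picture is simply the full `history` list, while the other two views are filters over it.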

Compose for Data Warehouse is for data warehouse targets (for example Synapse, Redshift, Snowflake, etc.). You can use data replicated to a data warehouse target by Qlik Replicate to create, and automate the creation of, a data warehouse and data marts on that target. Data captured on the target by the Qlik Replicate CDC process will then be loaded into the data warehouse schema and data marts through the Compose for Data Warehouse workflow. (Data in schemas on the target outside of the Replicate process can be loaded into the data warehouse and data mart schemas as well.)