Ray0801
Creator

Do I need to have a Hadoop cluster installed to work with Compose for Data Lakes?

If yes, can you explain why it is necessary?

1 Reply
TimGarrod
Employee

@Ray0801, Compose for Data Lakes (C4DL) is built to support a cloud or traditional data lake. The architecture requires a processing engine, which can be a Hadoop ecosystem (AWS EMR, Azure HDInsight, Google Dataproc, HDP, CDH) or Databricks, along with the associated data lake storage (S3, ADLS, Google Cloud Storage, HDFS).

C4DL gives you a couple of choices for processing data in the lake: Hive or Spark on Hadoop ecosystems, or SparkSQL with Databricks Delta tables for Databricks implementations.
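To make the Spark option concrete, here is a minimal, generic PySpark sketch of the kind of set-based processing that runs in the lake. This is not C4DL's generated code; the paths, table, and column names are hypothetical, and it assumes pyspark is installed with the relevant storage connector configured.

```python
from pyspark.sql import SparkSession

# Generic illustration of Spark-based processing in the lake -- C4DL generates
# and orchestrates this kind of job for you; nothing below is its actual code.
spark = SparkSession.builder.appName("lake-processing-sketch").getOrCreate()

# Read raw landed data from the lake storage layer (hypothetical s3a:// path;
# ADLS, Google Cloud Storage, or HDFS paths work the same way).
orders = spark.read.parquet("s3a://my-lake/landing/orders/")

# Express the transformation as SparkSQL (hypothetical table and columns).
orders.createOrReplaceTempView("orders")
curated = spark.sql("""
    SELECT customer_id, order_date, SUM(amount) AS total_amount
    FROM orders
    GROUP BY customer_id, order_date
""")

# Persist the curated data set back to lake storage.
curated.write.mode("overwrite").parquet("s3a://my-lake/curated/orders_daily/")
```

The same transformation could just as well be expressed as HiveQL on a Hadoop ecosystem, or as SparkSQL against Delta tables on Databricks.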

C4DL itself runs on a Windows server and either interacts with the cluster remotely (for Hive and Databricks projects) or works through a lightweight orchestration agent that executes Spark jobs (for Spark projects). Leveraging these environments allows data to be processed at scale and data sets to be created in multiple file formats on the data lake storage layer.
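On the remote-interaction side, the pattern is to push SQL to the cluster so the work runs where the data lives rather than on the Windows server. A minimal sketch of that pattern using the pyhive package against a HiveServer2 endpoint (a generic illustration, not C4DL's internal client; the host, credentials, and table names are hypothetical):

```python
from pyhive import hive

# Generic illustration of remote Hive interaction -- not C4DL's internal
# client. Host, credentials, and table names are hypothetical.
conn = hive.Connection(host="hiveserver2.example.com", port=10000,
                       username="etl_user", database="curated")
cursor = conn.cursor()

# The heavy lifting (scan, aggregate, write) executes on the cluster,
# not on the Windows server that issues the statement.
cursor.execute("""
    CREATE TABLE orders_daily
    STORED AS PARQUET AS
    SELECT customer_id, order_date, SUM(amount) AS total_amount
    FROM landing_orders
    GROUP BY customer_id, order_date
""")
conn.close()
```

Either way, the Windows server only coordinates; the scale-out compute stays in the cluster.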

Hope this helps.