@Ray0801, Compose for Data Lakes (C4DL) is built to support either a cloud or a traditional data lake. The architecture requires a processing layer, which can be a Hadoop ecosystem (AWS EMR, Azure HDInsight, Google Dataproc, HDP, CDH) or Databricks, together with the associated data lake storage (S3, ADLS, Google Cloud Storage, HDFS).
C4DL offers a couple of choices for processing data in the lake: Hive or Spark on Hadoop ecosystems, or SparkSQL with Databricks Delta tables on Databricks implementations.
C4DL itself runs on a Windows server and interacts with the processing environment in one of two ways: remotely (Hive and Databricks projects) or through a lightweight orchestration agent that executes Spark jobs (Spark projects). Using these environments lets C4DL process data at scale and create data sets in multiple file formats on the data lake storage layer.
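To make the combinations above concrete, here is a small Python sketch that restates them as lookup tables. It is only an illustration of the platform / project type / execution mode matrix described in this answer. The names `Platform`, `ProjectType`, `SUPPORTED_PROJECTS`, `EXECUTION_MODE` and `plan` are hypothetical and are not part of any C4DL API.

```python
# Hypothetical model of the C4DL deployment choices described above.
# None of these names come from C4DL itself; they only restate which
# project types run on which platform and how the C4DL server reaches them.
from enum import Enum


class Platform(Enum):
    HADOOP = "hadoop"          # e.g. AWS EMR, Azure HDInsight, Google Dataproc, HDP, CDH
    DATABRICKS = "databricks"


class ProjectType(Enum):
    HIVE = "hive"
    SPARK = "spark"
    DATABRICKS = "databricks"  # SparkSQL with Databricks Delta tables


# Which project types each processing platform supports.
SUPPORTED_PROJECTS = {
    Platform.HADOOP: {ProjectType.HIVE, ProjectType.SPARK},
    Platform.DATABRICKS: {ProjectType.DATABRICKS},
}

# How the C4DL server (running on Windows) executes each project type.
EXECUTION_MODE = {
    ProjectType.HIVE: "remote",
    ProjectType.DATABRICKS: "remote",
    ProjectType.SPARK: "agent",  # lightweight orchestration agent runs the Spark jobs
}


def plan(platform: Platform, project: ProjectType) -> str:
    """Return the execution mode for a project, or raise if the combination is unsupported."""
    if project not in SUPPORTED_PROJECTS[platform]:
        raise ValueError(f"{project.value} projects are not supported on {platform.value}")
    return EXECUTION_MODE[project]


if __name__ == "__main__":
    print(plan(Platform.HADOOP, ProjectType.SPARK))           # agent
    print(plan(Platform.DATABRICKS, ProjectType.DATABRICKS))  # remote
```

For example, a Hive project on a Hadoop cluster maps to remote execution, while asking for a Hive project on Databricks raises an error, since only SparkSQL/Delta projects are listed for Databricks.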