Attached you will find a new step-by-step guide on how to evolve your existing data lake with Qlik Compose for Data Lakes.
As data lakes mature, the need for low-latency data is growing. Organizations need analytics-ready data in near real-time to feed their AI and machine learning models, perform streaming analytics, and support other data science initiatives. Capturing data changes in real-time can be complex with modern cloud data lake platforms. In this paper, we describe how using Qlik’s Data Integration platform with Databricks can help organizations meet their low-latency data requirements with their data lake.
Check out the capabilities in the most recent release of Compose for Data Lakes. https://help.qlik.com/en-US/compose/Content/Compose/DataLakes/6.5/PDF/Release_Notes.pdf
1. Introduction:
This article gives you an overview of the debugging steps for missing data in Qlik Compose for Data Lakes (QC4DL). QC4DL works with Qlik Replicate, and for these troubleshooting steps the data in the __ct tables in the Landing Area of Replicate is assumed to be correct. The green box below represents QC4DL.
Setting the logs to verbose mode for the different modules of QC4DL can help you find the root cause of the missing data.
2. Setting up logs to verbose mode for different modules in QC4DL:
2.1 Setting up server logs to verbose mode:
Note: Changes to the logging level take effect immediately. There is no need to restart the Qlik Compose for Data Lakes service.
2.2 Setting up Agent log to verbose mode:
Note: Changes to the logging level take effect immediately. There is no need to restart the Qlik Compose for Data Lakes service.
2.3 Setting up Storage task logs to verbose mode:
Note: Changes to the logging level take effect immediately. There is no need to restart the Qlik Compose for Data Lakes service.
3.0 Log file locations for the different modules of QC4DL:
Qlik Compose for Data Lakes logs are written to different locations:
Server logs:
<product_dir>\data\log\Compose.log
Task logs:
<product_dir>\data\projects\<project_name>\logs\<task_name>\<run_Id>.log
Agent logs:
On Windows:
<product_dir>\java\data\logs\agent.log
On Linux:
/opt/attunity/compose-agent/data/log/agent.log
Note: The agent logs can also be found on the remote machine where the agent is installed. The default location for an agent installed on Linux is “/opt/attunity/compose-agent/data/logs/agent.log”.
Storage task (Compactor) logs:
On Windows:
<product_dir>\data\projects\<project_name>\logs\etl_tasks\Compactor\logs
On Linux:
/opt/attunity/compose-agent/data/projects/<project_name>/logs
Spark history server logs (SPARK ONLY):
To access the Spark logs, from the Storage zone pane, select Manage Data Storage, and then click Spark History Server. This redirects you to the corresponding Spark History Server, where you can click stdout/stderr for each task to see any errors or other information.
Reviewing the HDFS log, YARN log, Hive Server log, etc. outside of QC4DL will also come in handy when debugging issues with QC4DL.
4.0 Checking missing data in different stages in QC4DL (SPARK):
A QC4DL SPARK project has 3 stages – Landing Zone, Storage Zone and Provision Zone. Missing data can occur in any of the 3 stages.
The following describes how to confirm the data in each stage:
4.1 Checking data in Landing zone:
The table names in the landing zone are suffixed with __ct, and the data in these tables is assumed to be correct.
In the example below, the __ct table contains 2 rows for a single update: one from the BeforeImage and the other from the AfterImage.
There are different ways to connect to the landing zone database. The Hive client in a terminal, the Hive query tool in Ambari, and third-party JDBC connection tools such as SQL Workbench are some of the most commonly used tools.
The connection information for the landing database can be found in the UI under the landing and storage database connection settings - see below:
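For illustration, here is a minimal sketch of such a check, written as Scala for spark-shell (the embedded SQL is the same statement you would run in the Hive client or a JDBC tool). The database name landing_db, the table orders__ct, and the key column order_id are hypothetical, and the header__change_oper values shown are the Qlik Replicate defaults - verify them against your own __ct schema.

// spark-shell: inspect the change rows for one key in a landing zone __ct table.
// "landing_db", "orders__ct" and "order_id" are placeholder names - substitute your own.
spark.sql("""
  SELECT *
  FROM landing_db.orders__ct
  WHERE order_id = 1001
""").show(false)
// For a single update, expect 2 rows: the BeforeImage and the AfterImage,
// distinguished by the header__change_oper column ('B' = before image, 'U' = update).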
4.2 Checking data in storage zone:
The storage zone tables are suffixed with __delta. The __delta table will have 1 row containing the AfterImage, the header information, and the delete flag information.
Run the Storage tasks, and note down the status:
Done – data updated
Skipped – no data updated
The connection information for the storage database can be found in the UI under the landing and storage database connection settings - see below:
There are different ways to connect to the storage zone database. The Hive client in a terminal, the Hive query tool in Ambari, and third-party JDBC connection tools such as SQL Workbench are some of the most commonly used tools. Below is an example using the Hive client.
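As a sketch under the same hypothetical names as above (storage_db and orders__delta stand in for your own storage database and table), the equivalent check from spark-shell in Scala - the embedded SELECT is exactly what you would run in the Hive client:

// spark-shell: confirm the change reached the storage zone __delta table.
// "storage_db" and "orders__delta" are placeholder names - substitute your own.
spark.sql("""
  SELECT *
  FROM storage_db.orders__delta
  WHERE order_id = 1001
""").show(false)
// Expect 1 row per changed key, holding the AfterImage plus the header
// and delete flag columns described above.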
4.3 Checking data in Provision zone:
We recommend using spark-shell to check the data in the provision zone. If you want to use the Hive client in a terminal, the Hive query tool in Ambari, or any other third-party JDBC connection tool, then you’ll need to enable “Create Hive External Tables” in the task settings. By default, this option is not enabled.
Note: Spark-shell can be used to query all data formats including Parquet, ORC, and AVRO.
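As an illustration, here is a minimal spark-shell (Scala) session for checking a provisioned table. The HDFS path below is hypothetical - take the actual provision zone directory from your provisioning task settings - and you can swap spark.read.parquet for spark.read.orc or spark.read.format("avro") to match your file format.

// spark-shell: read the provisioned files directly (no Hive external tables needed).
// The path and key are placeholders - use your provisioning task's target directory.
val df = spark.read.parquet("hdfs:///path/to/provision_zone/orders")

df.printSchema()                          // confirm the expected columns are present
df.filter("order_id = 1001").show(false)  // look for the specific missing row
println(df.count())                       // compare the row count with the storage zone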