Talk to Experts Tuesday - Qlik Compose FAQ

Last Update: Feb 26, 2021 11:33:24 AM
Updated By: Jamie_Gregory
Created date: Feb 26, 2021 11:33:24 AM

This is the FAQ for the February 23rd Talk to Experts Tuesday session on Qlik Compose.

For the recording and transcript, please see 

Environment: Qlik Compose for Data Warehouses, Qlik Compose for Data Lakes

 

How do I integrate Qlik Compose for Data Warehouses v7.x with a Git repo in Azure DevOps using Azure credentials and authentication?

Currently there is no option for this in Compose. The only available integration is with GitHub, using standard credentials (username and password). As of today, there is no option to integrate with Azure security authentication.

Please create an Ideation on this so Product Management can prioritize the request.

 

What is the approved version numbering system for Qlik Compose for Data Warehouses? It's confusing with 'February 2021' and 'November 2020' releases in the setup guide documentation, but then release numbers for service packs, so it's unclear whether the setup guide documentation is related to a service pack.

When we were Attunity, we used version numbers like 5.5, 6.x, or 7.x. Starting from November 2020, we follow the Qlik release versioning, which is based on month and year. There is also a build number for each service pack, which is used to differentiate service packs. Here is a cheat sheet for builds vs. versions.

With the February 2021 release, Qlik Compose for Data Lakes and Qlik Compose for Data Warehouses have been condensed to one product. For more information, please see this post.

 

What is the process for Qlik reviewing and including Ideas from the Ideas forum? How frequently are ideas included in the product roadmap?

Product Managers review the ideas every two weeks. When you submit an idea, the Product Manager will communicate with you throughout the lifecycle of the idea. For more information, please see this post which walks you through the statuses and lifecycle.

There are also Roadmap sessions hosted by Qlik Community. Under Events, look for Qlik Product Portfolio Strategy and Roadmap sessions.

 

In the latest version of Qlik Compose, is there an option to set up environment variables for SQL queries? This would help us parameterize the database/schema name before migrating to the production environment.

There is no dedicated feature for this, but if you want to change database names and similar settings, you can modify them in the project JSON file and then deploy it in the production environment. I have seen a couple of customers follow this practice when migrating from dev to production.
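As a rough illustration of that workaround, here is a minimal sketch that rewrites database and schema names in an exported project JSON before deployment. The file names, the dev/prod name pairs, and the assumption that the names appear as plain string values are all placeholders for the example; inspect your own export to see which fields actually carry the names you need to change.

import json

# Hypothetical dev -> prod name pairs; replace with your own values.
REPLACEMENTS = {
    "DEV_LANDING_DB": "PROD_LANDING_DB",
    "DEV_DWH_SCHEMA": "PROD_DWH_SCHEMA",
}

def rewrite(node):
    """Recursively replace matching string values anywhere in the JSON tree."""
    if isinstance(node, dict):
        return {key: rewrite(value) for key, value in node.items()}
    if isinstance(node, list):
        return [rewrite(value) for value in node]
    if isinstance(node, str):
        return REPLACEMENTS.get(node, node)
    return node

# Read the project JSON exported from the dev environment ...
with open("compose_project_dev.json") as f:
    project = json.load(f)

# ... and write a copy with production names, ready to deploy.
with open("compose_project_prod.json", "w") as f:
    json.dump(rewrite(project), f, indent=2)

The same approach works for any other literal value (connection names, file paths) that differs between environments.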

Please create an Ideation on this so Product Management can prioritize the request.

 

What is the difference between Compose for Data Lakes and Compose for Data Warehouses?

Compose for Data Lakes is mostly for big data platforms like Hadoop, Databricks, and Spark. We try to consolidate the data from all RDBMS systems in one place in the data lake. We will be removing support for Spark, so we recommend customers move entirely to Hive, which means you don't need any agent setup for Data Lakes.

Compose for Data Warehouses mostly handles ETL work on RDBMS databases. If you want to create transformations on RDBMS data, from SQL to SQL, this is the product to use. We have improved it a lot, for example by adding more databases: initially it supported only SQL Server or Oracle, and now it also supports BigQuery, Redshift, Snowflake, and Azure.

From a design perspective, Data Lakes handles structured and unstructured data, while Data Warehouses handles structured data. Data Lakes works on top of data storage, whereas Data Warehouses is more like a traditional enterprise data warehouse using Oracle, SQL Server, or another relational database. With Compose for Data Lakes, the data files are stored in cloud storage such as S3, ADLS Gen2, or HDFS.

For more information, please see this post.

 

Does Qlik Replicate 7.0 work with Qlik Compose 6.6?

If the question is related to Compose for Data Lakes, yes, 6.6 will work with Qlik Replicate 7.0. For Data Warehouses, it is recommended to use Compose 7.0 with Replicate 7.0.

You can find compatibility between products in the Release Notes.

 

Are there any best practices or tips for using Compose?

For best performance when using Snowflake as your data source or data warehouse, we strongly recommend installing the Compose software on a machine located in the same region as your database instance.

When it comes to Microsoft Azure SQL Database or Synapse, the data warehouse must be in the same database as the landing area, but obviously in a different schema: both have to be in the same database, each with its own schema (see the sketch below).

When it comes to Oracle and you are dealing with millions of records, make sure you have enough undo, temp, and redo tablespaces.
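To illustrate the Azure SQL / Synapse point above, here is a minimal sketch that creates the landing schema and the data warehouse schema side by side in one and the same database. It assumes the pyodbc package and ODBC Driver 17 for SQL Server; the server, database, credentials, and schema names are placeholders you would choose yourself, not Compose defaults.

import pyodbc

# One connection to the single Azure SQL database that will hold BOTH areas.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=myserver.database.windows.net;"   # placeholder server
    "DATABASE=my_compose_db;"                 # one database for both areas
    "UID=myuser;PWD=mypassword"               # placeholder credentials
)
conn.autocommit = True
cursor = conn.cursor()

# Create the landing schema and the data warehouse schema side by side.
for schema in ("landing", "dwh"):
    # CREATE SCHEMA must be the only statement in its batch, hence EXEC().
    cursor.execute(
        f"IF NOT EXISTS (SELECT 1 FROM sys.schemas WHERE name = '{schema}') "
        f"EXEC('CREATE SCHEMA [{schema}]')"
    )

conn.close()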

 

What are some of the major features coming out for Compose Data Lakes and Data Warehouses?

For Compose for Data Warehouse:

  • ETL code generation with all validations or with basic validations only (which drastically reduces the time required to generate the ETL)
  • Truncating the table to preserve any custom table properties
  • Improved performance for Azure Synapse Analytics
  • Audit trail that reports on who performed the operation, when it was performed, which object it was performed on, etc.

For Compose for Data Lakes:

  • Ability to use live views, which reduces latency
  • Ability to clear the metadata cache

Of course, with the February release you now also have the ability to choose either Compose for Data Warehouses or Compose for Data Lakes.

For more detailed information, please see the recording at 23:50.

 

What is covered by basic validations vs. all validations? Could you share any reference links or point to the appropriate documentation?

When you ask for all validations, Compose goes to the storage zone and verifies each and every table definition. We have observed that for customers with around 2,000 tables in a project, this can take anywhere from 20 minutes to an hour. You don't want to spend that much time validating table definitions, because they usually don't change very frequently unless a customer changes something.

With basic validations, Compose just verifies the metadata and then proceeds to generate the ETLs, so it takes much less time compared with all validations.

There is some documentation on the Help site.

 

Don't we have an option to choose both Data Lakes and Data Warehouses in the Gen2 version?

The Gen2 version is the combined version, but within a single project you cannot choose both. In the same product, however, you can create two separate projects, one for Data Warehouses and one for Data Lakes.

 

In C4DL: for a given table, what is the difference between the table itself, the live view, and the standard view?

The standard view only returns data from closed partitions, while the live view also lets you see the partition that is not yet closed. We don't recommend using the live view unless you have a specific requirement, because it uses a lot of resources. If you run the job every 15 or 30 minutes, you can see all the data in the data lake. You can also see the data through direct queries. If live queries are used, Compose has to go to the landing zone and read the data, so it takes more time.
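As an illustrative sketch of the difference, the snippet below simply compares row counts between the two views: the standard view sees only closed partitions, while the live view also includes the still-open partition. It assumes the PyHive package and a reachable HiveServer2; the host, database, and view names are hypothetical, so substitute the names Compose generated in your storage zone.

from pyhive import hive

# Connect to the storage-zone database on HiveServer2 (placeholder host/db).
conn = hive.Connection(host="hive-host", port=10000, database="storage_zone")
cursor = conn.cursor()

# Hypothetical view names -- use the names Compose generated for your table.
for view in ("orders_standard_view", "orders_live_view"):
    cursor.execute(f"SELECT COUNT(*) FROM {view}")
    print(view, cursor.fetchone()[0])

conn.close()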

We are planning to establish a Disaster Recovery setup for Qlik Compose on a Windows VM server. Can anyone share documents or articles that would help us better understand a Disaster Recovery strategy for this scenario?

It's covered in the user guide, including how to set up a cluster. Basically, Replicate and Compose use the underlying Windows cluster environment, so there is nothing new: if you set up the Windows environment correctly, the remaining steps are easy. The one thing to note is that the binaries are installed on the C drive, while for the data folder you choose the shared drive on the SAN disk. That data folder is common to both Windows servers, and using the Windows cluster feature you can then fail over from one node to the other, so the environment is always up and running for your Compose jobs. It's fairly easy and it's explained in the user guide, but if you need any help you can always create a case with Support.

 

Do I need to have a Hadoop cluster installed to work with Compose for Data Lakes?

Yes, you have to use Hive or Databricks. It could be an on-premises Hadoop cluster, or it could be Azure or Databricks, whatever you prefer, but you need some cluster to process the data and store it in the data lake.

 

Can you source from a Snowflake view using Qlik Compose?

Yes, you can. Make sure you are using version 7 or higher in order to use those views. If you are trying to see a new view and it is not showing up in the UI, there is an option called Clear landing cache under Manage ETL Sets.

 

Where are the logs for Compose for Data Warehouse kept?

There are five different log locations within the installation directory. If you do not have access to that path, all of the logs are also available through the UI.

Please refer to the recording at 42:20 for a detailed explanation of the different logs and how to pull them from the UI. There is also a reference on the Data Integration forums on Qlik Community.

 

While creating relationships in Manage Model, when would we NOT select 'Replace existing attributes' on the 'Add Relationship' screen? How is the relationship created when not explicitly selecting the attribute on the associated entity (that also replaces the attribute)?

If we want to create a calculation on the relationship column, we shouldn't select 'Replace existing attributes' on the 'Add Relationship' screen.

E.g., OrderId is the relationship column for the Order and Order Details tables. By default, Compose replaces the existing attribute with the relationship, so the HUB table will have only one column. If we need to apply a formula to the OrderId column, we shouldn't select 'Replace existing attributes' on the 'Add Relationship' screen, so that the underlying physical table has two columns for OrderId (one representing the relationship and the other representing the calculated column).

 

How can I run multiple Data Mart Post Loading ETL in parallel?

Right now, if you add post loading ETLs, they run serially, not in parallel. We generate one ETL document for the data mart, including the post ETLs, so it runs step by step.

There is a workaround if you want to run them in parallel. Compose supports command scripts, so you can place the post loading logic in command scripts instead of post loading ETLs. If you have four post loading ETLs, create four command scripts. Then, when creating the workflow, add the data mart first, then create parallel tasks and add the four command scripts. The workflow will run the data mart first, then run all four scripts in parallel, and finish the load (a sketch of one such command script follows).
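As a rough sketch of what one of those command scripts could look like, the example below runs a single post-load SQL statement and exits with a status code the workflow task can act on. It assumes the pyodbc package; the DSN, credentials, table, and SQL are placeholders, so put whatever logic your post loading ETL currently performs into each script.

import sys
import pyodbc

# Placeholder post-load statement -- put the SQL from your post loading ETL here.
POST_LOAD_SQL = """
UPDATE dwh.fact_orders
SET load_status = 'POST_PROCESSED'
WHERE load_status = 'LOADED'
"""

def main():
    # Placeholder DSN and credentials for the data warehouse connection.
    conn = pyodbc.connect("DSN=MyDataWarehouse;UID=myuser;PWD=mypassword")
    conn.autocommit = True
    conn.cursor().execute(POST_LOAD_SQL)
    conn.close()
    return 0

if __name__ == "__main__":
    # A non-zero exit code lets the Compose workflow treat the task as failed.
    sys.exit(main())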

To see this feature added to the product, please create an Ideation on this.

 
