I am new to Qlik Compose. Our company has just started implementing it and we are in the initial stages. I have the questions below and am looking for some clarification.
Thanks a lot for taking the time to read my questions.
Hi @mopidesp - welcome to the forum!
Some great questions concerning Compose understanding. Apologies in advance for the long reply, but hopefully this is helpful!
The first set of blue nodes is the loading to TSTG for 3 mappings. Once that has completed, the multi-table steps kick in. Once they are completed, the single-table steps can be executed in parallel. After that, Compose completes the processing of the data.
If you wish for more info on this, please private message me on here. In some cases these design patterns can enable the use of “CDC” ETL sets for both FULL and CDC based data loads by mimicking the CT tables for full data sets, which would solve your other question of combining CDC and FULL into a single process 😊
FYI – there are some additional white-papers in the documents and videos section of this forum for Compose for DW - https://community.qlik.com/t5/Qlik-Compose-for-Data-Warehouses-Documents/tkb-p/qlik-compose-warehous...
Question: "I see that we can create full load and CDC tasks separately to load from landing tables to DV. We often have a scenario where we do full load for some tables (low volume tables) and incremental load for high volume tables. Is it possible to configure a task to have a mix of both -- some tables in full load mode and some tables in CDC mode (under a single task) ?"
Comment: You can combine full-load and CDC ETL sets via the Compose workflows. Create your full-load ETL set and add the tables that require a full-load strategy. Then create the CDC ETL set and add the large tables to it. Using the Compose workflows, create a new workflow and add both ETL sets (full-load and CDC). You can schedule this new workflow based on your needs. This gives you a combined full-load and CDC strategy. If your datamart depends on these ETL sets, make sure you add the datamart task to the workflow right after the execution of your ETL sets.
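To illustrate the ordering described above, here is a minimal Python sketch. It is not Compose code (workflows are built in the Compose UI), and the task names are hypothetical. It only models the dependency: the datamart task runs after both ETL sets have finished.

```python
from graphlib import TopologicalSorter

# Hypothetical task names, for illustration only.
# Each key maps to the set of tasks that must finish before it starts.
workflow = {
    "full_load_etl_set": set(),
    "cdc_etl_set": set(),
    "datamart_task": {"full_load_etl_set", "cdc_etl_set"},
}


def execution_order(dependencies):
    """Return one valid run order that respects the dependencies."""
    return list(TopologicalSorter(dependencies).static_order())


order = execution_order(workflow)
print(order)
```

The two ETL sets have no dependency on each other here, so either may come first; the only fixed point is that the datamart task comes last.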
Question: "When loading high volume tables first time in the full load mode, are there any design patterns (like parallel processing, partitioning, processing subsets of data in multiple runs etc) to improve the performance ?"
Comments: Based on the model, Compose will execute the mappings and the generated code in parallel, so you do not need to worry about it. However, you can add some of the following additional performance tuning options: (a) increase the JVM for each ETL set (if required and if you have the resources for it), (b) using the workflows, you can run ETL sets in parallel, (c) using single/multi table, and post loading ETLs, you can add performance tuning steps such as indexes, collection of stats, and other database options to improve the workloads. You also have two options in the datamart layer to perform pre-load and post-load steps, which can also be used to create partitions, collect statistics, and add other database performance-tuning options.
Question: "Regarding Multi table ETL/Single table ETL, Is there any difference when they get executed ?"
Comment: There are some differences between multi-table and single table ETLs:
a. Multi-table ETL sets are executed before single table ETL sets.
b. Multi-table ETL sets are executed sequentially, one at a time.
c. Single-table ETL sets are executed in parallel, and after multi-table ETL sets.
d. Both single-table and multi-table ETLs are executed after the TSTG tables are created/populated, and typically these ETL sets are written to select, update, or change data in the TSTG tables.
e. Additionally, you can use post-loading ETL sets to select, update, or change data in your data warehouse (TDWH) tables. Post-loading ETL sets are executed after the TDWH tables are populated/modified.
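The execution order in the list above can be pictured with a small Python model. This is only a conceptual sketch, not how Compose is implemented; the function and ETL names are hypothetical, and running the post-loading ETLs in listed order is an assumption, since the order among them is not stated here.

```python
from concurrent.futures import ThreadPoolExecutor


def run_pipeline(multi_table, single_table, post_loading):
    """Return a log of steps in the order described in the list above."""
    log = ["load TSTG"]

    # Multi-table ETLs run first, sequentially, one at a time.
    for etl in multi_table:
        log.append(f"multi-table: {etl}")

    # Single-table ETLs run afterwards and may run in parallel.
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(lambda etl: f"single-table: {etl}", single_table))
    log.extend(results)

    # The warehouse (TDWH) tables are populated, then post-loading ETLs run.
    log.append("load TDWH")
    for etl in post_loading:  # assumption: run in listed order
        log.append(f"post-loading: {etl}")
    return log


log = run_pipeline(["m1", "m2"], ["s1", "s2"], ["p1"])
for step in log:
    print(step)
```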
Question: "While doing a CDC load, I can only use “source type” as table whereas in Full load I can use all the three source types. Are there any other limitations/differences when doing CDC vs Full load?"
Comment: The main difference between CDC and Full Load is that CDC mode uses the change tables (_ct) for consuming and synching changes (typically applied by Replicate). Full load uses the base tables of your landing area to perform the initial load. However, even if you use full load mode to maintain a warehouse table, Compose can detect when the data in the landing base table has changed and update your data in the warehouse accordingly.
Question: "Once the compose process the data from “_ct” tables, data will be either purged or archived from the “_ct” tables. If I want to reload last 2-3 days of data, Is it possible or is there a way to do this ?"
Comment: Compose has an option in the database connection layer that allows the user to archive the changed data into another table or database/schema. You can reload data by copying it from the archived tables back into the _ct tables.
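As a rough sketch of that copy-back step, the Python example below uses an in-memory SQLite database so it can be run anywhere. Everything in it is an assumption for illustration: the table names `orders__ct` and `orders__ct_archive`, the cut-off value, and the use of a `header__timestamp` column to select recent rows. In practice you would run the equivalent INSERT ... SELECT in your own landing database against your actual archive and change tables.

```python
import sqlite3


def reload_archived_changes(conn, archive_table, ct_table, since):
    """Copy archived change rows with a timestamp >= `since` back into the change table."""
    # Identifiers cannot be bound as SQL parameters, so pass only trusted table names.
    sql = (
        f'INSERT INTO "{ct_table}" '
        f'SELECT * FROM "{archive_table}" WHERE header__timestamp >= ?'
    )
    cursor = conn.execute(sql, (since,))
    conn.commit()
    return cursor.rowcount


# Demo setup with hypothetical tables and data.
conn = sqlite3.connect(":memory:")
for table in ("orders__ct", "orders__ct_archive"):
    conn.execute(
        f'CREATE TABLE "{table}" '
        "(header__change_seq TEXT, header__timestamp TEXT, order_id INTEGER)"
    )
conn.executemany(
    'INSERT INTO "orders__ct_archive" VALUES (?, ?, ?)',
    [
        ("001", "2024-01-01 10:00:00", 1),
        ("002", "2024-01-08 10:00:00", 2),
        ("003", "2024-01-09 10:00:00", 3),
    ],
)

# ISO-formatted timestamps stored as text compare correctly as strings.
copied = reload_archived_changes(
    conn, "orders__ct_archive", "orders__ct", "2024-01-07 00:00:00"
)
print(copied)
```

After the rows are back in the change table, the next CDC run would pick them up like any other changes.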