Regarding Replication task

harsh2 · 2024-02-06

Hi Qlik Team,

I just want to ask:

How can I create a replication task with MSSQL as the source and Google BigQuery as the target? When I choose the replication task, I notice there are only four supported targets available.

Best Regards,

Harsh

TimGarrod · 2024-02-06

Hi, in Qlik Cloud Data Integration there are 2 project types - Replication and Data Pipeline.

Currently, replication projects and tasks support delivery to cloud object stores or relational database (RDBMS) environments.

When we deliver to a cloud data warehouse or lakehouse solution such as BigQuery, Snowflake, or Databricks, we use the Data Pipeline project. This uses a two-part process (Landing and Storage) to ingest and process data into your target (in this case BigQuery).

To create this process, add an Onboarding task to your pipeline project.

The benefit of the Landing > Storage architecture is that it reduces churn on the cloud warehouse environment by supporting a delayed merge. Landing provides insert-only semantics, tracking all the changes to the source system.
Storage then allows you to create an ODS (a copy of the data) and, optionally, an HDS (type 2 history of your data). Storage also provides a feature called Live Views. Live Views let you query data that has been landed but not yet processed into the storage layer, which reduces the number of MERGE processes that have to be run in BigQuery while still providing low-latency access to the data.
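
As a rough illustration of the delayed merge idea described above, here is a conceptual Python sketch. It is not how Qlik implements Landing, Storage, or Live Views; the names `Change`, `Dataset`, `apply_changes`, `seq`, and `applied_seq` are all hypothetical and exist only to show the pattern: landing is append-only, storage is updated in batches, and a live view combines storage with the changes that have not been merged yet.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Change:
    seq: int             # capture order of the change (hypothetical field)
    op: str              # "I" = insert, "U" = update, "D" = delete
    key: int             # primary key of the source row
    row: Optional[dict]  # new row image; None for deletes


def apply_changes(state, changes):
    """Apply changes in capture order to a key -> row mapping."""
    for c in sorted(changes, key=lambda c: c.seq):
        if c.op == "D":
            state.pop(c.key, None)
        else:
            state[c.key] = c.row
    return state


@dataclass
class Dataset:
    landing: list = field(default_factory=list)  # insert-only change log
    storage: dict = field(default_factory=dict)  # ODS: current copy of the data
    applied_seq: int = 0                         # last change merged into storage

    def land(self, change):
        # Landing only appends; existing entries are never updated.
        self.landing.append(change)

    def pending(self):
        # Changes that have landed but are not yet merged into storage.
        return [c for c in self.landing if c.seq > self.applied_seq]

    def merge(self):
        # Delayed merge: apply every pending change in a single batch.
        batch = self.pending()
        apply_changes(self.storage, batch)
        if batch:
            self.applied_seq = max(c.seq for c in batch)
        return len(batch)

    def live_view(self):
        # Storage plus pending changes, computed at query time on a copy,
        # so reading the view does not trigger a merge.
        return apply_changes(dict(self.storage), self.pending())
```

In this sketch, calling `land()` many times and `merge()` only occasionally means the expensive step runs once per batch rather than once per change, while `live_view()` still returns the latest state between merges.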

This is described in the help guide - https://help.qlik.com/en-US/cloud-services/Subsystems/Hub/Content/Sense_Hub/DataIntegration/Onboardi...

harsh2 · 2024-02-06

Hi @TimGarrod ,

Thanks for your response. That's helpful.

However, with HDS+ODS, it seems like it will create too many copies of that single table. What if the client's requirement is only to mirror the data? Can't we use the apply changes method, as in Qlik Replicate?

Best regards,

Harsh

TimGarrod · 2024-02-06

You don't have to have an HDS. That is entirely optional and can be configured at the task level or changed / overridden for each individual dataset.
The benefit of Landing > Storage over a Replicate-style apply is both cost and performance, thanks to the delayed merge. Additionally, reloads don't truncate the target dataset. The reload is handled in the landing zone so that data availability is not impacted during the reload.
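
To make the ODS versus HDS distinction concrete, here is a small conceptual Python sketch. It is only an illustration of the general "current copy" versus "type 2 history" pattern, not Qlik's implementation; the names `Version`, `apply_change`, `valid_from`, `valid_to`, and the `keep_history` flag are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Version:
    key: int
    row: dict
    valid_from: int                 # timestamp at which this version became current
    valid_to: Optional[int] = None  # None means this is the current version


def apply_change(ods, hds, key, row, ts, keep_history=True):
    """Apply one upsert to the ODS and, optionally, to the type 2 history."""
    # ODS: one row per key, overwritten in place (a mirror of the source).
    ods[key] = row

    if not keep_history:
        return

    # HDS: close the currently open version for this key, then add a new one.
    for version in hds:
        if version.key == key and version.valid_to is None:
            version.valid_to = ts
    hds.append(Version(key=key, row=row, valid_from=ts))
```

With `keep_history=False` only the ODS mapping is maintained, which corresponds to the mirror-only case: one current row per key and no extra history rows.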
