Qlik is a strategic partner for Snowflake, and we are proud to have been named Snowflake's 2020 Partner of the Year for Data Engineering. The Qlik Data Integration platform is a market leader when it comes to continuous, real-time data ingestion from a wide range of enterprise data sources, and we work closely with Snowflake to provide an aligned and optimized solution. When Snowflake introduced changes to its Cloud Services consumption billing in 2020, we realized they might have an impact on our joint customers. To get ahead of this, we embarked on an effort to gain detailed insight into the new billing model, how the use of Qlik Replicate could affect a customer's Snowflake bill, and what changes or guidance we needed to provide to help customers optimize their cost/performance balance.
Background
Snowflake has historically billed its customers for consumption of data warehouse "compute" resources, along with a fee for storage. What was not billed was the computational work involved in parsing a SQL statement, generating an execution plan, and so on. These non-warehouse computational resources are what Snowflake calls "Cloud Services".
In the fall of 2019, Snowflake announced that it would adjust this billing model and begin charging customers for Cloud Services (also sometimes called "GS" or "Global Services"). This Snowflake blog post provides some background on the changes to the billing model.
When you net things out, virtually every interaction you might have with Snowflake involves execution of a SQL statement of some sort, which in turn would result in consumption of Cloud Services resources. As Qlik Replicate can deliver very high volumes of data into Snowflake in real-time, it was important for us to understand how Replicate’s delivery of data into Snowflake would impact a customer’s Snowflake bill.
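If you want to see how Cloud Services show up in your own account, Snowflake exposes this in the SNOWFLAKE.ACCOUNT_USAGE.METERING_DAILY_HISTORY view. The rough sketch below sums daily compute and Cloud Services credits along with the adjustment Snowflake applies for Cloud Services usage that falls under the daily 10% allowance; the 30-day window is an arbitrary choice.

-- Daily compute vs. Cloud Services credits, plus the Cloud Services adjustment.
SELECT usage_date,
       SUM(credits_used_compute)              AS compute_credits,
       SUM(credits_used_cloud_services)       AS cloud_services_credits,
       SUM(credits_adjustment_cloud_services) AS cloud_services_adjustment,
       SUM(credits_billed)                    AS credits_billed
FROM   snowflake.account_usage.metering_daily_history
WHERE  usage_date >= DATEADD(day, -30, CURRENT_DATE())
GROUP BY usage_date
ORDER BY usage_date;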
Our Research
We set out to run a series of benchmark tests to help us understand how Qlik Replicate's delivery of change data into Snowflake consumes Cloud Services resources, and how Replicate's tuning parameters affect that consumption.
Our test environment was isolated: no other users were accessing the Snowflake database and warehouse used in testing. Our tests involved running a consistent, repeatable source database load containing a mixture of INSERT, UPDATE, and DELETE operations, with Qlik Replicate delivering the change data into Snowflake. Each test ran for a little more than two and a half hours and generated more than 6 million CDC records, using different combinations of Replicate tuning parameters. The load was not large by any means, but it was enough for us to draw some important conclusions.
Customer Data
As a part of our study, we also worked with several customers to examine their production environments. We looked at the configuration of all their production Qlik Replicate tasks to understand how they had Qlik Replicate configured, and we analyzed over 3 million SQL statements generated by Qlik Replicate in those environments to understand their respective Cloud Services resource consumption. Our analysis included both DML statements (Data Manipulation Language – statements that modify or otherwise access data stored in the database) and DDL statements (Data Definition Language – statements that perform administrative functions, define the structure of tables and indexes … essentially all statements that aren’t DML).
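Readers who want to do a similar breakdown on their own account can group Cloud Services credits by statement type using the SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY view. A hedged sketch follows; the warehouse name REPLICATE_WH is just a placeholder for whatever warehouse your Replicate tasks target, and the 7-day window is arbitrary.

-- Cloud Services credits by statement type for queries issued against one warehouse.
SELECT query_type,
       COUNT(*)                         AS statement_count,
       SUM(credits_used_cloud_services) AS cloud_services_credits
FROM   snowflake.account_usage.query_history
WHERE  warehouse_name = 'REPLICATE_WH'
  AND  start_time >= DATEADD(day, -7, CURRENT_TIMESTAMP())
GROUP BY query_type
ORDER BY cloud_services_credits DESC;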
Conclusions
We learned quite a bit during our testing, and our findings from the internal benchmarks were substantiated by what we learned from the customers we worked with. In every case, DML operations accounted for the bulk of Cloud Services resource consumption, and the customers consistently had Qlik Replicate configured to deliver data in near real time, leaving considerable room to tune Qlik Replicate to deliver larger batches at a reduced Cloud Services cost. Each customer will need to find its own balance between acceptable latency and cost.
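One practical way to evaluate a tuning change is to watch the target warehouse's compute versus Cloud Services credits before and after adjusting the batch settings. A minimal sketch, again assuming a placeholder warehouse name of REPLICATE_WH:

-- Hourly compute vs. Cloud Services credits for the Replicate target warehouse.
SELECT start_time                       AS hour_start,
       SUM(credits_used_compute)        AS compute_credits,
       SUM(credits_used_cloud_services) AS cloud_services_credits
FROM   snowflake.account_usage.warehouse_metering_history
WHERE  warehouse_name = 'REPLICATE_WH'
  AND  start_time >= DATEADD(day, -7, CURRENT_TIMESTAMP())
GROUP BY start_time
ORDER BY start_time;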
For details on the results of our research, see the whitepaper below.
Thanks for the article @John_Neal .
What would be the best batch tuning settings for Snowflake if a task is configured for CDC (Store Changes to __CT tables)?
What would be the minimum and maximum values to optimize credit usage on Snowflake if a warehouse is not always running and auto-suspends after 10 minutes?
@ranjithreddy Personal opinion here, but IMO there isn't a "best" setting recommendation I can make. It all depends on your environment and your tolerance for latency ...
As mentioned in the article,
If you are amenable to more latency, in your case I might configure
and then evaluate and adjust as desired from there.
As for your auto-suspend being set at 10 minutes: so long as you have the warehouse set to auto-resume as well, that setting really doesn't enter into the conversation (once again, IMO).
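For reference, both of those settings live on the warehouse itself. A hedged example (the warehouse name is a placeholder, and 600 seconds mirrors the 10-minute auto-suspend you described):

-- Keep the 10-minute auto-suspend, but make sure the warehouse resumes
-- automatically when Replicate submits its next batch.
ALTER WAREHOUSE replicate_wh SET
  AUTO_SUSPEND = 600   -- seconds
  AUTO_RESUME  = TRUE;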
Excellent analysis @John_Neal !
@ranjithreddy - as John says, the auto-suspend setting doesn't impact the Cloud Services cost. However, be aware that it could impact the overall performance of the Replicate apply processes.
Snowflake compute warehouses are not guaranteed to provide the same physical resources (VMs under the covers) on resumption. That means the compute layer must refresh its local data cache for the objects you are working with from the central storage that Snowflake leverages. So, if you have large volumes of updates occurring, the re-caching of your data in the compute (warehouse) layer may impact performance.
(I've seen some workloads where it's had an impact and others where it hasn't; once again, there are a lot of variables with respect to micro-partitions, etc.)