Do not input private or sensitive data. View Qlik Privacy & Cookie Policy.
Skip to main content

Suggest an Idea

Announcements
This page is no longer in use. To suggest an idea, please visit Browse and Suggest.

Google BigQuery CDC Partition Optimisation To Reduce Cost

junior_favaro
Partner - Contributor III
Partner - Contributor III

Google BigQuery CDC Partition Optimisation To Reduce Cost

Consumption charges for applying change data to BigQuery is based on the size of the underlying data - i.e. BigQuery is unable to update individual rows, so it mutates the entire table/partition in order to apply a change. Each mutation results in the entire underlying table/partition being read, and consumption charges being raised accordingly. For batch applies based on MERGE statements, BigQuery is not intelligent enough to isolate which partitions are affected, and hence mutates the entire tables regardless of partitioning. For a 1TB table, this results in a $5 charge for each batch of changes. In the course of a day's replication, applying changes in 5-minute intervals will result in a $14,000 monthly bill for a single table.

This scenario is largely unavoidable when using SQL statements to apply changes to BigQuery. However, the underlying BigQuery APIs allow individual partitions to be targeted and replaced without, and also allow data to be streamed into tables for scenarios where historic views are desirable. 

Our current customer sees the potential for QR to be improved based on the above, as existing methodology of reducing the cost comes at the compromise of the source and target latency.

Tags (2)
3 Comments
Shelley_Brennan
Former Employee
Former Employee

Thank you for the suggestion.  We have  items in the roadmap including creating clustered and partitioned tables in the target and also adjusting the SQL to take advantage of these options.

Status changed to: Open - On Roadmap
john_wang
Support
Support

Hello @junior_favaro ,

The feature supported by Replicate 2021.5. In the Release Notes:

Support for Google Cloud BigQuery clustered tables

A Create tables as clustered by primary key option has been added to the Advanced tab of the Google Cloud BigQuery target endpoint. When this option is selected, the target tables will be created as clustered (according to the first four Primary Key columns that support clustering). In general, clustered tables usually provide significantly faster query performance as well as reducing billing costs.

 

Thank you,

John.

Shelley_Brennan
Former Employee
Former Employee

As John mentioned, this has been delivered as part of the May 2021 release.  Thank you!

Status changed to: Delivered