Big Query endpoint: Create Streaming Capabilities

joseph_jbh
Contributor III

Google tells us that Qlik's current BigQuery endpoint uses “micro-batching” to load data into BigQuery. That works, but it runs afoul of a couple of quotas: the number of batches per day, and the number of tables/partitions in a project.
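
To make the batches-per-day concern concrete, here is a rough back-of-the-envelope sketch. The quota value of 1,500 batches per table per day is an assumption used only for illustration, as are the function names; the real limit depends on how the endpoint loads data and on the current BigQuery quota documentation.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

# ASSUMPTION: illustrative per-table daily limit on batch loads; check the
# current BigQuery quota documentation for the figure that applies to you.
ASSUMED_DAILY_BATCH_QUOTA = 1500


def batches_per_day(interval_seconds: int) -> int:
    """Number of batches one table receives per day at a fixed interval."""
    return SECONDS_PER_DAY // interval_seconds


def min_interval_seconds(daily_quota: int) -> int:
    """Shortest whole-second interval that stays within the daily quota."""
    return -(-SECONDS_PER_DAY // daily_quota)  # ceiling division


for interval in (10, 30, 60):
    n = batches_per_day(interval)
    status = "over" if n > ASSUMED_DAILY_BATCH_QUOTA else "within"
    print(f"every {interval:>2}s -> {n:>5} batches/day ({status} quota)")

print("shortest safe interval:", min_interval_seconds(ASSUMED_DAILY_BATCH_QUOTA), "s")
```

Under that assumed quota, anything much faster than about one batch per minute per table would exhaust the daily allowance, which is why micro-batching puts a floor on latency.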

Could you create another version of the endpoint that uses Google's latest streaming ingestion through the new Storage Write API? Here are a couple of references:

https://cloud.google.com/bigquery/docs/write-api

BigQuery Write API explained: An overview of the Write API | Google Cloud Blog

Some of the advantages:

Exactly-once delivery semantics. The Storage Write API supports exactly-once semantics through the use of stream offsets. Unlike the tabledata.insertAll method, the Storage Write API never writes two messages that have the same offset within a stream, if the client provides stream offsets when appending records.
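
As an illustration of the offset idea, here is a toy in-memory model. It is not the real BigQuery client library; `ToyStream` and its `append` method are made up for this sketch, and the real service signals the duplicate and out-of-range cases with error statuses rather than return values.

```python
class ToyStream:
    """In-memory stand-in for a write stream; NOT the real BigQuery client."""

    def __init__(self):
        self.rows = []

    def append(self, batch, offset):
        """Append `batch` at `offset`; return True if written, False if duplicate."""
        end = len(self.rows)
        if offset < end:
            # This offset was already written, so this is a retry: ignore it.
            return False
        if offset > end:
            # A gap means an earlier append is missing; refuse to skip ahead.
            raise ValueError(f"offset {offset} is beyond end of stream ({end})")
        self.rows.extend(batch)
        return True


stream = ToyStream()
first = stream.append(["a", "b"], offset=0)
retry = stream.append(["a", "b"], offset=0)  # client retries after a timeout
second = stream.append(["c"], offset=2)
print(first, retry, second, stream.rows)
```

The retried append is dropped because its offset is already occupied, so the client can retry blindly after a network error without creating duplicate rows.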

Stream-level transactions. You can write data to a stream and commit the data as a single transaction. If the commit operation fails, you can safely retry the operation.

Transactions across streams. Multiple workers can create their own streams to process data independently. When all the workers have finished, you can commit all of the streams as a transaction.
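
The two transaction points above can be pictured with another toy model. Again, this is only a sketch under assumptions: `ToyTable` and its methods are hypothetical names that loosely mirror the create / append / finalize / batch-commit flow described in the Write API docs, not the actual client calls.

```python
class ToyTable:
    """In-memory stand-in for a table with pending streams; NOT the real API."""

    def __init__(self):
        self.committed = []    # rows visible to readers
        self.pending = {}      # stream name -> buffered, invisible rows
        self.finalized = set()

    def create_stream(self, name):
        self.pending[name] = []

    def append(self, name, rows):
        if name in self.finalized:
            raise ValueError(f"stream {name} is finalized")
        self.pending[name].extend(rows)

    def finalize(self, name):
        self.finalized.add(name)

    def batch_commit(self, names):
        """Make all named streams visible together, or none of them."""
        not_ready = [n for n in names
                     if n not in self.finalized or n not in self.pending]
        if not_ready:
            return False  # nothing was committed; safe to fix and retry
        for n in names:
            self.committed.extend(self.pending.pop(n))
        return True


table = ToyTable()
for worker, rows in {"w1": [1, 2], "w2": [3]}.items():
    table.create_stream(worker)
    table.append(worker, rows)

table.finalize("w1")
early = table.batch_commit(["w1", "w2"])  # w2 is not finalized yet
visible_after_early = list(table.committed)

table.finalize("w2")
done = table.batch_commit(["w1", "w2"])
print(early, visible_after_early, done, table.committed)
```

The failed early commit leaves nothing visible, and the later commit publishes both workers' rows at once, which is the behavior that makes retries safe.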

Efficient protocol. The Storage Write API is more efficient than the older insertAll method because it uses gRPC streaming rather than REST over HTTP. The Storage Write API also supports binary formats in the form of protocol buffers, which are a more efficient wire format than JSON. Write requests are asynchronous with guaranteed ordering.

Schema update detection. If the underlying table schema changes while the client is streaming, then the Storage Write API notifies the client. The client can decide whether to reconnect using the updated schema, or continue to write to the existing connection.

Lower cost. The Storage Write API has a significantly lower cost than the older insertAll streaming API. In addition, you can ingest up to 2 TB per month for free.

 

Thank you!

 

7 Comments
Bayu
Contributor

In our POC with Google, we're using Qlik Replicate to push data to Kafka on two topics: one stores the CDC records and the other stores schema changes. The data is in JSON format. From there, a Dataflow job (Google's managed Apache Beam service) reads the JSON and inserts the data into BQ tables. Again, the purpose of the Dataflow job is to read CDC records from Kafka (in JSON format), reformat them, and append them to BQ tables.

With this idea, we're asking whether Qlik can create a new endpoint for BQ that removes the need for Kafka and Dataflow. The process itself is append-only (there is no need to actually apply inserts/updates/deletes in BQ). Once the data is stored chronologically in BQ, my team can build a view that shows users only the latest data.

Currently the Dataflow job also creates the view, so ideally we would like Qlik to handle the view creation as well.
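
For anyone wondering what such a "latest data" view might look like, here is a minimal sketch that only builds the SQL text. The helper name, the table and view names, and the `order_id` and `change_seq` columns are all hypothetical; it assumes the append-only table has a key and a column that orders the changes, and it does not hide keys whose latest change is a delete.

```python
def latest_rows_view_sql(view, table, key_columns, sequence_column):
    """Build BigQuery DDL for a view exposing the newest row per key."""
    keys = ", ".join(f"`{c}`" for c in key_columns)
    return (
        f"CREATE OR REPLACE VIEW `{view}` AS\n"
        f"SELECT * EXCEPT (row_rank)\n"
        f"FROM (\n"
        f"  SELECT *, ROW_NUMBER() OVER (\n"
        f"    PARTITION BY {keys} ORDER BY `{sequence_column}` DESC\n"
        f"  ) AS row_rank\n"
        f"  FROM `{table}`\n"
        f")\n"
        f"WHERE row_rank = 1"
    )


# All names below are placeholders for illustration only.
sql = latest_rows_view_sql(
    view="my_dataset.orders_latest",
    table="my_dataset.orders_changes",
    key_columns=["order_id"],
    sequence_column="change_seq",
)
print(sql)
```

Ranking rows per key by the change sequence and keeping rank 1 is one common way to present an append-only change log as current state; filtering out deleted keys would need an extra condition on whatever column records the operation type.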

bobvecchione
Employee

Hey Joe - We do in fact have this request on our radar for Replicate and Qlik Cloud DI. It is a relatively high priority, but we have not started the work yet.

 

You can always ping me directly (you have my email) 🙂 and I will keep you up to date.

 

Thanks

 

--bobv--

joseph_jbh
Contributor III

Excellent news... Thanks for the update, @bobvecchione!

marina_marshak
Employee

Hello Joseph,

I wanted to let you know that we did an initial evaluation of this feature with R&D and have added it to our roadmap.

It's an important item for both Replicate and Qlik Cloud Data Integration, but we will not be targeting it in the short term: the implementation is not straightforward, and we are currently tied up with other engagements.

marina_marshak
Employee
 
Status changed to: Open - On Roadmap
Meghann_MacDonald

From now on, please track this idea from the Ideation portal. 

Link to new idea

Meghann


Ideation
Explorer II
 
Status changed to: Closed - Archived