Google is telling us that the current Qlik implementation of its BigQuery endpoint uses “micro-batching” to get data into BigQuery. While that does work, it runs afoul of a couple of quotas that limit the number of batches per day and the number of tables/partitions in a project.
Could you create another version of the endpoint that uses their latest streaming ingestion, the new Storage Write API? Here are a couple of references:
https://cloud.google.com/bigquery/docs/write-api
BigQuery Write API explained: An overview of the Write API | Google Cloud Blog
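To make the ask a little more concrete, here is a rough Python sketch of what an append to a table's _default stream looks like with the Storage Write API client. The project/dataset/table names and the generated row_pb2 module are placeholders, not anything from the existing endpoint:

```python
# Rough sketch only: stream rows into a table's _default stream with the
# Storage Write API. Names and the generated protobuf module are placeholders.
from google.cloud import bigquery_storage_v1
from google.cloud.bigquery_storage_v1 import types, writer
from google.protobuf import descriptor_pb2

import row_pb2  # hypothetical module generated by protoc to match the table schema


def append_rows(project_id, dataset_id, table_id, rows):
    write_client = bigquery_storage_v1.BigQueryWriteClient()
    parent = write_client.table_path(project_id, dataset_id, table_id)
    stream_name = f"{parent}/streams/_default"

    # The first request on a connection declares the writer schema as a
    # protocol buffer descriptor.
    proto_descriptor = descriptor_pb2.DescriptorProto()
    row_pb2.Row.DESCRIPTOR.CopyToProto(proto_descriptor)
    proto_schema = types.ProtoSchema(proto_descriptor=proto_descriptor)

    request_template = types.AppendRowsRequest()
    request_template.write_stream = stream_name
    proto_data = types.AppendRowsRequest.ProtoData()
    proto_data.writer_schema = proto_schema
    request_template.proto_rows = proto_data

    append_stream = writer.AppendRowsStream(write_client, request_template)

    # Each append carries a batch of serialized protobuf rows; send() returns
    # a future, so appends are asynchronous with guaranteed ordering.
    proto_rows = types.ProtoRows()
    for row in rows:
        proto_rows.serialized_rows.append(row.SerializeToString())
    request = types.AppendRowsRequest()
    request.proto_rows = types.AppendRowsRequest.ProtoData(rows=proto_rows)
    future = append_stream.send(request)
    future.result()  # raises if the append failed
    append_stream.close()
```

Rows appended to the _default stream are committed immediately; the application-created streams sketched further down are what give the offset and transaction behavior described below.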
Some of the advantages:
Exactly-once delivery semantics. The Storage Write API supports exactly-once semantics through the use of stream offsets. Unlike the tabledata.insertAll method, the Storage Write API never writes two messages that have the same offset within a stream, if the client provides stream offsets when appending records.
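A sketch of how those offsets would be used for retry-safe, exactly-once appends. It assumes an already-opened AppendRowsStream on an application-created stream (offsets are not allowed on the _default stream), with batches of protobuf row messages passed in:

```python
# Sketch: exactly-once appends by tracking an explicit stream offset.
# `append_stream` is an open writer.AppendRowsStream on an application-created
# stream; `batch` is a list of already-built protobuf row messages.
from google.api_core.exceptions import AlreadyExists
from google.cloud.bigquery_storage_v1 import types


def append_batch(append_stream, batch, offset):
    proto_rows = types.ProtoRows()
    for row in batch:
        proto_rows.serialized_rows.append(row.SerializeToString())

    request = types.AppendRowsRequest()
    request.offset = offset  # offset of this batch's first row within the stream
    request.proto_rows = types.AppendRowsRequest.ProtoData(rows=proto_rows)

    try:
        append_stream.send(request).result()
    except AlreadyExists:
        # Rows already exist at this offset (e.g. a blind retry after a
        # dropped connection), so the batch is not written a second time.
        pass
    return offset + len(batch)  # offset to use for the next batch
```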
Stream-level transactions. You can write data to a stream and commit the data as a single transaction. If the commit operation fails, you can safely retry the operation.
Transactions across streams. Multiple workers can create their own streams to process data independently. When all the workers have finished, you can commit all of the streams as a transaction.
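The pending-stream flow behind both of the transactional points above, sketched with the Python client (project/dataset/table names are placeholders):

```python
# Sketch: PENDING streams committed as one atomic, retry-safe operation.
# Each worker creates its own pending stream and appends to it; nothing is
# visible in the table until batch_commit_write_streams is called.
from google.cloud import bigquery_storage_v1
from google.cloud.bigquery_storage_v1 import types


def create_pending_stream(write_client, parent):
    stream = types.WriteStream()
    stream.type_ = types.WriteStream.Type.PENDING
    return write_client.create_write_stream(parent=parent, write_stream=stream).name


def commit_streams(project_id, dataset_id, table_id, stream_names):
    write_client = bigquery_storage_v1.BigQueryWriteClient()
    parent = write_client.table_path(project_id, dataset_id, table_id)

    # Finalize each stream so no further rows can be appended to it.
    for name in stream_names:
        write_client.finalize_write_stream(name=name)

    # One commit makes the rows from every listed stream visible at once;
    # if the call fails, it can simply be retried.
    commit_request = types.BatchCommitWriteStreamsRequest()
    commit_request.parent = parent
    commit_request.write_streams = stream_names
    write_client.batch_commit_write_streams(commit_request)
```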
Efficient protocol. The Storage Write API is more efficient than the older insertAll method because it uses gRPC streaming rather than REST over HTTP. The Storage Write API also supports binary formats in the form of protocol buffers, which are a more efficient wire format than JSON. Write requests are asynchronous with guaranteed ordering.
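For illustration, the “binary format” piece just means each row is a serialized protocol buffer, with the schema declared once per connection as a descriptor, something like this (the field names and types are made up, and would normally come from protoc output rather than being built by hand):

```python
# Sketch: the protobuf wire format. Rows are serialized messages, and the
# writer schema is declared once per connection as a DescriptorProto.
from google.protobuf import descriptor_pb2


def build_row_descriptor():
    # Equivalent to:  message Row { optional int64 id = 1; optional string name = 2; }
    desc = descriptor_pb2.DescriptorProto()
    desc.name = "Row"

    id_field = desc.field.add()
    id_field.name = "id"
    id_field.number = 1
    id_field.type = descriptor_pb2.FieldDescriptorProto.TYPE_INT64
    id_field.label = descriptor_pb2.FieldDescriptorProto.LABEL_OPTIONAL

    name_field = desc.field.add()
    name_field.name = "name"
    name_field.number = 2
    name_field.type = descriptor_pb2.FieldDescriptorProto.TYPE_STRING
    name_field.label = descriptor_pb2.FieldDescriptorProto.LABEL_OPTIONAL

    return desc  # goes into types.ProtoSchema(proto_descriptor=...)
```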
Schema update detection. If the underlying table schema changes while the client is streaming, then the Storage Write API notifies the client. The client can decide whether to reconnect using the updated schema, or continue to write to the existing connection.
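A sketch of how the endpoint could watch for that notification. My understanding is that the AppendRowsResponse carries an updated_schema when the table has changed; how Qlik would react to it is of course a design decision:

```python
# Sketch: reacting to a schema change reported on an append response.
# Assumes an open `append_stream` and a prepared `request` as in the earlier
# sketches; updated_schema is only populated when the table schema changed.
def append_and_watch_schema(append_stream, request):
    response = append_stream.send(request).result()
    if response.updated_schema.fields:
        new_columns = [field.name for field in response.updated_schema.fields]
        # The endpoint could rebuild its proto descriptor and reconnect with
        # the new schema here, or keep writing with the existing one.
        print("Table schema changed; columns are now:", new_columns)
    return response
```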
Lower cost. The Storage Write API has a significantly lower cost than the older insertAll streaming API. In addition, you can ingest up to 2 TB per month for free.
Thank you!