Suggest an Idea

George_Thomas · ‎2020-06-24

Replicate Target: Google BigQuery.

Type: Enhancement

Enable the creation of tables with clustering that matches the primary key of the source table.

Custering of the table but GBQ does not support primary key so the question is if there is a way to better organize the table to allow for quicker retrieval of the rows during UPDATE and DELETEs.

Posting on Behalf of an Attuninty Customer who does not yet have access to the New Idea form in the community

George Thomas
To help users find verified answers, please don't forget to mark a correct resolution or answer to your problem or question as correct.

JitenderR · ‎2020-06-24

@George This is already requested by one of my customers as well and should be part of our Nov/Dec release (to be confirmed by PM). currently you should be able to work with PS/R&D and update the provider syntax file to enable this option with few limitations.

Regards

Jitender Renwa

dcostanakano · ‎2020-07-02

@jrenwa144 , can you provide more details on this being part of the Nov/Dec release? I have tried to search the idea forums and cannot find any items that speaks to this issue. Or is there a product roadmap that I can review and verify this?

Further, you wrote "currently you should be able to work with PS/R&D and update the provider syntax file to enable this option with few limitations. " I was asked by Professional Services to have this enhancement requested entered here. Are you saying that they *could* have provided me with a workaround? Can you provide me more details and specifics for a workaround?

Please advise.

Many thanks,

@dcostanakano

JitenderR · ‎2020-07-02

@dcostanakano Please reach out to PM via your CSM for a roadmap review session. And yes, Professional Services is correct to route you to put the idea here so that it gets PM's attention and also gives us an insight on a specific request.

Kindly understand that the ideas page is to share only "ideas" (per my understanding) and you will have to reach out to PS/SME to ensure this is the correct workaround for your environment/use case. Also note that provider syntax file controls the behavior of an endpoint and should not be updated without PS/Support/SME/R&D consultation as it can lead to other issues if not tested properly.

Below is how to update it for this specific case and is provided as FYI only. It has a few limitations, mentioned below, to be fixed in future releases and made part of the product,

-> Control tables viz., attrep_apply_exceptions etc that do not have primary key will needs to be created manually in BQ environment

-> If source table has more than 4 PK columns, this workaround fails as it puts CLUSTER BY on all columns and BQ has a limitation of 4 columns. separate out all tables with more than 4 PK columns in a new task as a workaround.

Update provider syntax file:
####################################
1) Run Replicate command line as ADMINISTRATOR
2) change directory to cd C:\Program Files\Attunity\Replicate\bin
3) Execute the below command - in this case the output JSON file is copied to C:\work folder
C:\Program Files\Attunity\Replicate\bin>repctl getprovidersyntax syntax_name=BigQuery > C:\work\BigQueryPartition.json
4) Open the file in a text editor, Notepad++ preferably
5) Add the below line after "truncate_table" section
"create_table": "CREATE TABLE ${QO}${TABLE_OWNER}${QC}.${QO}${TABLE_NAME}${QC} (${COLUMN_LIST} ) PARTITION BY _PARTITIONDATE CLUSTER BY ${PK_COLUMN_LIST}",
6) Remove the below 3 lines from the file
{
"provider_syntax": {
"name": "BigQuery",
"query_syntax": {
7) Replace these 4 lines with below and save the file
{

"name":"BigQueryPartition",
"repository.provider_syntax": {
"name": "BigQueryPartition",
"query_syntax": {
😎 Look for any header/trailers in JSON file that might show up as a result of export. Delete if any
9) Use a JSON schema validator editor like one below to ensure JSON is valid
https://www.jsonschemavalidator.net/
10) Now copy the file to <Replicate_install_folder>/data/imports/BigQueryPartition.json
11) Navigate back to replicate command line and execute the below repctl command; sample expected output shown below
C:\Program Files\Attunity\Replicate\bin>repctl putobject data=BigQueryPartition
[putobject command] Succeeded
12) Create a sample task with BQ target and add the internal parameter, if not present already
syntax = BigQueryPartition

Regards

JR

Ola_Mayer · ‎2020-07-13

Shelley_Brennan · ‎2021-05-12

With the Replicate May 2021 release, we now support creating the target tables in BigQuery as clustered tables and SQL is optimized as well to take advantage of this. Thanks!