Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
We have a table that is over 1.5 million rows. This table is replicating from MYSQL to Snowflake via a CDC task. We have to increase the LOB limit because the json values are being truncated. What are some alternatives to repopulating the historical data without having to do a full reload.
Hello @Kerri ,
I'm not entirely sure I understood your question correctly. If I got it right, you're saying that the initial Full Load was completed using a specific LOB size setting. Now you want to increase the LOB size in the Replicate task settings and replicate the LOB columns again with this new (larger) size. Is that correct?
If so, here are two possible options to proceed:
Replicate only the primary key and LOB columns (with the updated, larger LOB size) to a temporary table, then merge the LOB columns from the temporary table into the final target table within Snowflake.
Note: This approach can be complex and may not guarantee full consistency, since the two tasks (original and temporary) run asynchronously.
Reload the original task with the new, larger LOB size setting.
This is the safer and more reliable option for data integrity.
You can also enable Parallel Load to accelerate the initial load process if the dataset is large.
Hope this helps.
John.
Hello John!
Yes, you understood the question correctly. Due to the size of the table, a full reload of that table would take several days, which is what we are trying to avoid, therefore reloading the original task isn't really an option.
Option 1 does sound like something we could try, although your point of it being more complex makes me nervous.
We currently have the full load tuning to 8 for this dataset.
Hello @Kerri ,
Thanks for the update.
Yes, neither option is ideal for such a large dataset. Parallel Load might be the more manageable approach, even though it takes time and consumes significant resources.
Best Regards,
John.
Hello again John,
What is the most conservative approach for this setting? We currently have the value set to 8. I don't want to negatively impact the other tasks nor bring down MYSQL.
Thanks in advance