In the data integration process from a Salesforce source to a Snowflake target, it's crucial to ensure all records are accurately captured. However, sometimes gaps can occur in Change Data Capture (CDC) operations, resulting in missing records.
In this article, we aim to explore such an issue and guide you through the required troubleshooting steps and eventual solution.
One of the source table in Salesforce is 'WorkOrder,' and discrepancies in data capture were observed in the logs. The root of the problem becomes evident when examining consecutive SELECT queries:
Timestamp: 2023-07-17 13:41:04.251
Query: SELECT count() FROM WorkOrder WHERE SystemModstamp > 2023-07-17T17:24:28.000Z and SystemModstamp < 2023-07-17T17:40:04Z
Timestamp: 2023-07-17 13:56:31.411
Query: SELECT count() FROM WorkOrder WHERE SystemModstamp > 2023-07-17T17:41:04.000Z and SystemModstamp < 2023-07-17T17:55:31Z
Upon closer examination, it's evident that the tracking column 'SystemModstamp' in Query 2, which is "> 2023-07-17T17:41:04.000Z," does not align with the previous query's 'SystemModstamp' value, which was "< 2023-07-17T17:40:04Z."
Consequently, any changes within this time period (SystemModstamp between '>= 2023-07-17T17:40:04Z' and '<= 2023-07-17T17:41:04.000Z') were not captured, leading to missing records.
After a thorough analysis, the root cause of this issue was identified. The transaction consistency timeout of 60 seconds was found to be inadequate. To address this problem effectively, the following steps are recommended:
Analyze Longest Transactions: Salesforce transactions vary in duration. To ensure comprehensive data capture, it's crucial to identify the longest transactions that may exceed the default consistency timeout.
Set Transaction Consistency Timeout: Adjust the "Transaction consistency timeout" value in your CDC configuration to accommodate the longest transactions. This adjustment ensures that CDC operations span the entire duration of transactions, preventing data gaps and missing records.
To set the Transaction Consistency Timeout:
Open the Salesforce Source Endpoint
Navigate to the Advanced tab
Locate Transaction consistency timeout (sec): and set the value accordingly
By following these steps, you can enhance the accuracy and completeness of your data capture process when loading data from a Salesforce source to a Snowflake target. This proactive approach will help prevent future instances of missing records and maintain the integrity of your data integration pipeline.