Hi Support,
We are seeing an issue where Replicate reports an error with the following message when writing to Databricks (Delta): "RetCode: SQL_ERROR SqlState: 08S01 NativeError: 124 Message: [Simba][Hardy] (124) A 503 response was returned but no Retry-After header was provided. Original error: Unknown".
The timestamps of these errors match up to timestamps on the Databricks side when the Databricks cluster was being resized as a result of auto-scaling. However, the Databricks (Delta) endpoint limitations don't have any mention of auto-scaling/cluster resizing being unsupported.
Is this a known issue when using the Databricks (Delta) endpoint with auto-scaling enabled? If so, is there a workaround that can be implemented in Replicate to prevent the error occurring when the Databricks cluster is being resized?
Thanks,
Nak
Hello @NakulanR ,
If the connection issues in Databricks are due to auto-scaling, you can increase the wait period for executions by setting the internal parameters loadTimeout, executeTimeout, and CDCTimeout to 10 times their current values. This adjustment helps prevent timeouts during scaling operations.
Hope this helps.
Regards,
Sachin B
Hello @NakulanR
Hope the link below may help:
https://docs.databricks.com/api/workspace/clusters/resize
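For reference, the linked endpoint is a plain REST call (POST to /api/2.0/clusters/resize with a cluster ID and a target worker count). A minimal sketch of building such a request is below; the workspace URL, cluster ID, and token are placeholders, not real values.

```python
import json
import urllib.request

def build_resize_request(workspace_url, cluster_id, num_workers):
    """Build a POST request for the Databricks clusters/resize endpoint.

    workspace_url, cluster_id, and the bearer token are placeholders --
    substitute your own workspace values before sending.
    """
    payload = json.dumps({
        "cluster_id": cluster_id,
        "num_workers": num_workers,
    }).encode("utf-8")
    return urllib.request.Request(
        url=f"{workspace_url}/api/2.0/clusters/resize",
        data=payload,
        headers={
            "Authorization": "Bearer <your-personal-access-token>",
            "Content-Type": "application/json",
        },
        method="POST",
    )

# Example (not sent anywhere): a request to resize a cluster to 4 workers.
req = build_resize_request("https://example.cloud.databricks.com",
                           "1234-567890-abcde123", 4)
```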
Regards,
Sushil Kumar
Hello @NakulanR ,
Thanks for contacting the Qlik Community forum.
The error message "A 503 response was returned but no Retry-After header was provided" means that the target server was temporarily unavailable. This could be due to a number of reasons, such as the server being overloaded or under maintenance.
Can you validate that there are no connection-related issues to your Databricks, for example by uploading a CSV from another server?
Can you also try pinging the Databricks server from the Replicate server to see whether anything gets through?
That is the explanation for the error. The 503 was returned by the Databricks cluster itself, so this needs to be verified by the Databricks team.
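For background on why the missing Retry-After header matters: when a server returns 503 with Retry-After, the client knows exactly how long to wait; without it, a well-behaved client has to fall back to its own backoff. The sketch below illustrates that generic client-side pattern. It is not Replicate's internal logic, just an illustration of how such a retry delay is usually chosen.

```python
import random

def retry_delay(headers, attempt, base=1.0, cap=60.0):
    """Pick a wait time (seconds) before retrying after an HTTP 503.

    If the server sent a numeric Retry-After header, honor it; otherwise
    fall back to exponential backoff with jitter. Generic pattern only --
    not Replicate's actual implementation.
    """
    retry_after = headers.get("Retry-After")
    if retry_after is not None:
        try:
            return float(retry_after)
        except ValueError:
            pass  # Retry-After may also be an HTTP-date; fall through to backoff
    delay = min(cap, base * (2 ** attempt))       # 1s, 2s, 4s, ... capped at 60s
    return delay * random.uniform(0.5, 1.0)       # jitter to avoid thundering herd

# When the header is present, it wins:
print(retry_delay({"Retry-After": "5"}, attempt=0))  # 5.0
```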
Sachin B
Hi Sachin,
The error appears and as a result the endpoint gets disconnected. A few minutes later the endpoint gets reconnected on its own, and the task is back up and running. We are able to determine that this occurs when the auto-scaling resizes the cluster. Testing the connection normally to Databricks yields a successful test connection.
If this is occurring as a result of some sort of timeout disconnect on Databricks whilst the auto-scaling is happening, would using the loadTimeout or executeTimeout internal parameters be of any use? Or is there a Databricks specific internal parameter that can be used?
Regards,
Nak
Hello @NakulanR ,
If the connection issues in Databricks are due to auto-scaling, you can increase the wait period for executions by setting the internal parameters loadTimeout, executeTimeout, and CDCTimeout to 10 times their current values. This adjustment helps prevent timeouts during scaling operations.
Hope this helps.
Regards,
Sachin B