Qlik Replicate Windows server was rebooted abruptly
Problem Statement: When the Replicate services are restarted abruptly or the Windows server reboots (planned or unplanned), a Replicate task writing to a Databricks endpoint ends abruptly, and as a result partial or corrupt seq files are written to the Databricks tables. When the Windows server comes back up and the Replicate services are started, the task resumes from where it left off. The behavior we currently see is that the task continues to process and append new seq files to the Databricks table WITHOUT COMPLAINING.
Impact: Because Replicate does NOT report any issue, the task continues to load and process new data until our Business Teams' or Compose jobs start failing. At that point, we need to identify "manually", via input from the business or Compose teams, each and every table that was corrupted, rectify the data by cleaning up the corrupt files/partitions, and then reprocess the data from the point of corruption.
Ask: While we understand that Databricks is a file-based target and that an abrupt shutdown of the Replicate services or the Windows server can result in corruption, we need Replicate to TELL US, or otherwise identify, the corrupted tasks so that data processing can be STOPPED from the time of corruption. At this point we rely completely on our business or Compose teams to tell us about the data corruption, and we want the product to handle this instead of relying on downstream consumers.
Possible Solutions: Because of the abrupt shutdown, Replicate might or might not have received the "completion signal" from Databricks for the file it was processing. Replicate could store this information internally and, upon task initialization, NOT allow the task to be resumed, instead emitting a message such as 'task stopped abnormally and might have corrupt tables'. This would at least narrow the problem down to a few tables/tasks. But again, the ask is to get the full information via the product!!
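
To illustrate the idea above, here is a minimal Python sketch of an "in-flight journal": a file is recorded before it is written and cleared only once its completion is confirmed, so anything still recorded at startup points to a table that may be corrupt. This is not how Replicate works internally and it uses no Replicate or Databricks API. The class `InFlightJournal`, the function `check_before_resume`, the JSON journal file, and the table/file names are all hypothetical and exist only for illustration.

```python
import json
import os
import tempfile


class InFlightJournal:
    """Hypothetical journal of files whose write started but was never confirmed."""

    def __init__(self, path):
        self.path = path

    def _load(self):
        # A missing journal means nothing was in flight.
        if not os.path.exists(self.path):
            return {}
        with open(self.path, "r", encoding="utf-8") as fh:
            return json.load(fh)

    def _save(self, entries):
        # Write to a temp file, then rename, so the journal itself
        # is never left half-written by a crash.
        directory = os.path.dirname(os.path.abspath(self.path))
        fd, tmp = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(entries, fh)
            fh.flush()
            os.fsync(fh.fileno())
        os.replace(tmp, self.path)

    def begin(self, table, file_name):
        # Record the file BEFORE starting to write it to the target.
        entries = self._load()
        entries[file_name] = table
        self._save(entries)

    def complete(self, file_name):
        # Clear the record only after the target confirmed the file.
        entries = self._load()
        entries.pop(file_name, None)
        self._save(entries)

    def suspect_tables(self):
        # Tables that still have an unconfirmed file.
        return sorted(set(self._load().values()))


def check_before_resume(journal):
    """Refuse to resume if any file was left unconfirmed."""
    suspects = journal.suspect_tables()
    if suspects:
        raise RuntimeError(
            "task stopped abnormally and might have corrupt tables: "
            + ", ".join(suspects)
        )
```

In this sketch a task would call `begin` before each file write, `complete` after the completion signal arrives, and `check_before_resume` on initialization. The resulting error lists only the tables with unconfirmed files, which is the narrowing-down described above.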