Qlik Replicate - Reload Kafka Target produces duplicates upon any surrounding Kafka target error
Underlying situation
When performing an initial load ("Reload Target...") with Qlik Replicate against a Kafka target, a Kafka-side (target) error can occur for one of the loaded tables. Usually several tables load in parallel (the default parallelism degree is 5). When a target error occurs for one of these tables, Qlik Replicate aborts the whole task, including the tables that are still loading without errors. After this abort, the aborted tables are moved back to the "Queued" list. Qlik Replicate then retries the reload several times, and the same tables are picked up and reloaded again and again, while the target-side error keeps recurring for the one problematic table. Illustrated:
Reload Target - parallelism degree 5
Trial #1
Table 1: loading (no errors)
Table 2: loading (no errors)
Table 3: loading (no errors)
Table 4: loading (no errors)
Table 5: loading - producing a Kafka error
After a few seconds of loading the task aborts.
Trial #2
Table 1: loading (no errors) - producing duplicates
Table 2: loading (no errors) - producing duplicates
Table 3: loading (no errors) - producing duplicates
Table 4: loading (no errors) - producing duplicates
Table 5: loading - producing a Kafka error
After a few seconds of loading the task aborts.
Trial #3: same behavior as Trial #2, and so on.
-> This behavior leads to multiple duplicates in the Kafka target topics for tables 1-4.
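Until the behavior is changed in the product, downstream consumers can guard against such duplicates defensively. A minimal sketch (the record shape is hypothetical; it assumes each message carries a stable key, e.g. the source table's primary key):

```python
# Consumer-side safeguard: drop repeated messages from a reloaded topic.
# Hypothetical record shape -- assumes each message carries a stable key
# (e.g. the source table's primary key) and a payload.

def deduplicate(messages):
    """Keep only the first occurrence of each (key, payload) pair."""
    seen = set()
    unique = []
    for key, payload in messages:
        fingerprint = (key, payload)
        if fingerprint not in seen:
            seen.add(fingerprint)
            unique.append((key, payload))
    return unique

# Messages as they might arrive after two aborted reload trials:
incoming = [
    ("row-1", "alice"), ("row-2", "bob"),  # trial #1
    ("row-1", "alice"), ("row-2", "bob"),  # trial #2 (duplicates)
    ("row-3", "carol"),                    # new data
]
print(deduplicate(incoming))  # -> [('row-1', 'alice'), ('row-2', 'bob'), ('row-3', 'carol')]
```

This only mitigates the symptom at the consumer; the idea below addresses the root cause.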
Idea description
Qlik Replicate could implement smarter error handling for this case. In the example above, 4 of 5 tables have no errors and load fine; they would finish if the Kafka problem of table 5 did not abort the whole task after a few seconds. Two solution designs are possible:
1) Qlik Replicate finishes loading the tables that have no problems and only then aborts the task. Once an error has occurred, Qlik Replicate should not pull further tables from the "Queued" list; it should wait until the currently loading tables are finished and then abort.
2) Qlik Replicate moves the table with the error to the "Error" list and continues with the other tables. All tables without Kafka target errors can then finish, and the task can complete or abort at the end.
Either solution would prevent duplicates during initial load.
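Option 2 can be sketched as a small simulation (table names and the failing table are illustrative, not taken from the product):

```python
# Sketch of option 2: on a per-table target error, park the table in an
# "Error" list and let the remaining tables finish, so no table is ever
# re-queued and reloaded (the re-queuing is what creates the duplicates).

def reload_target(queued, fails_on_target):
    finished, errored = [], []
    while queued:
        table = queued.pop(0)      # take next table from the "Queued" list
        if table in fails_on_target:
            errored.append(table)  # park it instead of aborting the task
        else:
            finished.append(table) # load completes exactly once
    return finished, errored

finished, errored = reload_target(
    queued=["Table 1", "Table 2", "Table 3", "Table 4", "Table 5"],
    fails_on_target={"Table 5"},
)
print(finished)  # -> ['Table 1', 'Table 2', 'Table 3', 'Table 4']
print(errored)   # -> ['Table 5']
```

Each table is dequeued exactly once, so tables 1-4 are written to their topics a single time regardless of table 5's error.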
Target audience
Qlik Replicate and Kafka developers.
Additionally, the developers of the downstream target systems that receive the duplicates.
Value proposition
Duplicates lead to:
higher storage consumption on the Kafka cluster (and possibly on further target systems)
higher compute usage for producing and consuming the duplicate messages to/from the Kafka cluster
higher network usage due to more unnecessary data being transferred
additional cleanup effort
in case the Kafka topic has to be recreated and reloaded to clear the duplicates
in case the final target system does not perform an upsert (e.g. Kafka Connect -> GCP BigQuery), requiring manual cleanup
etc.
All these negative impacts can be prevented by implementing this idea.
Case reference
This issue was described in case 31305, and an ideation was suggested.
The current behavior for the Kafka target supports "at least once" delivery; we will investigate Kafka support for "exactly once" as a feature request. We will continue to analyze the details and report back.
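For context, Kafka itself offers exactly-once producer semantics via idempotence and transactions. The properties below are standard Kafka producer settings; the transactional id value is a placeholder, and whether Qlik Replicate can expose such settings is exactly what this feature request asks to investigate:

```python
# Standard Kafka producer properties for exactly-once delivery.
# "replicate-task-1" is a placeholder transactional id, not a product value.
exactly_once_config = {
    "enable.idempotence": True,              # broker de-duplicates producer retries
    "acks": "all",                           # required for idempotent delivery
    "transactional.id": "replicate-task-1",  # enables atomic multi-message commits
}
print(exactly_once_config["enable.idempotence"])  # -> True
```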