SanjayQlik
Contributor

Archival of Change Table (CT) data for large tables in Postgres

Hello All, Qlik Support,

We are looking for suggestions on how to archive (back up and delete) data from large Change Tables. Source = DB2, Target = Aurora PostgreSQL.

Due to a high CDC (Change Data Capture) volume at the source, the CTs for a few tables with millions/billions of records accumulate millions of change records per day. We have developed a custom solution (AWS Glue, Python Shell) that stores the old CT records in S3 and then deletes those records from the CT.

However, this backup-and-delete is a slow process: it takes several minutes for fewer than 1M records. (Note: we had to work in small batches, because larger batches caused the job to fail. We were also advised to delete in small batches rather than a huge number of records at once.)
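For readers with a similar setup, the export step can also be done from inside Aurora PostgreSQL itself via the aws_s3 extension, instead of pulling rows out through an external Glue job. A minimal sketch, assuming the extension is installed and the cluster has an IAM role permitting writes to the bucket; the table, column, bucket, and region names are all hypothetical (header__timestamp is Replicate's optional change-header column):

```sql
-- Sketch only: export aged CT rows straight to S3 using Aurora's aws_s3
-- extension. All object names below are illustrative.
SELECT *
FROM aws_s3.query_export_to_s3(
    'SELECT * FROM my_schema.orders__ct
     WHERE header__timestamp < now() - interval ''7 days''',
    aws_commons.create_s3_uri('my-archive-bucket',
                              'ct-archive/orders__ct.csv',
                              'us-east-1'),
    options := 'format csv, header true'
);
```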

Another suggestion was to use partitioned tables in Postgres, since dropping partitions is much faster than deleting rows. However, could that impact the replication, or not?

Any other solutions or suggestions are welcome.

Thanks in advance!

5 Replies
john_wang
Support

Hello @SanjayQlik ,

Thanks for posting here.

You are correct. You may use one of the following approaches:

  1. Delete rows in small batches

    • Never delete billions of rows in a single transaction.

    • Perform batch deletes in chunks (e.g., 10k–100k rows per transaction); a sketch follows this list.

    • Note: These deleted rows will be recorded in the WAL.

  2. Detach/Drop partitions

    • If the table is partitioned by date (e.g., daily), detaching or dropping a partition is much faster than row-level deletes; see the sketch after the next paragraph.

    • In this scenario, rows are not recorded in the WAL.
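A minimal sketch of the batched-delete pattern from point 1, using the same hypothetical change table my_schema.orders__ct and Replicate's optional header__timestamp header column; substitute your own table name and retention predicate. The ctid subquery is a common PostgreSQL idiom for capping how many rows a single DELETE touches:

```sql
-- Delete at most 50k rows per statement; rerun (e.g., from a driver loop)
-- until it reports 0 rows deleted. Each run is one short transaction.
DELETE FROM my_schema.orders__ct
WHERE ctid IN (
    SELECT ctid
    FROM my_schema.orders__ct
    WHERE header__timestamp < now() - interval '7 days'
    LIMIT 50000
);
```

Keeping each transaction small bounds lock time and WAL spikes, though as noted above every deleted row is still written to the WAL.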

If the PostgreSQL table is not a replication source endpoint, these operations should not impact replication.
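And a sketch of point 2, again with hypothetical names; this assumes the CT has already been rebuilt as a table partitioned by a date/timestamp column (see the endpoint-support caveat later in this thread):

```sql
-- Detach the aged partition, archive the now-standalone table to S3 at
-- leisure, then drop it. Unlike a row-level delete, the dropped rows'
-- images are not written to the WAL.
ALTER TABLE my_schema.orders__ct
    DETACH PARTITION my_schema.orders__ct_p2024_01_01;
-- ... archive my_schema.orders__ct_p2024_01_01 here ...
DROP TABLE my_schema.orders__ct_p2024_01_01;
```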

We strongly recommend performing a thorough acceptance test before applying these changes to any production systems.

Hope this helps.

John.

SanjayQlik
Contributor
Author

Hi @john_wang ,

Thanks much for the response and details.

I wanted to clarify that the tables in question are the Change Tables (CT), not the actual data tables. The target table at the endpoint holds a large dataset and the CDC volume is high, so the CT fills up fast.

As you know, these CTs are created and maintained by Qlik Replicate (formerly Attunity). Please clarify whether a CT is defined as a replication endpoint, and whether replication can be impacted if we add partitioning and then drop partitions or delete data.

In either case, we will test.

If there can be an impact, please advise on alternatives and recommendations from the product side.

john_wang
Support

Hello @SanjayQlik ,

The Change Tables (__ct) are used to store change records that are consumed by your other apps/tools; they are not defined as a replication endpoint.

Regards,

John.

SanjayQlik
Contributor
Author

Thanks again!

In this case (*__ct not defined as a replication endpoint), we should be able to alter or recreate it with partitions, and there should not be any impact on the replication task.

Please correct me if my understanding is wrong.
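For reference, recreating a CT as a partitioned table would look roughly like the sketch below; all names and columns are hypothetical, and a real __ct table carries the source columns plus Replicate's header__* columns. Note John's reply that follows: the PostgreSQL target endpoint does not currently support partitioned tables, so it would need careful validation that Replicate can still write to such a table.

```sql
-- Hypothetical illustration only; not an endorsed or supported layout.
CREATE TABLE my_schema.orders__ct (
    header__change_seq  varchar(35),
    header__change_oper char(1),
    header__timestamp   timestamp,
    order_id            bigint,
    amount              numeric
) PARTITION BY RANGE (header__timestamp);

CREATE TABLE my_schema.orders__ct_p2024_01_01
    PARTITION OF my_schema.orders__ct
    FOR VALUES FROM ('2024-01-01') TO ('2024-01-02');
```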

john_wang
Support

Hello @SanjayQlik ,

If the target endpoint is PostgreSQL, please note that partitioned tables are currently not supported in this version. They are supported in Hadoop and other warehouse endpoints.

Regards,

John.

 
