raghujakkani
Contributor III

Replicate data from last 24 months

Hello,

My source connection is an Oracle database, and my Replicate task ("Full Load + Apply Changes") includes audit tables.

The tables are huge and by nature see heavy DML activity.

My target is Snowflake (Extra Small warehouse size), and I see a significant amount of latency when replicating the audit tables.

Is there a way I can limit the date range to at most 24 months of history data in the Replicate task?

 

Thank you.


6 Replies
DesmondWOO
Support

Hi @raghujakkani ,

Thank you for reaching out to the Qlik Community.

I am uncertain about your question regarding "at most 24 months of history data". Replicate reads changes from the transaction log and writes them into the audit table; it is real-time replication, so those changes should already fall within the last 24 months.

Regards,
Desmond

john_wang
Support

Hello @raghujakkani ,

Thanks for reaching out to Qlik Community!

If I understood correctly, "the last 24 months of data" means both history data and change data, so we need filters in both the Full Load stage and the Change Processing stage.

The steps are:

1. If the source table is partitioned (for example, by year), then we may filter the data by partition. If not, see step 2.

2. Filter the history data in Full Load by the "Fullload Passthru Filter" (see 3 in the screenshot below) and/or "Filter Conditions" (see 1 below) and/or "Record Selection Condition" (see 2 below).

    Let's say there is a column in the tables that identifies the date; we can use it in the filter, for example from Jan 01 2023, 00:00:00.

(screenshot: john_wang_2-1711503266641.png)

3. Filter the change data in Change Processing by using "Filter Conditions" and/or "Record Selection Condition" (see 1 & 2 in the screenshot above).

    Let's assume every changed row's creation date is after Jan 01 2023, 00:00:00. Otherwise, we need to consider additional settings to make sure no changes are missed.
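For illustration, here is a minimal sketch of the two filter styles, assuming a hypothetical DATE column named CREATED_DATE (the real column name, data type, and cutoff will differ per table). As I understand it, the Fullload Passthru Filter is appended to the full-load SELECT and therefore uses native Oracle syntax, while the Record Selection Condition is evaluated by Replicate itself and references columns with a $ prefix:

    -- Fullload Passthru Filter (native Oracle SQL; CREATED_DATE is an assumed name)
    CREATED_DATE >= ADD_MONTHS(TRUNC(SYSDATE), -24)

    -- Record Selection Condition (Replicate expression; exact comparison
    -- semantics depend on the column's data type)
    $CREATED_DATE >= '2023-01-01 00:00:00'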

Hope this helps.

John.

 

raghujakkani
Contributor III
Author

Thank you @john_wang, I used "Filter Conditions" and "Record Selection Condition" to replicate the last two years of data, and it worked. I have attached a screenshot with the expression I used.

Question: does this keep only two years of data at the target, or does it keep replicating and appending data at the target?

E.g., a month later, will I see 2 years and 1 month of data at the target?

 

(screenshot: raghujakkani_0-1711573119200.png)

Heinvandenheuvel
Specialist II

"Does this keep only two years of data at the target? Or does it keep replicating and appending data at the target?"

That's your choice, determined by the run option and the table-creation option.

If you only ever resume, or use the advanced run option 'Start by time - tables already loaded', then the task will just keep adding data, which is probably what you want, as any target-side delete is easy and efficient enough with a quick DELETE WHERE date_column < xxx.

If you ever RELOAD, the default is to drop and recreate the target table, so you end up with just 2 years again. Check "Task Settings" -> "Full Load" -> "Full Load Settings" for "Target Table Preparation: If target table already exists". The default is "Drop and create", but you also have the option "Truncate", which likewise leaves you with just the 2 years, and "Do nothing", which keeps what is already there - that creates conflicts unless you start by timestamp or pre-delete the overlapping 2 years first.
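For example, a target-side prune on Snowflake could look like the following sketch, where MY_SCHEMA.AUDIT_LOG and CREATED_DATE are hypothetical names:

    -- Delete replicated rows older than 24 months (all names are assumptions)
    DELETE FROM MY_SCHEMA.AUDIT_LOG
    WHERE CREATED_DATE < DATEADD(month, -24, CURRENT_DATE);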

Hein

raghujakkani
Contributor III
Author

Thank you @Heinvandenheuvel. I left RELOAD with the default drop option, which works in my case.

I am curious whether there is a way to run post-SQL statements against the target in Qlik Replicate after the task completes.

E.g., after the reload, can we issue a command on the target to delete the records older than 2 years?

 

Best Regards,

Raghu. 

Heinvandenheuvel
Specialist II

@raghujakkani "I am curious whether there is a way to run post-SQL statements against the target in Qlik Replicate after the task completes."

NO - Not through Replicate.

YES - through any other method of your choice, even a job on the Replicate server, but that would be independent of the Replicate task's run action - though it could be scheduled from the same batch/at/cron job or task scheduler.
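If the cleanup only ever touches Snowflake, one scheduler-free variation on that idea is a Snowflake scheduled task. The sketch below is illustrative only; the warehouse, table, and column names are assumptions:

    -- Hypothetical nightly prune, scheduled inside Snowflake itself
    CREATE OR REPLACE TASK PURGE_OLD_AUDIT
      WAREHOUSE = MY_XS_WH
      SCHEDULE = 'USING CRON 0 3 * * * UTC'
    AS
      DELETE FROM MY_SCHEMA.AUDIT_LOG
      WHERE CREATED_DATE < DATEADD(month, -24, CURRENT_DATE);

    -- Tasks are created suspended; resume to activate the schedule
    ALTER TASK PURGE_OLD_AUDIT RESUME;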

There is no user-definable task 'stop action' in Replicate. I suppose you could submit an improvement request for such an action, as you are not the only one to have requested this. However, what folks want to do in such an action varies so wildly that it would be hard to come up with a general-purpose solution. Still, one could consider a collection of SQL statements to execute (but how would parameters be passed? would it run just once, or once per table?), or one could consider activating an OS script - scary.

The only crazy hack I can come up with - it would be fun to try, albeit not very realistic - is to define the task with the full-load setting "Primary Key or Unique Index Creation" set to "Create primary key or unique index after full load completes", use a dedicated target endpoint with its own 'syntax' definition, and within that target endpoint syntax hijack "create_primary_key", turning it into a DELETE WHERE statement. Yikes!

The normal Snowflake syntax for "create_primary_key" is:
"ALTER TABLE ${QO}${TABLE_OWNER}${QC}.${QO}${TABLE_NAME}${QC} ADD CONSTRAINT ${QO}${CONSTRAINT_NAME}${QC} PRIMARY KEY ( ${COLUMN_LIST} )",
 
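A hijacked version might look something like the line below - purely illustrative, and since the endpoint syntax cannot parameterize a per-table date column, CREATED_DATE is a hypothetical hard-coded name that would have to exist in every table the endpoint touches:

    "create_primary_key": "DELETE FROM ${QO}${TABLE_OWNER}${QC}.${QO}${TABLE_NAME}${QC} WHERE CREATED_DATE < DATEADD(month, -24, CURRENT_DATE)",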

Hein.