Solved: Re: Need understanding on Qlik Replicate Migration - Qlik Community

erric3210 · ‎2023-10-05

Hi Experts,

I'm new to Qlik Replicate and trying to learn the tool & its concepts using community & web.

There is a requirement to migrate data from Oracle database to Snowflake.

The data volume is very big. We have around 10 TB of data and more than 1000 tables.

I've a demo session on Qlik Replicate. There I learned how it works.

So, I need to create Oracle as a source point and Snowflake as Target destination.

But I've few questions:

1): How much data we can migrate from Source (Oracle) to Target (Snowflake)?

2): Does the large volume of data effect the performance on source as well as Target Servers?

3): How much time it will take to transfer 10 TB of data and more than 1000 tables from source to Target?

4): Do I need to create one task for transfer or multiple tasks for it?

5): Can this migration performed by a single resource or does it requires a team effort?

6): Can I perform Full load here?

7): What is the difference between Full load, Apply changes & Store changes?

Please revert with your experienced answers.

Regards,

Eric

Alan_Wang · ‎2023-10-05

1): How much data we can migrate from Source (Oracle) to Target (Snowflake)?

No limit on how much data you can replicate to other targets

2): Does the large volume of data effect the performance on source as well as Target Servers?

A large volume of data will affect performance. Depending on task settings for resource limit allocation, the amount of resources(CPU/memory) of your server, your network speeds, and database read/write speeds, performance will vary. For the Snowflake target, warehouse size can impact performance.

3): How much time it will take to transfer 10 TB of data and more than 1000 tables from source to Target?

Performance varies based on the previous factors. There are no set calculations. Once the task is started, you will get estimates on how long it will take. Estimates are not guaranteed.

4): Do I need to create one task for transfer or multiple tasks for it?

You can create one or multiple tasks depending on your needs. Multiple tasks will allow more customization as task settings can be applied to different tasks with different values based on the set of tables in that particular task. More tasks will mean more management, but any errors can also be isolated to impact only sub-tasks instead of all tables. There's no one-size-fits-all solution and is very much a balancing act depending on your management style.

Log Stream should be used if using a multi-task setup.

5): Can this migration performed by a single resource or does it requires a team effort?

It can be done by a single resource. Most companies will handle the Database side of things separately. There are Database prerequisites for setting up the tasks.

6): Can I perform Full load here?

A Full Load will be needed initially if you want all the data.

7): What is the difference between Full load, Apply changes & Store changes?

Full Load - Select * from base tables to copy all data to target

Apply Changes - Processing of Oracle redo logs to apply CDC to target tables

Store changes - Additional change tables will be created on target for logging individual CDC records that have been applied to the target. Inserts/Updates/Deletes are logged as individual records.

If the issue is solved please mark the answer with Accept as Solution.

View solution in original post

Dana_Baldwin · ‎2023-10-05

Hi @erric3210

In addition to @Alan_Wang 's excellent post - of course it is possible to tune your task (including adjusting the strategic approach to the need) by engaging our Professional Services team. You can contact them via your account manager.

Having said that, there are certain "best practice" settings for a task that uses a Snowflake target that can help as a starting point, adjustments may be needed.

1. For change processing:

On the target endpoint settings, set the file size to the maximum value of 2000 MB. In some cases, based on warehouse size, it may make sense to set it as follows:

XS/S/M/L

500/1000/1500/2000

In general though, Snowflake likes large amounts of data provided rather than small amounts.

In Task Settings > Change Processing > Change Processing Tuning, start with the following settings to increase the size of the batches sent:

Longer than (seconds) to 299
But less than (seconds) to 300
Force apply a batch when memory exceeds to 2000
Transaction offload tuning:
Total transactions memory size exceeds (MB) to 5000
Transactions duration exceeds (seconds) to 86400

These last two settings help to keep changes in memory rather than cached to disk.

Again, your mileage may vary - but this may help as a starting point.

2. For strategies to speed up the full load phase of the task, you might look into using parallel load, details here: Parallel Load | Qlik Replicate Help

Also - Task Settings -> Full Load -> Full Load Tuning

Maximum number of tables: 5
Transaction consistency timeout (seconds): 30
Commit rate during full load: 20000 (If there are no LOB columns - LOBs increase the amount of RAM involved and if you have LOBs this should be evaluated on a case by case basis)

3. Lastly, for increased stability of connection to Snowflake, on the advanced tab of the target endpoint add:

Additional ODBC connection properties:
CLIENT_SESSION_KEEP_ALIVE=true;ABORT_DETACHED_QUERY=true;CLIENT_SESSION_KEEP_ALIVE_HEARTBEAT_FREQUENCY=900;

Hope this helps get you going!

Dana

View solution in original post

Heinvandenheuvel · ‎2023-10-06

>> I'll keep learning from community & by doing practical work.

That a good attitude. Admirable. Using the community is great, and just doing the practical work will help a lot. Be sure to mostly do quick small tests (just a few simple tables or just 1, a few columns, several datatypes, a couple of rows, to get comfortable. And when you go big dare to kill, tweak, and start again. Once a few thousand rows are loading, nothing much will change. Build one, or several, to throw away... because you will anyway.

Still, if there is an option to work with trainer or consultant from Qlik Professional Services team or otherwise, you should also use that.

>>> Can we add, delete tables from outside script rather than using Replicate where

Yes, using the JSON. Export - Tweak - Import. Use macros in a a smart editor, or use a scripting language you are comfortable with to 'merge' your processed table list with the JSON for the task as entries under "explicit_included_tables": [{.

You could use SQL, but you are probably better of using Python, Powershell, Perl or Awk. I like Perl because it works on Windows as well as Linux.

I'll attach a 'superscript ' that can modify the JSON in Perl and Powershell, and put a bit of Perl below to show how to massage a CSV owner,tablename into a Replicate style include list.

@SushilKumar >> however it is always recommended to user replicate console for Task or Endpoint modification.

I disagree. If you know what you need, and it is more then a few dozen table, you should script it to get a reliable, predictable result. And you possibly want to add script to compare the resulting list with the requested list.

Cheers, Hein.

use strict;
use warnings;
my ($owner, $name, $table, $tables, $i, %tables) = ("", "", "", 0);

print "// Processed by <$0> on ", scalar localtime, "\n";

while (<>) {            # read all
        chop;
        die "Too many words on line : $_" if /,\s/;
        ($owner,$name) = split /\./;
        if (! defined($name) ) {
                $name = $owner;
                $owner = $DEFAULT_OWNER;
        }
        $tables++;
        $tables{uc($owner.'.'.$name)} = $owner.'.'.$name;
}

for (sort keys %tables) {
    ($owner,$name) = split /\./, $tables{$_};
        print qq(\t\t\t\t\t\t\t\t"id":\t0,\n\t\t\t\t\t\t\t\t"owner":\t"$owner",\n),
              qq(\t\t\t\t\t\t\t\t"name":\t"$name",\n\t\t\t\t\t\t\t\t"estimated_size":\t0,\n),
              qq(\t\t\t\t\t\t\t\t"orig_db_id":\t0\n\t\t\t\t\t\t\t}, {\n);
}

print STDERR "\n\n$tables tables processed.\n";

View solution in original post

Alan_Wang · ‎2023-10-05