Hi Experts,
I'm new to Qlik Replicate and I'm trying to learn the tool and its concepts using the community and the web.
There is a requirement to migrate data from an Oracle database to Snowflake.
The data volume is very big: we have around 10 TB of data and more than 1000 tables.
I had a demo session on Qlik Replicate, where I learned how it works.
So I need to set up Oracle as the source endpoint and Snowflake as the target endpoint.
But I have a few questions:
1): How much data can we migrate from the source (Oracle) to the target (Snowflake)?
2): Does a large volume of data affect the performance of the source as well as the target servers?
3): How much time will it take to transfer 10 TB of data and more than 1000 tables from source to target?
4): Do I need to create one task for the transfer, or multiple tasks?
5): Can this migration be performed by a single resource, or does it require a team effort?
6): Can I perform a Full Load here?
7): What is the difference between Full Load, Apply Changes & Store Changes?
Please reply with your experienced answers.
Regards,
Eric
1): How much data can we migrate from the source (Oracle) to the target (Snowflake)?
There is no limit on how much data you can replicate to the target.
2): Does a large volume of data affect the performance of the source as well as the target servers?
A large volume of data will affect performance. Performance will vary depending on the task settings for resource limit allocation, the amount of resources (CPU/memory) on your server, your network speeds, and database read/write speeds. For the Snowflake target, warehouse size can also impact performance.
3): How much time will it take to transfer 10 TB of data and more than 1000 tables from source to target?
Performance varies based on the previous factors; there are no set calculations. Once the task is started, you will get estimates of how long it will take, but estimates are not guaranteed.
4): Do I need to create one task for the transfer, or multiple tasks?
You can create one or multiple tasks depending on your needs. Multiple tasks allow more customization, as task settings can be applied with different values based on the set of tables in each particular task. More tasks mean more management, but errors are also isolated to a single task instead of affecting all tables. There is no one-size-fits-all solution; it is very much a balancing act depending on your management style.
Log Stream should be used if you go with a multi-task setup.
5): Can this migration be performed by a single resource, or does it require a team effort?
It can be done by a single resource. Most companies will handle the database side of things separately. There are database prerequisites for setting up the tasks.
6): Can I perform a Full Load here?
A Full Load will be needed initially if you want all the data.
7): What is the difference between Full Load, Apply Changes & Store Changes?
Full Load - Select * from the base tables to copy all data to the target.
Apply Changes - Processing of the Oracle redo logs to apply CDC changes to the target tables.
Store Changes - Additional change tables are created on the target to log the individual CDC records that have been applied to the target. Inserts/updates/deletes are logged as individual records.
Hi @erric3210
In addition to @Alan_Wang's excellent post: of course it is possible to tune your task (including adjusting the strategic approach to the need) by engaging our Professional Services team. You can contact them via your account manager.
Having said that, there are certain "best practice" settings for a task that uses a Snowflake target that can help as a starting point; adjustments may be needed.
1. For change processing:
On the target endpoint settings, set the file size to the maximum value of 2000 MB. In some cases it may make sense to scale it with the warehouse size, for example:
XS: 500 MB, S: 1000 MB, M: 1500 MB, L: 2000 MB
In general though, Snowflake prefers being fed large batches of data rather than many small ones.
In Task Settings > Change Processing > Change Processing Tuning, start with the following settings to increase the size of the batches sent:
Longer than (seconds): 299
But less than (seconds): 300
Force apply a batch when memory exceeds (MB): 2000
Transaction offload tuning:
Total transactions memory size exceeds (MB): 5000
Transactions duration exceeds (seconds): 86400
These last two settings help to keep changes in memory rather than having them cached to disk.
Again, your mileage may vary - but this may help as a starting point.
2. For strategies to speed up the full load phase of the task, you might look into using parallel load, details here: Parallel Load | Qlik Replicate Help
Also - Task Settings -> Full Load -> Full Load Tuning
Maximum number of tables: 5
Transaction consistency timeout (seconds): 30
Commit rate during full load: 20000 (if there are no LOB columns; LOBs increase the amount of RAM involved, so if you have LOBs this should be evaluated on a case-by-case basis)
3. Lastly, for increased stability of the connection to Snowflake, on the Advanced tab of the target endpoint add:
Additional ODBC connection properties:
CLIENT_SESSION_KEEP_ALIVE=true;ABORT_DETACHED_QUERY=true;CLIENT_SESSION_KEEP_ALIVE_HEARTBEAT_FREQUENCY=900;
Hope this helps get you going!
Dana
Did the product acquisition not come with a certain number of training/consulting days? Use them or lose them!
Seriously, get management to approve a day or two of consultancy to get you started off on the right track.
Hein.
Hello team,
Yes, a product acquisition usually comes with a certain number of training/consulting days. Use them or lose them!
For exact info you need to get in touch with your Account Manager or Sales. However, if you are already a customer and have used your PS hours previously, you may need to purchase additional PS hours for the new solution or setup.
Regards,
Sushil Kumar
Thank you @Alan_Wang for answering all of my questions.
Reading your answers gives me a better understanding of the Replicate product.
I'll keep learning from the community & by doing practical work.
Regards,
Eric
Thank you for adding more detail to @Alan_Wang's reply.
It was really helpful for a Replicate newbie.
Regards,
Eric
Hi @Alan_Wang @Dana_Baldwin @SushilKumar
Is there any option for an incremental load after doing the full load, maybe from some script or batch file?
Can we add or delete tables from an outside script rather than using the Replicate console, where
we manually add or delete the tables we want to migrate?
Regards,
Eric
Hello Team,
The options you are looking for are available under the Qlik Replicate task settings,
like (Full Load, Full Load + CDC).
If you add or delete tables outside of Replicate, Replicate is not aware of that action (table addition or deletion), so it can lead to inconsistent data.
Replicate also has a JSON file format where you can add tables and import and export the table list to reduce the manual effort.
However, it is always recommended to use the Replicate console for task or endpoint modification.
Hope this helps.
Regards,
Sushil Kumar
>> I'll keep learning from the community & by doing practical work.
That's a good attitude. Admirable. Using the community is great, and just doing the practical work will help a lot. Be sure to mostly do quick small tests (just a few simple tables or just one, a few columns, several datatypes, a couple of rows) to get comfortable. And when you go big, dare to kill, tweak, and start again. Once a few thousand rows are loading, nothing much will change. Build one, or several, to throw away... because you will anyway.
Still, if there is an option to work with a trainer or consultant from the Qlik Professional Services team or otherwise, you should also use that.
>>> Can we add or delete tables from an outside script rather than using the Replicate console where
Yes, using the JSON. Export - tweak - import. Use macros in a smart editor, or use a scripting language you are comfortable with to 'merge' your processed table list with the JSON for the task as entries under "explicit_included_tables": [{.
You could use SQL, but you are probably better off using Python, PowerShell, Perl or Awk. I like Perl because it works on Windows as well as Linux.
I'll attach a 'super script' that can modify the JSON in Perl and PowerShell, and put a bit of Perl below to show how to massage a list of owner.tablename entries into a Replicate-style include list.
@SushilKumar >> However, it is always recommended to use the Replicate console for task or endpoint modification.
I disagree. If you know what you need, and it is more than a few dozen tables, you should script it to get a reliable, predictable result. And you may well want to add a script to compare the resulting list with the requested list, along the lines of the sketch below.
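A minimal sketch of that comparison, assuming two plain-text files with one owner.tablename per line (the file names are hypothetical):

use strict;
use warnings;

# Hypothetical inputs: the list you asked for and the list pulled out of the task JSON.
my %want = read_list('requested_tables.txt');
my %have = read_list('exported_tables.txt');

print "Missing : $_\n" for grep { !$have{$_} } sort keys %want;
print "Extra   : $_\n" for grep { !$want{$_} } sort keys %have;

sub read_list {
    my ($file) = @_;
    open my $fh, '<', $file or die "Cannot open $file: $!";
    my %seen;
    while (<$fh>) {
        chomp;
        next unless /\S/;       # skip blank lines
        $seen{uc $_} = 1;       # compare case-insensitively, like the include-list script below
    }
    return %seen;
}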
Cheers, Hein.
use strict;
use warnings;

# Default owner (schema) used when an input line has no "owner." prefix.
my $DEFAULT_OWNER = 'MY_SCHEMA';   # <-- adjust to your environment

my ($owner, $name, $tables, %tables) = ("", "", 0);

print "// Processed by <$0> on ", scalar localtime, "\n";

while (<>) {                       # read input: "owner.tablename" or just "tablename", one per line
    chomp;
    die "Too many words on line : $_" if /,\s/;
    ($owner, $name) = split /\./;
    if (!defined($name)) {         # no owner given, fall back to the default
        $name  = $owner;
        $owner = $DEFAULT_OWNER;
    }
    $tables++;
    $tables{uc($owner . '.' . $name)} = $owner . '.' . $name;   # de-duplicate case-insensitively
}

# Emit one entry per table, ready to paste under "explicit_included_tables" in the task JSON.
# (The trailing ", {" after the last entry needs to be trimmed by hand.)
for (sort keys %tables) {
    ($owner, $name) = split /\./, $tables{$_};
    print qq(\t\t\t\t\t\t\t\t"id":\t0,\n\t\t\t\t\t\t\t\t"owner":\t"$owner",\n),
          qq(\t\t\t\t\t\t\t\t"name":\t"$name",\n\t\t\t\t\t\t\t\t"estimated_size":\t0,\n),
          qq(\t\t\t\t\t\t\t\t"orig_db_id":\t0\n\t\t\t\t\t\t\t}, {\n);
}

print STDERR "\n\n$tables tables processed.\n";
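For reference, here is a rough sketch of where those printed fragments end up in the exported task JSON. The owner and table names below are made-up examples, and the exact surrounding structure can differ between Replicate versions, so treat this as an illustration only:

"explicit_included_tables": [{
        "id": 0,
        "owner": "HR",
        "name": "EMPLOYEES",
        "estimated_size": 0,
        "orig_db_id": 0
    }, {
        "id": 0,
        "owner": "HR",
        "name": "DEPARTMENTS",
        "estimated_size": 0,
        "orig_db_id": 0
    }],

A typical run would be something like "perl include_list.pl tables.txt > fragment.txt" (script and file names are just examples); paste the generated entries into the exported JSON, trim the trailing ", {", and import the task back.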