I have a task with 64 HANA db tables as a source and SQL Server 2019 as a target with the following Full load tuning settings:
- Maximum number of tables: 5
- Commit rate during full load: 26K
One of the tables, with 91.5 million records, had a throughput of around 2K rec/sec, and another table with more than 2 billion records had a throughput of around 4K rec/sec. Is there a reason for the other, smaller tables to have better throughput, ranging between 20K rec/sec and 75K rec/sec?
Hmm. 2 billion records at a throughput of around 4K rec/sec. That maps to 5, almost 6, days.
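Just to show the math (plain arithmetic, nothing Replicate-specific):

```python
# Rough full-load duration estimate from row count and observed throughput.
rows = 2_000_000_000          # ~2 billion records (from the post above)
throughput = 4_000            # observed rec/sec for that table

seconds = rows / throughput
days = seconds / 86_400       # 86,400 seconds per day
print(f"{days:.1f} days")     # ~5.8 days
```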
>> Is there a reason for other smaller tables to have better throughput
Most likely it is the 'width' of the tables that determines the speed. Both the number of columns and the number of bytes matter. For the bytes, include the average size of any LOBs (VARCHAR(MAX)).
Have you tried a much higher commit rate? 100K, 300K? Just run a quick test with your worst table: for each of 10K, 30K, 100K, and 300K, start the load, give it a minute or five to stabilize, watch the records/sec, and kill it.
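If it helps, here is a rough sketch of how you could watch rec/sec on the target while trying each commit rate. The connection string and table name are just placeholders for your environment; it uses the (approximate but cheap) row counts from sys.dm_db_partition_stats rather than COUNT(*) on a huge table:

```python
# Sketch: sample approximate row counts on the SQL Server target and print rec/sec
# while a test load is running. Requires the pyodbc package.
import time
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};SERVER=myserver;DATABASE=mydb;"
    "UID=myuser;PWD=mypassword"          # placeholder credentials
)
sql = """
SELECT SUM(row_count)
FROM sys.dm_db_partition_stats
WHERE object_id = OBJECT_ID('dbo.MyLargeTable') AND index_id IN (0, 1)
"""

def row_count():
    return conn.cursor().execute(sql).fetchone()[0] or 0

prev, prev_t = row_count(), time.time()
for _ in range(30):                       # sample for ~5 minutes
    time.sleep(10)
    cur, now = row_count(), time.time()
    print(f"{(cur - prev) / (now - prev_t):,.0f} rec/sec")
    prev, prev_t = cur, now
```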
Can you use Replicate server (Linux? Windows?) tools to see the resource consumption for the load/unload threads? Are they the bottleneck, using close to a single CPU perhaps?
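Something like the sketch below (psutil on the Replicate server; the process-name filter is only a guess, adjust it to whatever your load/unload processes are actually called) can flag a process that is pegged at roughly one core:

```python
# Sketch: spot CPU-bound Replicate processes on the Replicate server.
# Requires the psutil package (works on Linux or Windows).
import psutil, time

procs = [p for p in psutil.process_iter(["name"])
         if "repctl" in (p.info["name"] or "").lower()
         or "replicate" in (p.info["name"] or "").lower()]

for p in procs:
    p.cpu_percent(None)          # prime the per-process counters
time.sleep(5)
for p in procs:
    pct = p.cpu_percent(None)    # percent of ONE core over the 5 s interval
    flag = "  <-- pegged near a single CPU?" if pct > 90 else ""
    print(f"{p.pid:>7}  {p.info['name']:<20} {pct:6.1f}%{flag}")
```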
Give those slow tables higher priority so they get started early. If you are stuck with multiple days for one major table (RESB?), then you might as well drop DOWN the number of streams to 2 or 3 to reduce the contention for resources, as the other tables will easily finish while the slowest one is loading.
Can you use parallel load with data ranges for the large table? In that case increase the number of streams accordingly.
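As a starting point for the range boundaries, something like the sketch below could split a numeric segmentation column evenly between its min and max on the HANA source. Every schema, table, and column name in it is a placeholder, and it assumes the column is numeric:

```python
# Sketch: derive boundary values for Replicate's parallel load "Data Ranges"
# from the min/max of a numeric segmentation column on the HANA source.
# Requires the hdbcli package (SAP HANA Python client).
from hdbcli import dbapi

conn = dbapi.connect(address="hana-host", port=30015, user="MYUSER", password="secret")
cur = conn.cursor()
cur.execute('SELECT MIN("SEGMENT_COL"), MAX("SEGMENT_COL") FROM "MYSCHEMA"."RESB"')
lo, hi = cur.fetchone()

segments = 8                                    # match the number of parallel streams
step = (hi - lo) // segments
boundaries = [lo + step * i for i in range(1, segments)]
print(boundaries)   # enter these as the range boundary values in the task settings
```

Even spacing only works well if the key is roughly uniformly distributed; for a skewed key you would want percentile-based boundaries instead.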
Finally, be sure to check with the source and target DBAs, and possibly the network folks, to determine the most likely bottleneck. I suspect the source, but who knows. Look for top-SQL resource attributes. Look at actual gigabits/sec on the network versus the network connection ratings.
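For the network side, a quick sample of actual throughput on the Replicate server (aggregated over all interfaces) can be compared against the link rating:

```python
# Sketch: measure actual network throughput on the Replicate server over 10 seconds.
# Requires the psutil package.
import psutil, time

before = psutil.net_io_counters()
time.sleep(10)
after = psutil.net_io_counters()

gbits_in  = (after.bytes_recv - before.bytes_recv) * 8 / 10 / 1e9
gbits_out = (after.bytes_sent - before.bytes_sent) * 8 / 10 / 1e9
print(f"in:  {gbits_in:.2f} Gbit/s")
print(f"out: {gbits_out:.2f} Gbit/s")
```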
Good luck,
Hein
Hein,
This HANA table has 6 columns and no LOBs. I did a couple of tests as you suggested: with 50K I averaged around 90K rec/sec, and with 100K around 130-150K rec/sec. These are much more favorable numbers.
This table uses parallel loading, but I realized that having the "Maximum number of tables" set to 5 under "Full Load" > "Tuning Settings" might have been one of the causes of the slow load. I changed this to 10 and the commit rate to 50K. I've also distributed the parallel load segments more evenly and added some filters to remove unwanted records from the load.
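As a sanity check, at those rates the big table becomes far more manageable (same back-of-the-envelope math as before):

```python
# Revised estimate with the higher commit rate and observed throughput.
rows = 2_000_000_000
throughput = 130_000            # low end of the 130-150K rec/sec observed in the tests
hours = rows / throughput / 3600
print(f"{hours:.1f} hours")     # ~4.3 hours, versus ~5.8 days at 4K rec/sec
```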
Thanks for your help.
It might be worth turning up the "Performance" logging on the task to see if you can determine whether the performance lag is on the source, in Replicate's handling, or on the target.