Al_gar
Contributor III

Slow full load

I have a task with 64 HANA DB tables as the source and SQL Server 2019 as the target, with the following Full Load tuning settings:
- Maximum number of tables: 5
- Commit rate during full load: 26K

One of the tables, with 91.5 million records, had a throughput of around 2K rec/sec, and another table with more than 2 billion records had a throughput of around 4K rec/sec. Is there a reason why other, smaller tables get better throughput, ranging between 20K rec/sec and 75K rec/sec?

3 Replies
Heinvandenheuvel
Specialist III

Hmm. 2 billion records at around 4K rec/sec maps to almost 6 days of full load.
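As a back-of-the-envelope check, using only the figures quoted above:

```python
# Rough full-load duration estimate from the numbers in the question.
records = 2_000_000_000      # ~2 billion rows in the large table
rate_per_sec = 4_000         # observed throughput, rec/sec

seconds = records / rate_per_sec
print(f"{seconds / 86_400:.1f} days")   # ~5.8 days
```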

>>  Is there a reason for other smaller tables to have better throughput 

Most likely it is the 'width' of the tables that determines the speed. Both the number of columns and the number of bytes per row matter, and in the byte count you should include the average size of any LOBs (e.g. VARCHAR(MAX)).
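One way to compare tables on a like-for-like basis is to convert rec/sec into MB/sec using each table's average row width. A small sketch of that arithmetic (the table names and row widths below are made-up placeholders; substitute the real averages from your source catalog):

```python
# Compare throughput in bytes/sec rather than rows/sec.
# Row widths here are hypothetical -- look them up for your actual tables.
tables = {
    "wide_table":   {"rec_per_sec": 4_000,  "avg_row_bytes": 2_000},
    "narrow_table": {"rec_per_sec": 75_000, "avg_row_bytes": 100},
}

for name, t in tables.items():
    mb_per_sec = t["rec_per_sec"] * t["avg_row_bytes"] / 1_000_000
    print(f"{name}: {mb_per_sec:.1f} MB/sec")
```

With numbers like these, a "slow" wide table and a "fast" narrow table can turn out to be moving roughly the same number of bytes per second.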

Have you tried a much higher commit rate, say 100K or 300K? Just run a quick test with your worst table: for each of 10K, 30K, 100K, and 300K, start the load, give it a minute or five to stabilize, watch the records/sec, and then stop it.
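If it helps to keep the sweep results straight, a trivial way to tabulate what the monitor shows at each setting (the sample values are placeholders to be filled in from your own test runs):

```python
# Record a few steady-state rec/sec readings per commit-rate setting
# (values below are placeholders), then compare the averages.
samples = {
    10_000:  [0, 0, 0],
    30_000:  [0, 0, 0],
    100_000: [0, 0, 0],
    300_000: [0, 0, 0],
}

for rate, vals in samples.items():
    avg = sum(vals) / len(vals)
    print(f"commit rate {rate:>7}: {avg:,.0f} rec/sec average")
```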

Can you use OS tools on the Replicate server (Linux? Windows?) to see the resource consumption of the load/unload threads? Are they the bottleneck, perhaps pegged at close to a single CPU?
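If you prefer scripting it over watching Task Manager or top, a minimal psutil loop can track CPU for the relevant processes. The process-name filter below is an assumption; adjust it to whatever your Replicate processes are actually called:

```python
# Rough per-process CPU watcher for the Replicate server.
# The name filter is an assumption -- check your actual process names.
import time
import psutil

NAME_FILTER = ("repctl", "repserv")

# The first cpu_percent reading per process is 0.0; later iterations are meaningful.
while True:
    for p in psutil.process_iter(["pid", "name", "cpu_percent"]):
        name = (p.info["name"] or "").lower()
        if any(f in name for f in NAME_FILTER):
            print(p.info["pid"], p.info["name"], f"{p.info['cpu_percent']:.0f}% CPU")
    time.sleep(5)
```

A process hovering around 100% on a multi-core box is a hint that a single-threaded stage is the limit.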

Give those slow tables a higher priority so they get started early. If you are stuck with multiple days for one major table (RESB?), then you might as well drop the number of streams down to 2 or 3 to reduce contention for resources, since the other tables will easily finish while the slowest one is loading.

Can you use parallel load with data ranges for the large table? In that case, increase the number of streams accordingly.
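For the data-range option, the boundaries just need to split the table's key space into roughly even chunks. A sketch of that arithmetic (the key range and segment count are placeholders; you would enter the resulting boundary values in the table's Parallel Load settings):

```python
# Split a numeric key range into N roughly equal segments for parallel load.
# min_key/max_key/segments are placeholders -- use your table's real values.
min_key, max_key, segments = 1, 2_000_000_000, 8

step = (max_key - min_key + 1) // segments
boundaries = [min_key + step * i for i in range(1, segments)]
print(boundaries)   # candidate data-range boundary values
```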

Finally, be sure to check with the source and target DBAs, and possibly the network folks, to determine the most likely bottleneck. I suspect the source, but who knows. Look for top-SQL resource attributes, and compare the actual gigabits/sec on the network against the network connection ratings.
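To sanity-check the network side, translate the load throughput into Gbit/sec and compare it with the link rating (both numbers below are placeholders):

```python
# Convert observed load throughput into link utilisation.
# Plug in your measured MB/sec and the rated link speed.
mb_per_sec = 8.0          # e.g. from the earlier bytes/sec estimate
link_gbit = 10            # rated network connection speed

gbit_per_sec = mb_per_sec * 8 / 1_000
print(f"{gbit_per_sec:.3f} Gbit/sec, "
      f"{100 * gbit_per_sec / link_gbit:.1f}% of a {link_gbit} Gbit link")
```

If utilisation is in the low single digits, the network is probably not the bottleneck.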

Good luck,

Hein

Al_gar
Contributor III
Author

Hein,

This HANA table has 6 columns and no LOBs. I did a couple of tests as you suggested: with a 50K commit rate the average was around 90K rec/sec, and with 100K it averaged 130-150K rec/sec. These are much more favorable numbers.

This table uses parallel loading, but I realized that having "Maximum number of tables" set to 5 under "Full Load" > "Tuning Settings" might have been one of the causes of the slow load. I changed this to 10 and the commit rate to 50K. I have also distributed the parallel load segments more evenly and added some filters to remove unwanted records from the load.

Thanks for your help.

Jon_Donker
Creator

It might be worth turning up the "Performance" logging on the task to see whether the lag is on the source, in handling, or on the target.