Hi,
I'm using tOracleTableTransfer to load ~90 big tables and noticed different speeds for two tables with similar row counts; it turned out the one that took longer was simply the bigger table in terms of data size!
I tried playing with the "Log interval in seconds", "Source select fetch size" and "Insert Batch size" parameters and hoped for the best, but my tables are still taking a very long time to load (5 hours for about 90 tables with an average of 50K rows per table)!
Does anyone have more in-depth knowledge of how this custom-made component works?
I have even divided my table list in two, into relatively large and small tables, and set the "Source select fetch size" and "Insert Batch size" parameters to 50,000 for the small ones and 500,000 for the large ones, but it didn't help with the speed.
Any other ideas on how to speed up the load and improve performance?
Any help is appreciated!
Hi,
tOracleTableTransfer is a custom component written by a Talend community user and shared on the Talend Exchange portal.
Please feel free to contact the author, jlolling, directly.
Best regards
Sabrina
This component is a bit outdated now. I have worked a lot on its counterparts for MySQL and PostgreSQL. I will update the Oracle version very soon and keep you posted.
I have updated the component tOracleTableTransfer now.
Because Talend Exchange is extremely slow in approving updates, I have provided this component on my personal web space:
http://jan-lolling.de/talend/components/tOracleTableTransfer.zip
Could you please check this version?
One word about large tables: the transfer can slow down as the tables grow. This is an effect caused by the database itself. Check whether you have any indexes; they have to be updated as well and can cause serious performance issues while writing the data.
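If you want to check that quickly, Oracle's USER_INDEXES dictionary view shows what is defined on a table. Here is a minimal JDBC sketch; the connection settings and the table name are placeholders, not anything from the component:

import java.sql.*;

public class CheckIndexes {
    public static void main(String[] args) throws SQLException {
        try (Connection conn = DriverManager.getConnection("jdbc:oracle:thin:@//dbhost:1521/ORCL", "user", "pw");
             PreparedStatement ps = conn.prepareStatement(
                 "select index_name, uniqueness from user_indexes where table_name = ?")) {
            ps.setString(1, "MY_TARGET_TABLE"); // placeholder table name (upper case in the dictionary)
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    System.out.println(rs.getString(1) + " (" + rs.getString(2) + ")");
                }
            }
        }
    }
}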
Thank you so much for the quick response!
I'm running my job right now with the updated version of the component. However, I do have some questions about it:
1. Now I see the following log output being generated by the component; I was wondering if there is a way to avoid it for some jobs?
2. This component commits the data even if I don't put a commit at the end. So I can't have an "all or nothing" behaviour if I'm loading multiple tables within one job?
3. Is it correct to assume that the larger the table, the larger the fetch size should be for higher performance, and the smaller the table, the smaller the fetch size?
4. Does "Log interval in seconds" have any impact on performance? I've tried 0, 1 and 5, and 0 gave me the best performance, but I'm not sure whether there is any correlation!
5. I used the new version of this component and ran the same amount of data and the same number of tables; with the old version it took me 10 minutes, with the new one 19 minutes. Is there something I'm not setting up correctly that would explain this difference?
My job logic is to have one tOracleTableTransfer component; I loop through my table list and feed each table into the component to load the data from source to target. Is there something wrong with that approach in terms of speed? Of course my tables are not all the same size, but all of them have over 40K records. Also, the data in my source is growing, but I have a WHERE clause that only grabs rows that are no longer changing (not getting any updates), and my target is not changing either, so basically I'm just loading data into a repository-style database once a day. My source tables have primary keys, foreign keys and indexes, but my target tables have primary keys and no foreign keys.
Thank you again!
OK, the speed actually should not become slower; I am surprised that this happens in your job.
1. With the setting "Log interval in seconds" = 0 you can switch off the logging.
I will change the logging to log4j. In the past Talend had no support for log4j, but now it is the default.
2. Yes, the component commits the data (after every batch). I can add an option to switch off the commit and let you do this yourself.
3. The fetch size and the batch size depend highly on the database and the table structure. The fetch size works the same way as in tOracleInput, where the cursor size is actually the fetch size, and the batch size is exactly the same as the batch size in the normal tOracleOutput component (see the small JDBC sketch after this list). You should play with the parameters and find the best values. If you set the batch size too high, you will fill too much space in the Oracle SGA, and this can lead to reduced performance. I can tell you, much more than 10,000 is useful only in rare situations.
4. Writing to the standard output can cause performance issues. Switching it off by setting the log interval to 0 is a good choice.
5. In my projects the update had no negative impact on performance. Keep in mind that every run of a table transfer uses a new database connection (never share one connection between parallel runs!).
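To make point 3 a bit more concrete, here is a minimal JDBC sketch of what the two parameters roughly correspond to under the hood. This is not the component's actual code; the connection URLs, credentials and the table and column names are all made up:

import java.sql.*;

public class TransferSketch {
    public static void main(String[] args) throws SQLException {
        try (Connection src = DriverManager.getConnection("jdbc:oracle:thin:@//srchost:1521/SRC", "user", "pw");
             Connection tgt = DriverManager.getConnection("jdbc:oracle:thin:@//tgthost:1521/TGT", "user", "pw")) {
            tgt.setAutoCommit(false);
            try (Statement sel = src.createStatement();
                 PreparedStatement ins = tgt.prepareStatement("insert into my_table (id, name) values (?, ?)")) {
                sel.setFetchSize(10000); // "Source select fetch size": rows pulled per network round trip
                int batchSize = 10000;   // "Insert Batch size": rows collected before one executeBatch()
                int count = 0;
                try (ResultSet rs = sel.executeQuery("select id, name from my_table")) {
                    while (rs.next()) {
                        ins.setLong(1, rs.getLong(1));
                        ins.setString(2, rs.getString(2));
                        ins.addBatch();
                        if (++count % batchSize == 0) {
                            ins.executeBatch(); // send one batch to the database
                            tgt.commit();       // the component commits after every batch by default
                        }
                    }
                }
                ins.executeBatch(); // flush the remaining rows
                tgt.commit();
            }
        }
    }
}

As you can see, both values only control how many rows travel together per round trip; very large values mostly consume more memory on both sides without making the transfer faster.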
You can usually drastically increase the performance if you disable the primary key beforehand:
ALTER TABLE table_name DISABLE CONSTRAINT constraint_name;
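If you script this inside the job, don't forget to re-enable the constraint after the load. A minimal JDBC sketch with made-up table and constraint names:

import java.sql.*;

public class ToggleConstraint {
    public static void main(String[] args) throws SQLException {
        try (Connection conn = DriverManager.getConnection("jdbc:oracle:thin:@//tgthost:1521/TGT", "user", "pw");
             Statement stmt = conn.createStatement()) {
            stmt.execute("alter table my_table disable constraint my_table_pk");
            // ... run the table transfer here ...
            stmt.execute("alter table my_table enable constraint my_table_pk"); // Oracle validates the data again here
        }
    }
}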
One hint: this new release is not limited to tOracleConnection as the source. You can also use PostgreSQL or any other database as the source to feed your Oracle database this way.
Again, thank you so much, jlolling!
I have set the log interval in seconds to 0, but I'm still getting log output in my Talend output console!
Making the commit optional would be an awesome feature to have!
Regarding the batch size: you mentioned that more than 10,000 is rarely useful, but the default there is 50,000; is this correct? So it's better to change it to less than 10K for better performance in my job?
Not being limited to Oracle as the data source is a big breakthrough for our projects. We will incorporate it soon.
Thank you sooooo much!
Regarding the log output: I will change all of it to the DEBUG level, so you have the INFO level free for any business information.
Regarding the batch size: please keep in mind that it is about increasing network throughput through more efficient TCP packets, and it also depends on whether the database uses some kind of bulk-load facility.
If you want to find the optimum (too large a batch size can lead to a decrease in performance), try it out for yourself: let the job run for a while and check the overall performance.
Using other database types as the source can sometimes be a bit difficult because of differing data types. Because of that, I have added the option to map database types to Java types. The Java type is used to read the data from the source and to send the data to the target.
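Just to illustrate the idea of such a mapping, a tiny sketch (the pairs shown are only examples, not the component's built-in list):

import java.math.BigDecimal;
import java.sql.Timestamp;
import java.util.HashMap;
import java.util.Map;

public class TypeMappingExample {
    public static void main(String[] args) {
        // which Java class carries the values of each database type during the transfer
        Map<String, Class<?>> typeMap = new HashMap<>();
        typeMap.put("NUMBER", BigDecimal.class);
        typeMap.put("VARCHAR2", String.class);
        typeMap.put("TIMESTAMP", Timestamp.class);
        typeMap.forEach((dbType, javaType) ->
                System.out.println(dbType + " -> " + javaType.getSimpleName()));
    }
}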
But please keep in mind that a transaction should not be too big; there are usually limitations on the size of a transaction. Check that with your database administrator.
It will take a bit of time, I guess until Monday, to change the component. Stay tuned.
I have finished my work on the tOracleTableTransfer component. If you want, you can try it:
http://jan-lolling.de/talend/components/tOracleTableTransfer.zip
In the advanced settings you will find the option "Use a separate tOracleCommit component...", which prevents the component from committing on its own. Please use this option carefully, because transactions usually have a maximum size.
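For anyone wondering what this option means at the JDBC level: with the per-batch commit switched off, the whole load stays in one transaction and you commit or roll back once at the end. A rough sketch, not the component's actual code (connection settings are placeholders):

import java.sql.*;

public class AllOrNothingSketch {
    public static void main(String[] args) throws SQLException {
        try (Connection tgt = DriverManager.getConnection("jdbc:oracle:thin:@//tgthost:1521/TGT", "user", "pw")) {
            tgt.setAutoCommit(false);
            try {
                // ... run the transfers for all tables on this one connection ...
                tgt.commit();   // one single commit at the end: "all or nothing"
            } catch (SQLException e) {
                tgt.rollback(); // any failure undoes the whole load
                throw e;
            }
        }
    }
}

This is exactly why the transaction-size warning above matters: everything uncommitted has to fit into the undo space of the database.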
Cheers
Jan