hi all,
I am trying to see if Talend can help me move data from Oracle to ParAccel (Postgres-based).
Say I have 1000 tables.
As far as I can see, I can create a "process" to move each table. It would be something like:
tOracleInput, tFileOutputDelimited and tParAccelBulkExec
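(In plain JDBC terms, the flow those three components give me per table is roughly the sketch below. The connection URLs, credentials, and the server-side COPY form are placeholders — ParAccel speaks the Postgres dialect, so I assume COPY works — and I've skipped delimiter escaping entirely.)

import java.io.FileWriter;
import java.io.PrintWriter;
import java.sql.*;

public class TableCopy {
    public static void main(String[] args) throws Exception {
        String table = "EMPLOYEES";                // placeholder table name
        String csv = "/tmp/" + table + ".csv";     // staging file, like tFileOutputDelimited

        // tOracleInput step: stream the table out of Oracle into a delimited file
        // (no quoting/escaping of embedded delimiters here; a real job needs it)
        try (Connection ora = DriverManager.getConnection(
                     "jdbc:oracle:thin:@//orahost:1521/ORCL", "scott", "tiger");
             Statement st = ora.createStatement();
             ResultSet rs = st.executeQuery("SELECT * FROM " + table);
             PrintWriter out = new PrintWriter(new FileWriter(csv))) {
            ResultSetMetaData md = rs.getMetaData();
            while (rs.next()) {
                StringBuilder line = new StringBuilder();
                for (int i = 1; i <= md.getColumnCount(); i++) {
                    if (i > 1) line.append('|');
                    Object v = rs.getObject(i);
                    line.append(v == null ? "" : v.toString());
                }
                out.println(line);
            }
        }

        // tParAccelBulkExec step: ParAccel is Postgres-derived, so a COPY over
        // JDBC should work; note the file must be visible to the database server
        try (Connection pa = DriverManager.getConnection(
                     "jdbc:postgresql://pahost:5432/dw", "etl", "secret");
             Statement st = pa.createStatement()) {
            st.execute("COPY " + table + " FROM '" + csv + "' WITH DELIMITER '|'");
        }
    }
}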
Works great! HOWEVER...
Setting this up for 1,000 tables, even if I only need to modify the table name each time, looks ugly.
Question for you Talend gurus: is there a way to have a list of tables and then, for each table, perform the above process (extract, save to file, bulk load)?
Thanks for your help!
Let me know if you need more info (system, etc.) to answer this question.
Thanks!
Hi,
Because TOS is mainly metadata-driven, this is not directly possible. The only solution I see would be to generate the SELECT statement on the fly so that each row comes back as one large string, which you could save in the next step. But I think this would be too complex (if it is possible at all).
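Just to illustrate what I mean by generating the statement: something along these lines, built from Oracle's dictionary. This is a sketch only; the '|' delimiter and the TO_CHAR wrapping are my assumptions, and LOB columns would need extra handling.

import java.sql.*;

public class SelectBuilder {
    // Builds e.g.: SELECT TO_CHAR(COL1) || '|' || TO_CHAR(COL2) FROM OWNER.TABLE
    // so that every table fits a single-column (one large string) schema.
    static String buildConcatSelect(Connection ora, String owner, String table)
            throws SQLException {
        StringBuilder cols = new StringBuilder();
        String q = "SELECT column_name FROM all_tab_columns"
                 + " WHERE owner = ? AND table_name = ? ORDER BY column_id";
        try (PreparedStatement ps = ora.prepareStatement(q)) {
            ps.setString(1, owner);
            ps.setString(2, table);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    if (cols.length() > 0) cols.append(" || '|' || ");
                    // TO_CHAR keeps the concatenation working for numbers and
                    // dates; LOB columns would need different handling
                    cols.append("TO_CHAR(").append(rs.getString(1)).append(")");
                }
            }
        }
        return "SELECT " + cols + " FROM " + owner + "." + table;
    }
}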
I don't know either database in detail. Could you do an unload from Oracle and modify it in some way to load it back into Postgres?
Bye
Volker
Yes, I could do it. For the initial bulk load, it is as simple as a loop over the list of table names: for each table, extract the DDL, transform it to Postgres format, extract the data, execute the DDL on the target, and copy the data over. It's all one loop with four functions, more or less, instead of building 1,000 different processes in Talend. If that is really not possible, then I would not consider Talend for this purpose.
However, later on, when the ETL runs every day and we need to incrementally load/delete data, this becomes more complex with a script, and Talend could be of help (if that initial loop were possible). Is there a relatively easy way to script your own object that would perform an operation on a set of tables?
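For comparison, here is a minimal sketch of the kind of loop I mean, in plain JDBC. Connection details are placeholders, and the DDL "transformation" shown is deliberately naive; real Oracle-to-Postgres type mapping needs much more care.

import java.sql.*;
import java.util.ArrayList;
import java.util.List;

public class InitialLoad {
    public static void main(String[] args) throws Exception {
        try (Connection ora = DriverManager.getConnection(
                     "jdbc:oracle:thin:@//orahost:1521/ORCL", "scott", "tiger");
             Connection pg = DriverManager.getConnection(
                     "jdbc:postgresql://pahost:5432/dw", "etl", "secret")) {

            // One list of table names instead of 1,000 hand-built jobs
            List<String> tables = new ArrayList<String>();
            try (Statement s = ora.createStatement();
                 ResultSet rs = s.executeQuery("SELECT table_name FROM user_tables")) {
                while (rs.next()) tables.add(rs.getString(1));
            }

            for (String t : tables) {
                // 1. Extract the DDL from Oracle's dictionary
                String ddl;
                try (Statement s = ora.createStatement();
                     ResultSet rs = s.executeQuery(
                             "SELECT DBMS_METADATA.GET_DDL('TABLE','" + t + "') FROM dual")) {
                    rs.next();
                    ddl = rs.getString(1);
                }

                // 2. "Transform" to Postgres syntax: deliberately naive,
                //    shown only to illustrate the shape of the loop
                String pgDdl = ddl.replace("VARCHAR2", "VARCHAR")
                                  .replace("NUMBER", "NUMERIC")
                                  .replaceAll("(?s)\\)\\s*(SEGMENT|PCTFREE|TABLESPACE).*", ")");

                // 3. Execute the DDL on the target
                try (Statement s = pg.createStatement()) {
                    s.execute(pgDdl);
                }

                // 4. Copy the rows (batched inserts; a bulk path would be faster)
                copyRows(ora, pg, t);
            }
        }
    }

    static void copyRows(Connection ora, Connection pg, String table) throws SQLException {
        try (Statement s = ora.createStatement();
             ResultSet rs = s.executeQuery("SELECT * FROM " + table)) {
            ResultSetMetaData md = rs.getMetaData();
            int n = md.getColumnCount();
            StringBuilder marks = new StringBuilder("?");
            for (int i = 1; i < n; i++) marks.append(",?");
            try (PreparedStatement ins = pg.prepareStatement(
                         "INSERT INTO " + table + " VALUES (" + marks + ")")) {
                while (rs.next()) {
                    for (int i = 1; i <= n; i++) ins.setObject(i, rs.getObject(i));
                    ins.addBatch();
                }
                ins.executeBatch();
            }
        }
    }
}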
It is. You can make a system call to execute your script, or embed the code in Talend itself, using the Code section of your metadata, a tJavaRow, or a tJavaFlex (or tPerlRow/tPerlFlex, if you're using a real man's scripting language).
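For the system-call route, it really is just a few lines; the same body (minus the class scaffolding) can be pasted into a tJava or tJavaFlex. The script path and default table name are of course placeholders:

import java.io.IOException;

public class RunUnload {
    public static void main(String[] args) throws IOException, InterruptedException {
        String table = args.length > 0 ? args[0] : "EMPLOYEES"; // placeholder
        // Shell out to an external unload/load script, one table per call
        Process p = new ProcessBuilder("/opt/etl/unload_table.sh", table)
                .inheritIO()
                .start();
        int rc = p.waitFor();
        if (rc != 0) {
            throw new RuntimeException("unload failed for " + table + " (rc=" + rc + ")");
        }
    }
}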
So why use Talend at all? Where is the advantage of Talend over a home-grown solution? In the end you will end up writing your own code anyway. Why not use a "real man's scripting language" to do the whole load manually? It's a one-page script (at least for the initial load)...
The major advantages of Talend over a homegrown ETL solution are the repository and the fact that the majority of ETL tasks can be completed in Talend without the need for custom code. When you encounter a problem that "pure" Talend cannot solve efficiently, you still have the ability to implement custom solutions.
Talend gives you a tool and a framework to organize, standardize, and monitor your ETL project. For the specific job you've described, a "pure" Talend solution will probably not be efficient, but as we all know, ETL often grows to more than just one job, and this is where Talend shows its value.
Thank you Volker Brehm and JohnGarrettMartin for making clear why an ETL tool such as Talend is useful in everyday data integration processes. I want to make clear that Volker and John are not Talend employees, so there is no "marketing" in their posts 😉 Personally, even though I know Talend Open Studio quite well (let's even say "very well"), I don't replace every one of my manually written scripts with a TOS job. When there is no file processing, database connection, sorting, deduplicating, or the other actions TOS does very well, I still like my Perl scripts 🙂 But obviously, when I need to process data, a TOS job is the best solution for me.
- 1st example: today I updated my script to back up Subversion repositories with an incremental method (see the first revision of this script); this script would be painful to write as a TOS job, because it processes no data.
- 2nd example: the week before, I designed a job to retrieve Apache logs from our web server, fetch an updated list of search-engine robot IP addresses, read the new Apache logs, filter on the viewtopic.php page, extract the IP address, add the country location, and fill our forum_views {year, month, day, IP, country} stats table (I will soon distribute this job). A TOS job was just great for this and saved me a lot of time.
The problem I see with your need is that each of your 1,000 tables has a different schema, that is, a different list of columns. In Talend Open Studio, the schema is set at design time (because of the repository, and in any case it was a design choice on our part to make component configuration much simpler). So what can be done? We need to create a way to generate jobs (not generate scripts; we already do that) from a template. This is on our todo list, and it has been discussed several times in the Talend R&D team. I just want you to know that we are aware of this issue and have solutions in mind.
kirylm,
I think the Studio already provides some components to industrialize your need.
For example, the tOracleTableList component lets you iterate over each table you want to copy. For each one, you have several ways to quickly export the data to a delimited file and then bulk-load it into ParAccel (your described scenario, but more generic); see the sketch below.
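Concretely, the layout would look something like this; the iterator publishes the current table name in globalMap, so the downstream query can be rebuilt on each iteration (the exact variable key follows the usual component-name pattern, so please verify it in the component's Outline view):

// Job layout (Iterate link from the list component):
//   tOracleTableList_1 --iterate--> tOracleInput_1 --row--> tFileOutputDelimited_1
//                                   --OnComponentOk--> tParAccelBulkExec_1
//
// Query of tOracleInput_1, rebuilt on every iteration:
"SELECT * FROM " + ((String) globalMap.get("tOracleTableList_1_CURRENT_TABLE"))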
Another way could be to call an Oracle dump utility for each table (via tSystem or tSSH), then bulk-load the dump into ParAccel (probably more complex).
I like using Talend to design my data integration needs, to understand and explain the data migration process to people, and to easily maintain it all thanks to the graphical approach.
Best regards,