Hi all,
I'm researching techniques for preparing a staging area for a data warehouse. I am interested in using Talend Open Studio to load data from a production environment into a staging area.
In your opinion, what are the best techniques for this? And how can performance be increased?
Thanks
Hi!
Nothing here is unsolvable, but could you provide a little more information?
Otherwise you will only get very theoretical recommendations (as generic as the question).
"Answer":
1. A staging database or warehouse is no different from any other database (production, development, etc.).
2. Use streaming (cursors, in other terms) on the database components that support it.
3. Run heavy transformations in the best place for them: on the source, in Talend, or in the staging (target) database.
4. Depending on your database, use the fastest method available for loading data.
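To make points 2 and 4 a bit more concrete, here is a minimal sketch of the streaming idea outside Talend: read from an open cursor in chunks and write each chunk as one batched insert, so the full result set never sits in memory. It uses Python's built-in sqlite3 module with two in-memory databases standing in for production and staging; the function name `copy_in_batches`, the `orders`/`stg_orders` tables and the batch size are all assumptions made for illustration, not anything from Talend.

```python
import sqlite3


def copy_in_batches(src, dst, select_sql, insert_sql, batch_size=1000):
    """Stream rows from src to dst without loading the whole result set."""
    read_cur = src.cursor()
    read_cur.execute(select_sql)
    copied = 0
    while True:
        rows = read_cur.fetchmany(batch_size)  # pull one chunk from the open cursor
        if not rows:
            break
        dst.executemany(insert_sql, rows)  # one batched insert per chunk
        dst.commit()  # committing per batch keeps transactions small
        copied += len(rows)
    return copied


# Hypothetical demo data: in-memory databases stand in for production and staging.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
source.executemany(
    "INSERT INTO orders VALUES (?, ?)", [(i, i * 1.5) for i in range(2500)]
)
source.commit()

staging = sqlite3.connect(":memory:")
staging.execute("CREATE TABLE stg_orders (id INTEGER, amount REAL)")

total = copy_in_batches(
    source,
    staging,
    "SELECT id, amount FROM orders",
    "INSERT INTO stg_orders VALUES (?, ?)",
)
print(total)  # 2500
```

The right batch size and commit frequency depend on the driver and the database, so treat the numbers here as placeholders to tune.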
I suspect this is not very helpful to you.
Questions:
0. What are your current (or expected) problems?
1. What is the expected size of the data?
2. What is the architecture? (local cloud, in-house, mixed, overseas clouds)
3. Which databases are the source and the target?
4. Which transformations are needed? (lookups, etc.)
....
Thanks for the answer. My question is generic because I want to create a "Talend job prototype" to use in several scenarios. I have different source DBMSs from which I need to load data and create different staging areas.
Some more practical questions are:
Is there a Talend component that can handle different source DBMSs?
Is there a way to map table structures dynamically?
Is there a component or technique to speed up the data copy?
I would like to understand whether this research makes sense or whether it is better to use specific techniques for each DBMS.
Is there a Talend component that can handle different source DBMSs?
- No. The tJDBC* components are more or less universal, but that does not mean they can work dynamically with different databases.
Is there a way to map table structures dynamically?
Yes and no, more "no" than "yes":
- the subscription version supports dynamic columns
- a few components designed by community members exist for copying tables
but they do not allow you to do any mapping (tMap), which kills transformations inside Talend
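As a rough illustration of what such a generic "copy table" approach does, and why it leaves no room for mapping, the sketch below reads the column list from the database catalog at run time and builds the CREATE TABLE and INSERT statements from it. This is plain Python with sqlite3, not a Talend component; `table_columns`, `copy_table`, the `stg_` prefix and the `customers` table are hypothetical names, and other databases expose their catalog differently (for example through JDBC metadata).

```python
import sqlite3


def table_columns(conn, table):
    """Return [(name, declared_type), ...] read from the SQLite catalog."""
    # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
    return [(row[1], row[2]) for row in conn.execute(f'PRAGMA table_info("{table}")')]


def copy_table(src, dst, table, prefix="stg_"):
    """Create prefix+table in dst with the source's columns and copy every row."""
    cols = table_columns(src, table)
    if not cols:
        raise ValueError(f"table {table!r} not found in source")
    names = ", ".join(f'"{name}"' for name, _ in cols)
    ddl_cols = ", ".join(f'"{name}" {ctype}' for name, ctype in cols)
    marks = ", ".join("?" for _ in cols)
    target = f"{prefix}{table}"
    dst.execute(f'CREATE TABLE IF NOT EXISTS "{target}" ({ddl_cols})')
    # The source cursor is iterated lazily, so rows are not loaded all at once.
    rows = src.execute(f'SELECT {names} FROM "{table}"')
    dst.executemany(f'INSERT INTO "{target}" ({names}) VALUES ({marks})', rows)
    dst.commit()
    return target, [name for name, _ in cols]


# Hypothetical demo table; nothing about its structure is hard-coded above.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE customers (id INTEGER, name TEXT, city TEXT)")
source.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, "Ana", "Lisbon"), (2, "Bo", "Oslo")],
)
source.commit()

staging = sqlite3.connect(":memory:")
target, columns = copy_table(source, staging, "customers")
print(target, columns)  # stg_customers ['id', 'name', 'city']
```

Because the columns are only known at run time, there is no point at which a per-column rule could be attached, which is the trade-off described above.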
Is there a component or technique to speed up the data copy?
That strictly depends on the target database.
The most popular bulk techniques for the most popular databases are supported: check all components that contain BulkExec in their name.
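The usual pattern behind bulk loading is to stream the source rows into a flat file and then hand that file to the target database's own loader. Here is a small sketch of that pattern outside Talend, assuming a PostgreSQL target (the thread does not say which database is used), where the loader is the server-side COPY command. The helper names, tables and file path are hypothetical, and the sketch only builds the COPY statement as text rather than running it.

```python
import csv
import os
import sqlite3
import tempfile


def export_to_csv(conn, select_sql, path, batch_size=5000):
    """Stream a query result into a CSV file with a header row; return column names."""
    cur = conn.execute(select_sql)
    columns = [d[0] for d in cur.description]
    with open(path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow(columns)
        while True:
            rows = cur.fetchmany(batch_size)  # write chunk by chunk
            if not rows:
                break
            writer.writerows(rows)
    return columns


def postgres_copy_statement(table, columns, path):
    """Build a server-side COPY statement (assumes a PostgreSQL target)."""
    col_list = ", ".join(columns)
    return f"COPY {table} ({col_list}) FROM '{path}' WITH (FORMAT csv, HEADER true)"


# Hypothetical demo: an in-memory SQLite table stands in for the production source.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
source.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 9.5), (2, 20.0)])

csv_path = os.path.join(tempfile.mkdtemp(), "orders.csv")
cols = export_to_csv(source, "SELECT id, amount FROM orders", csv_path)
print(postgres_copy_statement("stg_orders", cols, csv_path))
```

Server-side COPY reads the file from the database server's filesystem, so the file has to be reachable from there; other databases have their own loaders with different syntax and restrictions.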
I like universal solutions as well, but very often designing such a solution takes much more time than designing 10-20 separate jobs.
"Some people write Java ... some people write code with Java, and they are different people." The same goes for universal components: if you design any universal component, the community will be thankful to you.