Hi,
We are researching whether Talend Open Studio can be used for an upcoming project whose workflow is roughly:
1. Get metadata from the database for each active program.
2. For each active program, check whether the program is scheduled.
3. If it is, fetch the program module and run it.
4. Loop back to the check at step 2.
Can anyone confirm whether this can be achieved with Talend Open Studio for Data Integration, or share any other input that would help us find a feasible solution?
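In plain-code terms, the loop we have in mind looks like the minimal Python sketch below; get_active_programs, is_scheduled and run_module are hypothetical placeholders for the real metadata query, schedule check and module execution, not part of any Talend API:

import time

def get_active_programs():
    # Hypothetical stub standing in for a real metadata query against the DB.
    return [{"name": "vendor_feed_a", "scheduled": True, "module": "load_a"}]

def is_scheduled(program):
    return program["scheduled"]

def run_module(program):
    print("running module %s for %s" % (program["module"], program["name"]))

def integration_loop(poll_interval_seconds=60, max_cycles=1):
    # Steps 1-4 from the question: fetch metadata, check the schedule,
    # run the module, then loop back to the check.
    for _ in range(max_cycles):
        for program in get_active_programs():   # step 1
            if is_scheduled(program):           # step 2
                run_module(program)             # step 3
        time.sleep(poll_interval_seconds)       # step 4: loop back to step 2

integration_loop(poll_interval_seconds=0)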
From an architectural point of view I would never recommend software like Talend, Informatica, SAS, Pentaho, etc. to build something like this. Technically and functionally it is possible, but this so-called "generic" capability is constrained by the software's limitations and by the vendor's own view and implementation of generic processes, and it still has to generate Java code under the hood...
My 2 cents on this topic:
When "generic" enters the battlefield of data-engineering, things go into another abstraction level.
The whole abstraction process you want to create/define should be platform independent, in some sort of template which contain business rules, encoding types, data information, validity checks... or even ai / ml stuff.
Maybe even how you want to process and store, you want columns and or rows as vectors, unique key validation, hashing, encryption, privacy... contracts, etc... What about exceptions, logging monitoring?!? Maybe different SLA agremeents?
Here's a basic example of what I use in Talend and Python for my job configs; every part of it is accessible via Node.js / an API.
I think the 'beauty' of this (yes, there is still a lot of room for improvement) is that you extract your business rules into configuration that is also accessible to other domains in your architecture:
"jobs" : [ {"job" : { "name" : "MyEmailWeb", "DB_Schema" : "something", "hdfs_dir" : "/etl/MyEmailWeb", "process" : true, "description" : "E-mail Service", "start_date": "2015-01-01 00:00:00", "eprivacy" : true,
"create_library" : true, "data_items" : [ { "campaigns" : {"process" : true , "table" : "campaigns"} }, { "groups" : {"process" : true , "table" : "groups"} }, { "mailings" : {"process" : true, "table" : "mailings" , "vectorize": ["clicked", "time"] } }, { "bounces" : {"process" : true , "eprivacy" : true, "retention_days" : 730 , "table" : "bounces", "mask_columns" : "contactID"} }, { "contacts" : {"process" : false , "eprivacy" : true, "table" : "contacts" } } ],
"data_quality" : [],
"analytics" : [] }}, {"job" : { "name" : "KissTheFrog", ............... you get it }}
This would be building an integration platform to load third-party data into our database. Programs are the top-level entities within the integration services. Yes, metadata as in column names and field types. The current system handles vendor-specific ad-hoc integration via Windows Workflow Foundation. We are looking for a more generic solution in Talend, where we check certain conditions according to the business requirements and create a process flow for the integration.
The main challenge is that we have to dynamically generate and populate our staging table, i.e. we have to include certain standard columns and dynamically fetch columns from the vendor files to create the staging table. Is this possible in the free version of Talend Open Studio?
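As a sketch of the dynamic-staging-table idea (with the caveat that, as far as I know, Talend's "Dynamic" schema type is a subscription feature rather than part of Open Studio, so in the free version you would typically generate and execute the SQL yourself), the DDL assembly could look like this; every table and column name below is hypothetical:

# Build a staging-table DDL from fixed standard columns plus columns
# discovered in a vendor file's metadata. Purely illustrative.
STANDARD_COLUMNS = [
    ("load_id", "BIGINT"),
    ("source_file", "VARCHAR(255)"),
    ("loaded_at", "TIMESTAMP"),
]

def staging_ddl(table_name, vendor_columns):
    # vendor_columns: list of (name, sql_type) pairs read from the vendor file
    cols = STANDARD_COLUMNS + vendor_columns
    body = ",\n  ".join("%s %s" % (n, t) for n, t in cols)
    return "CREATE TABLE staging.%s (\n  %s\n);" % (table_name, body)

print(staging_ddl("vendor_a_orders", [("order_id", "INT"), ("amount", "DECIMAL(12,2)")]))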