[resolved] Best practice to build a job chain and to deploy changes
Hi,
I've read through the documentation and loads of forum posts now, but I can't form a real opinion on the following:
What is the best/recommended way to build a job chain?
My idea was to just put all my jobs into one big "masterJob" and run this from the Job Conductor. Is that the preferred method, or should I connect the single jobs in an execution plan? If the latter is the case, how can I save the execution plan outside of the TAC? The TAC is controlled by another department, and I'm not so sure about its "stability" 🙂

Second question, especially regarding the answer to the first one: with the masterJob approach, if I change one of my "subjobs", I assume I would have to re-deploy the "masterJob" as well to get the changes into production? How do you move your developments to production: via a pre-compiled zip file or via different SVN tags/branches? If via zip file: what would you do if there are parallel developments in different subjobs and you want to get only one of them into production? The masterJob export would contain both of them in this case.
Thank you very much for your help and best regards!
Markus
I would generally suggest to couple the jobs as loosely as possible -> an Execution Plan would fit that approach.
If you put everything into one main job, you use only one JVM, which can have an impact on the required memory, and you end up with one big log file. The next thing is that you have to deploy all jobs at once even if you change only a small part.
Hi jlolling,
thank you very much for your reply!
Are there any disadvantages to using the execution plan, in addition to the ones I have already found:
- no possibility to import/export/backup/restore the execution plans
- no possibility to insert jobs/tasks between existing tasks
- what about context variables: are they passed from one task to the next? (I wanted to have one context-loader job in front of all the other jobs)
Best regards
Markus
I know. The other possible solution is using the custom component tRunTask. This component works like tRunJob but instead calls a TAC task, referenced by its label or its ID.
This gives you the advantage that you can update parts of your system independently, and you can re-run a single task if needed.
https://www.talendforge.org/exchange/index.php?eid=1271&product=tos&action=view&nav=1,1,1 There is also documentation there. This component works very well in a couple of productive projects.
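As a rough illustration of what tRunTask does under the hood: a TAC task can be started over HTTP via the TAC MetaServlet, where a JSON command is Base64-encoded and appended to the servlet URL. This is only a sketch; the host, credentials and task id below are placeholders, and the exact MetaServlet parameters depend on your TAC version.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Sketch: build a MetaServlet URL that triggers a TAC task, roughly what
// tRunTask does internally. All concrete values here are hypothetical.
public class TacTaskTrigger {

    // The JSON command is Base64-encoded and appended as the query string.
    public static String buildRunTaskUrl(String tacBaseUrl, String user,
                                         String password, int taskId) {
        String json = String.format(
            "{\"actionName\":\"runTask\",\"authUser\":\"%s\","
            + "\"authPass\":\"%s\",\"taskId\":%d,\"mode\":\"synchronous\"}",
            user, password, taskId);
        String encoded = Base64.getEncoder()
            .encodeToString(json.getBytes(StandardCharsets.UTF_8));
        return tacBaseUrl + "/metaServlet?" + encoded;
    }

    public static void main(String[] args) {
        // Placeholder TAC installation -- adjust host, user and task id.
        System.out.println(buildRunTaskUrl(
            "http://tac.example.com:8080/org.talend.administrator",
            "admin@example.com", "secret", 42));
    }
}
```

The actual HTTP call (and reading the task's return code from the response) would then be a plain GET on the resulting URL.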
Hello again,
tRunTask looks good; only one thing I wonder about: is it possible to "hand over" the context from one job to the next with this component? My plan was to read the context from a file first and then "push" the context through the whole chain...
Nevertheless, after some experimenting with the Talend components for scheduling, I will probably not use them at all and use a separate (and more sophisticated) scheduling tool instead (UC4/Automic). I still have to test whether that works better 🙂 Thank you for your valuable replies!
Best regards
Markus
Hi Markus,
You mean taking the context from the called task back into the job containing the tRunTask and handing it over to the next job, because of changes to the context in the called task? No, this is not possible, because Talend does not provide the transfer of a context back to the parent task within the TAC. Actually, this will also not work if UC4 starts the jobs, because transferring context variables currently requires that the jobs run in the same JVM instance.
If you want to transfer information across jobs, build a dedicated table or use a file.
Hi,
"Actually, this will also not work if UC4 starts the jobs" --> I know; I plan to simply include the "read context from file" step in every job :-)
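A "read context from file" step essentially does what tContextLoad does: parse key=value pairs and put them into the job's context. A minimal sketch in plain Java, assuming a simple properties-style file (the keys `db_host`/`db_port` are illustrative only):

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

// Sketch of a "read context from file" bootstrap step: every job in the
// chain loads the same key=value file, so no context has to be handed
// over between jobs at all.
public class ContextFromFile {

    public static Map<String, String> loadContext(Reader source) throws IOException {
        Properties props = new Properties();
        props.load(source);  // parses key=value lines, like tContextLoad
        Map<String, String> context = new HashMap<>();
        for (String key : props.stringPropertyNames()) {
            context.put(key, props.getProperty(key));
        }
        return context;
    }

    public static void main(String[] args) throws IOException {
        // In a real job this Reader would be a FileReader on a shared file.
        String fileContent = "db_host=localhost\ndb_port=5432\n";
        Map<String, String> ctx = loadContext(new StringReader(fileContent));
        System.out.println(ctx.get("db_host")); // prints "localhost"
    }
}
```

Since each job reads the file itself, the chain works regardless of whether the TAC, UC4, or a shell script starts the jobs.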
What works really well in UC4 is the creation of workflows (with parallelism), triggers, events, restarts of jobs, etc.
It also implies a pretty straightforward deployment technique: build the job --> copy the zip to the server --> include the job in UC4 --> done :-)
I will also give the tRunTask component a try and see whether I can stick to "Talend only" tools!
Using UC4 or any other job controller like this currently has the advantage that you can use the job return code to steer the job chain. Talend does not provide this feature. In the latest release, 5.6.1, you can get this return code via the web service -> tRunTask, but not in the execution plans. I have a lot of projects that work exactly the way you described first, and it works very well.
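Return-code based chaining, as an external scheduler would do it, can be sketched in plain Java: run each exported job launcher and stop the chain on the first non-zero exit code. The `sh -c` commands below are stand-ins for the `*_run.sh` launcher scripts a Talend build zip contains.

```java
import java.io.IOException;

// Sketch: steer a job chain by return code, the way UC4 or a shell
// wrapper would. The commands are placeholders for real job launchers.
public class ReturnCodeChain {

    public static int runStep(String... command)
            throws IOException, InterruptedException {
        Process p = new ProcessBuilder(command)
            .inheritIO()          // forward the job's output to our console
            .start();
        return p.waitFor();       // the job's return code
    }

    public static void main(String[] args) throws Exception {
        String[][] chain = {
            {"sh", "-c", "echo step1"},   // stand-in for job1_run.sh
            {"sh", "-c", "echo step2"},   // stand-in for job2_run.sh
        };
        for (String[] step : chain) {
            int rc = runStep(step);
            if (rc != 0) {
                System.err.println("Chain aborted, return code " + rc);
                System.exit(rc);  // propagate so the scheduler sees it too
            }
        }
    }
}
```

Inside a Talend job, tDie with a specific exit code (or an uncaught exception) is what produces the non-zero return code the wrapper reacts to.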
Additionally, we add components to the jobs to register the job in a monitoring table and get for every job run:
* counters
* all timestamps
* context variables (at start and at the end)
* logs
* return code
* host
* host PID
* user running the job
* last processed timestamps or values (min/max)

and use this information to steer incremental loads.
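The monitoring record described above could look roughly like this. The table name `job_monitor` and its columns are invented for illustration; in a real job the statement would go through a JDBC output component, preferably as a `PreparedStatement` rather than a concatenated string.

```java
import java.time.Instant;

// Sketch of a per-run monitoring insert. Table and column names are
// hypothetical; a production job would use a PreparedStatement instead
// of string formatting to avoid SQL injection and quoting issues.
public class JobMonitorRecord {

    public static String buildInsert(String jobName, Instant startedAt,
                                     int returnCode, String host, long pid,
                                     String user) {
        return String.format(
            "INSERT INTO job_monitor "
            + "(job_name, started_at, return_code, host, host_pid, run_user) "
            + "VALUES ('%s', '%s', %d, '%s', %d, '%s')",
            jobName, startedAt, returnCode, host, pid, user);
    }

    public static void main(String[] args) {
        System.out.println(buildInsert("load_customers",
            Instant.parse("2015-03-01T06:00:00Z"), 0, "etl01", 4711, "etl"));
    }
}
```

An incremental load then simply selects the last successful run's max processed timestamp from this table and uses it as the lower bound of the next extraction.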
https://www.talendforge.org/exchange/index.php?eid=1316&product=tos&action=view&nav=1,1,1 It is also possible to use the AMC database, but mostly that does not fit our needs.
I've now played around with tRunTask and it seems to do pretty much what I need 🙂 - thanks for the suggestion and the implementation of that valuable component!
One thing I couldn't figure out: is it possible to have a chain of tRunTasks that "splits" into parallel execution at certain points and "unites" later on? Something like:
One solution I could think of is to "encapsulate" the jobs I want to run in parallel into separate subjobs, but is there an easier way?
--> I was too fast with posting; found the solution: tParallelize 🙂
Your question is not specific to the tRunTask component. The second parallelisation is wrong and does not work! You can trigger from the first one with the Synchronize output. This trigger fires only when all parallel "routes" have finished. Additionally, you can decide what should happen if one route fails. tParallelize_1 -- Parallelize --> tRunTask_1 -- Parallelize --> tRunTask_2 -- Synchronize --> tJava (or anything else that should happen when both tasks have finished)
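The Parallelize/Synchronize semantics can be sketched in plain Java: both "routes" run concurrently, and the follow-up step only fires after all of them have finished. The task names below are placeholders for TAC tasks started via tRunTask.

```java
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch of split-and-join: two routes run in parallel, and the code
// after invokeAll corresponds to the component on the Synchronize link.
public class ParallelizeSketch {

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        List<Callable<String>> routes = List.of(
            () -> "tRunTask_1 finished",   // placeholder for a TAC task
            () -> "tRunTask_2 finished"    // placeholder for a TAC task
        );
        // invokeAll blocks until every route is done -- the "Synchronize" part
        List<Future<String>> results = pool.invokeAll(routes);
        for (Future<String> f : results) {
            System.out.println(f.get()); // f.get() rethrows a route's failure
        }
        System.out.println("both tasks finished, continue the chain");
        pool.shutdown();
    }
}
```

The failure handling mentioned above maps to catching the exception from `Future.get()` and deciding whether the chain continues or aborts.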