How to design a component that adds data based on the main flow
Hello,
I've just written my first component, and now that it works, I want to learn how to do it right.
My component connects via RMI to a proprietary server interface. It is not meant for generating the main data flow but for getting some additional data on the side. The additional data is subsequently used to decide how to process the main data. My solution for now is to write the additional data to the globalMap under defined keys and tell my job developers to extract it from there. This requires the job developers to have some basic understanding of Java coding. I would prefer to do it in a more intuitive, graphical way, but couldn't find a solution that worked.
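For illustration, the current workaround looks roughly like this in plain Java. This is only a sketch: the key names and the two helper methods are made up for the example, and a plain HashMap stands in for the globalMap that Talend provides to the generated job code as a Map<String, Object>.

```java
import java.util.HashMap;
import java.util.Map;

class GlobalMapSketch {
    // Hypothetical key names; the real component defines and documents its own.
    static final String KEY_STATUS = "tMyComponent_1_STATUS";
    static final String KEY_VALUE = "tMyComponent_1_VALUE";

    // Component side: after the RMI call, publish the results under the agreed keys.
    static void publish(Map<String, Object> globalMap, String status, Double value) {
        globalMap.put(KEY_STATUS, status);
        globalMap.put(KEY_VALUE, value);
    }

    // Job developer side: the cast-and-get expression that has to be typed by hand,
    // for example into a tJavaRow or a tMap expression.
    static boolean isActive(Map<String, Object> globalMap) {
        return "ACTIVE".equals((String) globalMap.get(KEY_STATUS));
    }

    public static void main(String[] args) {
        Map<String, Object> globalMap = new HashMap<>();
        publish(globalMap, "ACTIVE", 12.5);
        System.out.println(isActive(globalMap));
        System.out.println((Double) globalMap.get(KEY_VALUE));
    }
}
```

The second method is the part I would like to spare the job developers: they have to know the key, the type to cast to, and where to put the expression.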
I have looked into what tMap lookups do, but I need to access the main flow data for my RMI calls. As far as I could see none of the tMap lookup options provides me access to the main flow. Is there a way to access the main data flow from a component that is connected to a tMap as a lookup? Or any other component supporting lookups?
The other approach I looked into was to add columns to the main flow. Is this possible? I think it should be somehow, but couldn't find an example, and my own guesswork implementation didn't come to anything. Besides, it feels wrong to add columns in a component, because the number of columns is usually assigned by the job developer. Is there an example of a component that adds columns on its own?
Thanks to anybody who helps me along, or just tells me to forget about it,
Greetings,
Florian.
Hi Florian,
Could you explain in more detail what you want to look up in the RMI call's return value, and/or what you need from the main flow for this lookup? It seems to me that this should be possible; I just don't really get what it is you want.
Regards,
Arno
Hi Arno,
thanks for trying to understand. I have different use cases. All should be handled by the same Talend component.
A typical use case looks like this:
The Talend job reads an input file line by line from the file system, breaks each line up into different columns, and writes the result to an Oracle database.
The main flow consists of daily updates about financial instruments. My new component should be able to call another application via RMI and look up the value of a field "CodeConversion" for this instrument.
If the field "CodeConversion" contains something like "YesPleaseDoConvert", my new component will have to make another RMI call. This time it should look up the internal code "75a" that corresponds to the external code "42042" coming from the input file.
A third case would be to look up if there is a transaction pending for this instrument, because the Talend job has to produce an additional output file (=start a subjob) in this case.
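To make the three cases concrete, here is a rough sketch of the per-record logic in plain Java. The InstrumentService interface, its method names, the instrument IDs and the in-memory stub are all invented for this example; the real calls go over RMI to the other application, and the real check on "CodeConversion" may be looser than an exact match.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the remote application; the real calls would go over RMI.
interface InstrumentService {
    String fieldValue(String instrumentId, String fieldName);
    String internalCode(String externalCode);
    boolean hasPendingTransaction(String instrumentId);
}

class UseCaseSketch {
    // What the component knows about one main-flow record after its lookups.
    static final class Result {
        final String code;
        final boolean pendingTransaction;

        Result(String code, boolean pendingTransaction) {
            this.code = code;
            this.pendingTransaction = pendingTransaction;
        }
    }

    static Result process(InstrumentService service, String instrumentId, String externalCode) {
        // Case 1: look up the field "CodeConversion" for this instrument.
        String conversion = service.fieldValue(instrumentId, "CodeConversion");
        // Case 2: only if requested, make a second call to convert the external code.
        String code = "YesPleaseDoConvert".equals(conversion)
                ? service.internalCode(externalCode)
                : externalCode;
        // Case 3: the job uses this flag to decide whether to start the extra subjob.
        boolean pending = service.hasPendingTransaction(instrumentId);
        return new Result(code, pending);
    }

    // In-memory stub so the sketch runs without a server.
    static InstrumentService stub() {
        final Map<String, String> codes = new HashMap<>();
        codes.put("42042", "75a");
        return new InstrumentService() {
            @Override
            public String fieldValue(String instrumentId, String fieldName) {
                return "INSTR-1".equals(instrumentId) && "CodeConversion".equals(fieldName)
                        ? "YesPleaseDoConvert" : "No";
            }

            @Override
            public String internalCode(String externalCode) {
                return codes.get(externalCode);
            }

            @Override
            public boolean hasPendingTransaction(String instrumentId) {
                return "INSTR-1".equals(instrumentId);
            }
        };
    }

    public static void main(String[] args) {
        Result r = process(stub(), "INSTR-1", "42042");
        System.out.println(r.code + " pending=" + r.pendingTransaction);
    }
}
```

The point is that every call needs something from the current record (the instrument ID, the external code) as a parameter.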
Maybe the unusual thing here is that I want to do everything with the same component. The reason behind this is that the Talend job will eventually run on a customer system. I have only limited access to this system, and I want the flexibility to add new functions without having to install new components.
Does this help to understand the intended scenario?
As far as I see, with the tMap lookup I could only execute the same RMI call each time, not provide the instrument's ID or anything else from the main flow as a call parameter.
In the meantime, I have found "tAddLocationFromIP". This component indeed adds a column to the main flow. I have copied this approach, and multiplied it to 15 additional columns. Also, I have been pointed to tOracleConnection which does something similar in error cases.
So I have a working solution now, and a limited set of precedents. Still it feels iffy. I would very much appreciate the opinion of a more experienced component designer.
Greetings,
Florian
Hi Florian,
I think I'm getting the point a little bit more. Still I'm not sure if I understand why you want all this functionality inside one component.
Updating functionality in one component takes a fresh install of the job, just as much as updating functionality in the job itself.
Besides that, I think you should start by dividing the problem into several smaller pieces and only after completing the job try to wrap things together and "package" things into single joblets or components. Using joblets and/or subjobs makes it possible to easily re-use certain functionality.
Especially because all the functionality you describe sounds pretty easy to do with standard Talend components (and some custom ones for your external systems).
Best regards,
Arno
Hi Arno,
now I see what I failed to mention.
> all functionality you describe sound like pretty easy to do when using standard Talend components
No, it's not easy.
1. All those fields, codes, and transactions to look up exist only in the working memory of the other application, and RMI calls are the only available interface.
2. I re-use the API design and documentation of an already existing, proprietary scripting language.
3. I am not the one who writes the Talend jobs. The Talend jobs will be written by experts who know the format and the business context of the incoming or outgoing data. Some of them know the proprietary scripting language.
4. The component will be used in many different ways. I do not control how the jobs will be organized, which Talend environment is used etc.
> Updating functionality in one component takes a fresh install of the job, just as much as updating functionality in a job it self.
On the server side, the application I connect to has its own development and update cycles. On the client side, the Talend job developer has his own development and update cycle. My component which provides the RMI connection is designed to de-couple those competing cycles. I do not want to update the component every time a Talend job developer needs new functionality.
I'm not sure where this explanation leads, but it should help to understand why I chose a one-size-fits-all approach. That said, things would become simpler, in a way, if I split off some specialized components, but the basic question of how to integrate my component(s) into the main flow stays the same.
Greetings,
Florian
Hello Florian,
the task is still a bit "fuzzy" for me, so I am not 100% sure I understood it completely. However, I understood you may have two main steps:
1) The component getting the data
2) An optional decode, which could be done in tMap, but unfortunately the logic behind it may be too complex and you need to rely on external proprietary APIs
For point 2, one option you may have is to use routines. I still fail to completely get why you cannot decode in the component itself; I guess there must be a good reason. Still, a decode functionality is a typical task for a static method, which can still use an external API if needed.
Sigh. I can't get it explained properly.
I'll give it another try: The job will import some data, and use my component to get some supplementary information. The input for my tSuppleInfo comes from the main flow record. Where should the output go?
--
Why I prefer a component over routines.
Thousands of different jobs will use my component to get this supplementary information. Well, not thousands, but perhaps dozens. The jobs will not be written by me, but by people with different backgrounds. Some are programmers, others are bankers, most are somewhere in between. I have to keep it as simple as possible. They should drag the component into the job, fill out some parameters, and connect the arrows. You can't drag and connect routines with the mouse. Of course I may try to tell the job developers to jump through hoops, but then they may prefer not to, walk around the hoops, and not use the component.
--
Why I need to make RMI calls.
No, neither the optional decode nor any of the other functions can be done in tMap, or in my component itself. It can only be done in the external application because only the external application has the data. Data based on a customer-driven, historically aware, context-sensitive rule system evaluated on the spot.
--
My question is not whether I should use a component, or whether I need to make RMI calls, but how I should design the component. What's the best way to integrate it into the job? Which connectors to use? Are there good examples to follow? The input parameters are easy - the component parameters/settings are intuitive and ready to use. But the output of a component like mine seems a bit tricky. (I think that's because Talend doesn't want to fool around with side flows that magically re-integrate into the main flow.) I'll go with the additional columns, for version 1.0, but still have some doubts. Maybe somewhere out there somebody can understand me, has had a similar problem, found an astonishingly elegant, mind-blowingly simple solution, and would like to share it. It might be possible with a tMap and a neat trick.
Yeah, really, either you are making the story way more complex than it is, or you are not giving us the full picture.
Anyhow, I understand your basic need is to decode something; it does not matter whether this decoding is done internally in Talend or whether you fetch the result externally:
you basically have 1 field in input (plus possibly other fields that are not used in the process), and 1+1 fields in output (the added field is the decoded one).
For some reason you love components, good, that's fine for me too.
In that case you need :
a schema input (normally inherited from the preceding component, via the data connection that links it to your component) and a schema output which in most cases will be equal to the incoming schema PLUS the decoded field.
In your parameters you will need all the info needed to connect to the RMI service, and you will most likely use them in the _begin section of your component, where you will init the RMI connection.
Two additional parameters will be 1) the input field , 2) the output field
Your component will probably be set to DATA_AUTO_PROPAGATE which will allow you to avoid iterating all the metadata to copy over to the output connection.
In the _main section of your component you will read the input column, call the RMI, get the response and populate the output column.
In the _end section you will clean up whatever is needed with your RMI connection or other stuff that might need de-init attention.
Assuming that for each record in input you ALWAYS have one and only one record in the output, then that's pretty much it.
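As a rough illustration of that begin/main/end split, here is a plain-Java sketch. In a real component these three parts live in the component's javajet templates and generate code into the job, so take this only as the shape of the logic. The RemoteDecoder interface, the stub, and the field names are invented for the example, and a Map stands in for the row structure.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for the RMI client; the real one would wrap the remote stub.
interface RemoteDecoder {
    String decode(String input);
    void close();
}

class DecodeComponentSketch {
    private final String inputField;   // parameter 1: column to read
    private final String outputField;  // parameter 2: column to add
    private RemoteDecoder decoder;

    DecodeComponentSketch(String inputField, String outputField) {
        this.inputField = inputField;
        this.outputField = outputField;
    }

    // Corresponds to _begin: set up the connection once, before the first row.
    void begin(RemoteDecoder decoder) {
        this.decoder = decoder;
    }

    // Corresponds to _main: one row in, exactly one row out.
    Map<String, Object> processRow(Map<String, Object> inputRow) {
        // Copy every incoming column, then add the decoded one.
        Map<String, Object> outputRow = new LinkedHashMap<>(inputRow);
        Object value = inputRow.get(inputField);
        outputRow.put(outputField, decoder.decode(value == null ? null : value.toString()));
        return outputRow;
    }

    // Corresponds to _end: release the connection after the last row.
    void end() {
        if (decoder != null) {
            decoder.close();
            decoder = null;
        }
    }

    // Trivial in-memory decoder so the sketch runs without a server.
    static final class StubDecoder implements RemoteDecoder {
        boolean closed = false;

        @Override
        public String decode(String input) {
            return "42042".equals(input) ? "75a" : input;
        }

        @Override
        public void close() {
            closed = true;
        }
    }

    public static void main(String[] args) {
        DecodeComponentSketch component = new DecodeComponentSketch("externalCode", "internalCode");
        component.begin(new StubDecoder());
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("ident", "INSTR-1");
        row.put("externalCode", "42042");
        System.out.println(component.processRow(row));
        component.end();
    }
}
```

The output row is the input row plus one column, which is the "incoming schema PLUS the decoded field" idea from above.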
Hope it helps,
Francesco
Hello Francesco,
ok, pictures. I can do that. Eh, I can give it a try. Not sure about these Image Upload Slots. Preview doesn't show anything.
The real jobs are way more complex, of course, but to illustrate my points I've simplified a job to something basic like this:
1. Read input file
2. Do something
3. Write output file
The job is part of an order process, and "2. Do something" involves checking the limit for individual orders to avoid costly consequences in case of typos, program errors, and rogue traders.
Now I don't want to have the same limit for all types of orders, but a customized limit our users can set for customized classes of instruments. The customizing is done in the external application, and tSuppleInfo is the interface to connect to the external application. It calls a function called GET_LIMIT over RMI, provides the ident and timestamp to identify the instrument, and gets the limit. Conceptually this is supplementary info I look up, so I would like to do it as a lookup in Talend:
But this doesn't work. My lookup source needs to know the ident and timestamp, but it can't access the main flow. (Without forcing every job developer to provide something via globalMap or similar.)
The next best solution is to add columns to the main flow.
It works. It is clumsy. I can have more than one output value, and afaik I have to provide columns for all potential output values at design time. Every job developer has to wonder why, and live with or get rid of the unwanted, technically named extra columns. The job developer might have more than one tSuppleInfo. What then? But it's the best I've come up with.
Greetings,
Florian