how to design component that adds data based on main flow - Qlik Community

Anonymous · ‎2012-05-04

Hello,
I've just written my first component, and now that it works, I want to learn how to do it right.
My component connects via RMI to a proprietary server interface. It is not meant for generating the main data flow but for getting some additional data on the side. The additional data is subsequently used to decide how to process the main data. My solution for now is to write the additional data to the globalMap under defined keys and tell my job developers to extract it from there. This requires the job developers to have some basic understanding of Java coding. I would prefer to do it in a more intuitive, graphical way, but couldn't find a solution that worked.
I have looked into what tMap lookups do, but I need to access the main flow data for my RMI calls. As far as I could see none of the tMap lookup options provides me access to the main flow. Is there a way to access the main data flow from a component that is connected to a tMap as a lookup? Or any other component supporting lookups?
The other approach I looked into was to add columns to the main flow. Is this possible? I think it should be somehow, but couldn't find an example, and my own guesswork implementation didn't come to anything. Besides, it feels wrong to add columns in a component, because the number of columns is usually assigned by the job developer. Is there an example of a component that adds columns on its own?
Thanks to anybody who helps me along, or just tells me to forget about it,
Greetings,
Florian.

Anonymous · ‎2012-06-01

Ooook, starting to be a bit more clear now

Something is still not totally clear: you are going to process a full batch of records, not a single record each time, correct?
I mean (I suppose) you will have a set of records, each one with ident and timestamp.
In that case they have to stay with your main flow until you calculate your limit (which you could put at the beginning of the process, so you don't need to carry the other columns around after that if you don't need them).
The issue with this approach is that you will have one RMI call per record, and performance might not be acceptable (that depends a lot on the performance of the application you are calling).
If your remote application can accept batches (and performs better with them), then you could split the process into three steps: 1) get the distinct values of ident and tstamp, send them to the remote app, get the result and cache it locally (most likely in the globalMap); 2) process all the records and update the limit(s); 3) output all the records.
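A rough plain-Java sketch of those three steps, outside of any Talend template. The BatchRow type, the "ident|tstamp" key format and the remoteBatchCall function are assumptions made up for illustration; the real RMI interface will look different.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

// Hypothetical row type: only the ident/tstamp/limit columns from the thread.
class BatchRow {
    final String ident;
    final long tstamp;
    Double limit; // filled in during step 2

    BatchRow(String ident, long tstamp) {
        this.ident = ident;
        this.tstamp = tstamp;
    }

    // Assumed key format for the local cache.
    String key() {
        return ident + "|" + tstamp;
    }
}

class BatchLookupSketch {
    // remoteBatchCall stands in for the proprietary RMI interface (assumption):
    // it receives the distinct keys and returns one limit per key.
    static List<BatchRow> process(List<BatchRow> rows,
                                  Function<Set<String>, Map<String, Double>> remoteBatchCall) {
        // Step 1: collect the distinct keys and resolve them in a single remote call.
        Set<String> distinctKeys = new LinkedHashSet<>();
        for (BatchRow row : rows) {
            distinctKeys.add(row.key());
        }
        Map<String, Double> cache = new HashMap<>(remoteBatchCall.apply(distinctKeys));

        // Step 2: update every row from the local cache.
        for (BatchRow row : rows) {
            row.limit = cache.get(row.key());
        }

        // Step 3: hand the rows on to the output.
        return rows;
    }
}
```

The point of the design is that the number of remote calls no longer depends on the number of rows, at the price of holding all rows (or at least all keys) in memory.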
There is an undocumented feature called "virtual components" that can help you with this. The issue is that you will need to store everything in memory, which might not be feasible if you can have lots of records.
Virtual components are basically two components tied together: the first one (input) reads the data, does whatever it needs with it and finally stores it in a global buffer.
The second component (output) reads the buffer, does whatever it needs with it and emits it as a data flow.
Technically (to the user) they appear as a single component (I think tSort is one of them... there are a few standard TOS components that are "virtual").
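To make the buffer handoff concrete, here is a minimal plain-Java sketch. It assumes a plain HashMap is a fair stand-in for the job-wide globalMap; the "_buffer" key suffix and the method names are made up for illustration, and sorting is just an example of work that needs every row before it can emit anything.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class VirtualPairSketch {
    // Stand-in for the job-wide globalMap (assumption).
    static final Map<String, Object> globalMap = new HashMap<>();

    // First half: consumes the whole incoming flow and parks the result in the buffer.
    static void collectStage(String componentName, List<Integer> incoming) {
        List<Integer> buffer = new ArrayList<>(incoming);
        Collections.sort(buffer); // needs all rows, so nothing can be emitted earlier
        globalMap.put(componentName + "_buffer", buffer);
    }

    // Second half: takes the buffer back out and emits it as a new flow.
    @SuppressWarnings("unchecked")
    static List<Integer> emitStage(String componentName) {
        List<Integer> buffer = (List<Integer>) globalMap.remove(componentName + "_buffer");
        return buffer == null ? new ArrayList<>() : buffer;
    }
}
```

Keying the buffer by component name keeps two instances of the same virtual component in one job from overwriting each other.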
To keep it simple and reduce the amount of allocated memory, you could use a standard component that performs an RMI call "on the fly" in the main section. It has an input connection that contains the ident and the tstamp, and two parameters prompt the user to specify which column holds the ident and which one holds the tstamp (you could provide a default, enabled with a checkbox, which automatically identifies them using standard column names).
Then, with your row input connection, you have:
Ident_<%=cid%> = <%=InrowName%>.<%=IdentName%>;
TStamp_<%=cid%> = <%=InrowName%>.<%=TstampName%>;
limits_<%=cid%> = MyRMICallAndWhateverINeedWithItLikeLoopsAndStuff_<%=cid%>.doThings(Ident_<%=cid%>, TStamp_<%=cid%>);
<%=OutrowName%>.<%=limitsName%> = limits_<%=cid%>;
Obviously, here you can also perform other actions, like filtering the row or setting a flag to reject it if conditions are not met:
if (limits_<%=cid%> < <%=InrowName%>.<%=PriceName%> * <%=InrowName%>.<%=QuantityName%>)
    <%=OutrowName%>.<%=tagForRejectColumn%> = true;
// or <%=OutrowName%> = null; to filter out the row. In that case remember to reconstruct the object at the beginning of the main section with:
<%=OutrowName%> = new <%=OutrowName%>Struct();
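For readers who find the template placeholders hard to follow, this is roughly what the main section boils down to for one row, written as plain Java. InRow, OutRow and the limitLookup function are hypothetical stand-ins for the generated row structs and the RMI wrapper; a single numeric limit is assumed.

```java
import java.util.function.BiFunction;

// Hypothetical input row; Talend generates comparable <rowName>Struct classes.
class InRow {
    String ident;
    long tstamp;
    double price;
    int quantity;
}

// Hypothetical output row with the added limit column and a reject flag.
class OutRow {
    String ident;
    double limit;
    boolean rejected;
}

class PerRowSketch {
    // limitLookup stands in for the RMI wrapper's doThings(ident, tstamp) call.
    static OutRow mainSection(InRow in, BiFunction<String, Long, Double> limitLookup) {
        OutRow out = new OutRow(); // rebuilt for every row
        out.ident = in.ident;
        out.limit = limitLookup.apply(in.ident, in.tstamp);
        // Flag the row when price * quantity exceeds the limit.
        out.rejected = out.limit < in.price * in.quantity;
        return out;
    }
}
```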
About having multiple columns for the limits:
You would need a dynamic schema, which is only available in TIS; in TOS you could use a string field and store a list (e.g. comma-separated) with all the values.
Again, you will need a parameter to identify the name of the column used to store the limits, and here too you can provide a default value enabled via a checkbox (hide the column selection with a SHOW_IF in the XML descriptor).
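A small sketch of the string-field workaround, assuming the limits are numeric and that a comma never appears inside a value. The class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

class LimitListSketch {
    // Packs all limits into the single string column.
    static String pack(List<Double> limits) {
        return limits.stream()
                .map(d -> Double.toString(d))
                .collect(Collectors.joining(","));
    }

    // Restores the list downstream, e.g. in a tJavaRow or a routine.
    static List<Double> unpack(String column) {
        List<Double> limits = new ArrayList<>();
        if (column == null || column.isEmpty()) {
            return limits;
        }
        for (String part : column.split(",")) {
            limits.add(Double.parseDouble(part.trim()));
        }
        return limits;
    }
}
```

Double.toString always uses a dot as the decimal separator, so the comma stays safe as a list delimiter regardless of locale.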
In the _begin section of your component you will create an instance of the RMI interface class:
myPackage.mySubPackage.MyRMICallAndWhateverINeedWithItLikeLoopsAndStuff MyRMICallAndWhateverINeedWithItLikeLoopsAndStuff_<%=cid%> = new myPackage.mySubPackage.MyRMICallAndWhateverINeedWithItLikeLoopsAndStuff(<connection params, I guess>);
and in the _end section you will close the connection, dispose of objects, etc.:
MyRMICallAndWhateverINeedWithItLikeLoopsAndStuff_<%=cid%>.closeconnection();
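The begin / main / end split can be pictured as the following plain-Java loop. FakeRmiClient is a hypothetical stand-in for the RMI wrapper, and the try/finally is an addition of this sketch (it is not something the thread says the generated code does); it only illustrates that the connection is opened once, used per row and closed once.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the RMI wrapper class used in the thread.
class FakeRmiClient implements AutoCloseable {
    final List<String> log = new ArrayList<>();
    private boolean open = true;

    double doThings(String ident, long tstamp) {
        if (!open) {
            throw new IllegalStateException("connection already closed");
        }
        log.add("call " + ident);
        return 100.0; // dummy limit
    }

    @Override
    public void close() {
        open = false;
        log.add("close");
    }
}

class LifecycleSketch {
    static List<String> run(List<String> idents) {
        FakeRmiClient client = new FakeRmiClient(); // _begin: once per run
        try {
            for (String ident : idents) {           // _main: once per row
                client.doThings(ident, 0L);
            }
        } finally {
            client.close();                         // _end: once, after the last row
        }
        return client.log;
    }
}
```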
Finally, if you can standardize the input and output schemas, you can set them "green" in your XML descriptor and avoid potential mistakes by the end user.

Anonymous · ‎2012-07-28

Good evening

I'm following this thread, as I'm in a similar pond. The virtual component feature really seems to be the solution to all my problems! But it's still quite difficult to understand how to build a simple pair of virtual components, as Sort and Aggregate are definitely difficult to reverse-engineer.
@sabuto,
perhaps you could help us by providing a bit more information on this nice undocumented feature?

best regards,
gabriele

Anonymous · ‎2012-07-30

Did a few experiments with virtual components, but did not really like the overall idea to be honest.
It works if you don't have a huge data set to transfer, else it might get tricky.
Why?
Basically a virtual component, as far as I was able to understand (by reverse engineering existing components), is created using 3 components:
1) a "real" input component
2) a "real" output component
3) a "virtual" component that has only an icon, properties and an XML descriptor
In the virtual one, in the CODEGENERATION part of the descriptor, you have something like this:

<CODEGENERATION>
  <TEMPLATES INPUT="BufOut" OUTPUT="BufIn">
    <TEMPLATE NAME="BufOut" COMPONENT="tBufferTestOut">
      <LINK_TO NAME="BufIn" CTYPE="ROWS_END" />
    </TEMPLATE>
    <TEMPLATE NAME="BufIn" COMPONENT="tBufferTestIn" />
    <TEMPLATE_PARAM SOURCE="self.SCHEMA" TARGET="BufIn.SCHEMA" />
    <TEMPLATE_PARAM SOURCE="self.SCHEMA" TARGET="BufOut.SCHEMA" />
    <TEMPLATE_PARAM SOURCE="self.UNIQUE_NAME" TARGET="BufOut.DESTINATION" />
    <TEMPLATE_PARAM SOURCE="self.UNIQUE_NAME" TARGET="BufIn.ORIGIN" />
  </TEMPLATES>
</CODEGENERATION>

In my example the components are :
tBufferTestOut
tBufferTestIn
tBufferTest (the virtual one)
This part of the XML is used to assign the real input and output components:

<TEMPLATES INPUT="BufOut" OUTPUT="BufIn">
<TEMPLATE NAME="BufOut" COMPONENT="tBufferTestOut">
<LINK_TO NAME="BufIn" CTYPE="ROWS_END" />
</TEMPLATE>
<TEMPLATE NAME="BufIn" COMPONENT="tBufferTestIn" />

You will notice that BufOut is assigned as input and BufIn as output... it's not a typo, I really wanted it that way.
Why?
Your virtual component needs to get (input) data from something, and this something needs to be able to provide (output) data to it, so it's basically an OUTPUT component.
Quite confusing eh?
I might add a tutorial on this one...
Anyhow, in a typical TOS data flow, the begin section is executed once, then the main section is executed once per record.
If you have component A and component B connected together, the main sections of A and B are executed one after the other for each record.
This requires holding only one record in memory at a time.
This rule does not apply to virtual components: instead, component A reads all the records, keeps them in memory, does whatever it needs with them and finally posts the result to a memory buffer.
When it is done, control goes to component B, which reads the data from the memory buffer, does whatever it needs with the records and finally outputs them to a data flow.
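The difference in execution order can be traced with a few lines of plain Java. This is only a model of the behaviour described above, not Talend-generated code; the class name and the "A:" / "B:" trace labels are made up.

```java
import java.util.ArrayList;
import java.util.List;

class ExecutionOrderSketch {
    // Regular flow: the main sections of A and B run back to back for each record.
    static List<String> streaming(List<String> records) {
        List<String> trace = new ArrayList<>();
        for (String r : records) {
            trace.add("A:" + r);
            trace.add("B:" + r);
        }
        return trace;
    }

    // Virtual component: A consumes everything into a buffer, then B drains it.
    static List<String> buffered(List<String> records) {
        List<String> trace = new ArrayList<>();
        List<String> buffer = new ArrayList<>();
        for (String r : records) {
            trace.add("A:" + r);
            buffer.add(r); // memory use grows with the number of records
        }
        for (String r : buffer) {
            trace.add("B:" + r);
        }
        return trace;
    }
}
```

For two records r1 and r2, the first method visits A:r1, B:r1, A:r2, B:r2, while the second visits A:r1, A:r2, B:r1, B:r2.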
These declarations:

<TEMPLATE_PARAM SOURCE="self.SCHEMA" TARGET="BufIn.SCHEMA" />
<TEMPLATE_PARAM SOURCE="self.SCHEMA" TARGET="BufOut.SCHEMA" />
<TEMPLATE_PARAM SOURCE="self.UNIQUE_NAME" TARGET="BufOut.DESTINATION" />
<TEMPLATE_PARAM SOURCE="self.UNIQUE_NAME" TARGET="BufIn.ORIGIN" />

are there so that each component can refer to the other one.
In practice, let's take the last declaration:
<TEMPLATE_PARAM SOURCE="self.UNIQUE_NAME" TARGET="BufIn.ORIGIN" />
It basically instructs the virtual component to COPY its UNIQUE_NAME value into the BufIn.ORIGIN parameter, which is defined in the XML descriptor of the tBufferTestIn component:
<PARAMETER NAME="ORIGIN" FIELD="TEXT" NUM_ROW="10" REQUIRED="true">
  <DEFAULT>tBufferTest_1</DEFAULT>
</PARAMETER>
This is because when you deal with virtual components, the "real" ones (the input and output components) DO NOT expose parameters.
Users will be able to set only the parameters of the virtual component.
However, there is no real Java code in the virtual component, so those parameters would not be accessible anywhere.
For this reason the SOURCE / TARGET declarations in the virtual one are used to transfer values to the parameters of the sub-components.
I know it's a bit messy... told you I did not like them a lot.
Finally, in the begin template of the tBufferTestIn component I have:

String origin = ElementParameterParser.getValue(node, "__ORIGIN__");
String rowName = "";
// Find the matching BufOut half and take the name of its first incoming connection.
for (INode pNode : node.getProcess().getNodesOfType("tBufferTestOut")) {
    if (!pNode.getUniqueName().equals(origin + "_BufOut")) continue;
    for (IConnection conn : pNode.getIncomingConnections()) {
        rowName = conn.getName();
        break;
    }
}

As you can see, I can get "origin" the usual way, then I can get the matching BufOut part.
Similarly, in the BufOut component (begin section) I CAN do something like this:

String destination = ElementParameterParser.getValue(node, "__DESTINATION__");
// Name of this component's own incoming connection, with a fallback.
String rowName = "";
if ((node.getIncomingConnections() != null) && (node.getIncomingConnections().size() > 0)) {
    rowName = node.getIncomingConnections().get(0).getName();
} else {
    rowName = "defaultRow";
}
// Name of the first outgoing connection of the matching BufIn half.
String outrowName = "";
for (INode pNode : node.getProcess().getNodesOfType("tBufferTestIn")) {
    if (!pNode.getUniqueName().equals(destination + "_BufIn")) continue;
    for (IConnection conn : pNode.getOutgoingConnections()) {
        outrowName = conn.getName();
        break;
    }
}
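The name matching in the two templates above can be tried out without the Talend classes. In this sketch NodeInfo is a hypothetical simplification of a job node (it is not the INode API), reduced to a unique name and a list of incoming connection names.

```java
import java.util.List;

// Hypothetical simplification of a job node: a unique name plus connection names.
class NodeInfo {
    final String uniqueName;
    final List<String> incomingConnections;

    NodeInfo(String uniqueName, List<String> incomingConnections) {
        this.uniqueName = uniqueName;
        this.incomingConnections = incomingConnections;
    }
}

class PartnerLookupSketch {
    // Returns the first incoming connection of the "<origin>_BufOut" node, or "" if none.
    static String findRowName(List<NodeInfo> nodes, String origin) {
        for (NodeInfo node : nodes) {
            if (!node.uniqueName.equals(origin + "_BufOut")) {
                continue; // belongs to another instance of the virtual component
            }
            if (!node.incomingConnections.isEmpty()) {
                return node.incomingConnections.get(0);
            }
        }
        return "";
    }
}
```

Matching on the full unique name is what keeps tBufferTest_1 and tBufferTest_2 apart when both are used in the same job.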

Hope it helps..

Anonymous · ‎2012-10-05

Finally I did it! I was able to build a virtual component with your advice, precious as always.

I still have to resolve some minor issues, but the whole process actually works like a charm.
A last question: how do I propagate an input flow to the output?
Let me explain...
My component takes a schema as input, then calls a remote service that adds some columns to the output schema.
Let's say I have an incoming schema with two columns and three rows:
1 A
2 B
3 C
I propagated the entire rowStruct collection through a globalMap buffer and passed it to the "in" stage of the virtual component.
Then I pass a vector (let's say, the second column) to my remote service, which returns something like this:
A foo
B bar
C gaz
Now I need to do a join to get a final output data flow like:
1 A foo
2 B bar
3 C gaz
Of course, there are tons of pure-Java solutions for this join. However, it definitely looks like a regular Talend hash lookup, and I think that part of the code could already be available. As I say, I would like to do it in a pure Talend way (because of performance, because of code cleanliness, because I'm lazy).
So I was wondering: how can I use the tHash constructs to make an implicit join in the "in" stage of the virtual component between the buffer coming from the "out" stage and the columns added by the web service call?
As always, @sabuto, thanks in advance!
gabriele
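For reference, the join described in this post can be sketched in plain Java with a HashMap as the lookup. This is explicitly not the tHash API asked about, just the pure-Java variant mentioned above; rows are modelled as string arrays and all names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class HashJoinSketch {
    // bufferedRows: {id, key} pairs from the "out" stage.
    // serviceRows: {key, value} pairs returned by the remote service.
    static List<String[]> join(List<String[]> bufferedRows, List<String[]> serviceRows) {
        // Build the lookup once, keyed by the join column.
        Map<String, String> lookup = new HashMap<>();
        for (String[] s : serviceRows) {
            lookup.put(s[0], s[1]);
        }
        // Probe it for every buffered row.
        List<String[]> joined = new ArrayList<>();
        for (String[] row : bufferedRows) {
            // Unmatched keys give null in the added column (left outer join).
            joined.add(new String[] {row[0], row[1], lookup.get(row[1])});
        }
        return joined;
    }
}
```

With the sample data from the post, the rows (1, A), (2, B), (3, C) joined against (A, foo), (B, bar), (C, gaz) give (1, A, foo), (2, B, bar), (3, C, gaz).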

Labels: Java, Other, Talend Data Integration, v5.x