Iterator over all input row combinations (not memory buffered)
I am new to Talend and looking for a way to iterate over all row combinations of the
input data (e.g. rows of a CSV file or records in an XML file) in order to perform a rather
complex matching (using the tMap component) and then remove the duplicates identified
this way. Because of the large amount of data, I would like to iterate over the input multiple times
rather than buffer it in memory (and possibly improve performance with caching).
I have something like the following in mind: two pointers on the input data, one iterating
from the beginning to the end, the second always starting from the row after the first pointer
(nothing really special, actually).
For example, with only four rows the iterator would emit the following combinations
(which could then be used as input for a tMap component):
- row1, row2 (inc. pointer1, reset pointer2)
- row1, row3 (inc. pointer2)
- row1, row4 (inc. pointer2)
- row2, row3 (inc. pointer1, reset pointer2)
- row2, row4 (inc. pointer2)
- row3, row4 (inc. pointer1, reset pointer2)
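The two-pointer scheme in the list above can be sketched in plain Java roughly as follows. This is only an illustration of the iteration order, not an existing Talend component: the class name RowPairs, the method forEachPair and the use of a re-openable source are all assumptions made for the example. The source is re-opened for every outer row, so only two rows are held at a time, at the cost of re-reading the input.

```java
import java.util.Iterator;
import java.util.List;
import java.util.function.BiConsumer;
import java.util.function.Supplier;

// Hypothetical helper, not part of Talend.
class RowPairs {

    // Calls handler once for every pair (row i, row j) with i < j.
    // source.get() must return a fresh iterator over the same rows each time,
    // e.g. by re-opening the input file.
    static <T> void forEachPair(Supplier<Iterator<T>> source, BiConsumer<T, T> handler) {
        Iterator<T> outer = source.get();          // pointer1
        long outerIndex = 0;
        while (outer.hasNext()) {
            T first = outer.next();
            Iterator<T> inner = source.get();      // reset pointer2
            // Skip every row up to and including the position of pointer1.
            for (long k = 0; k <= outerIndex && inner.hasNext(); k++) {
                inner.next();
            }
            while (inner.hasNext()) {              // inc. pointer2
                handler.accept(first, inner.next());
            }
            outerIndex++;                          // inc. pointer1
        }
    }

    public static void main(String[] args) {
        List<String> rows = List.of("row1", "row2", "row3", "row4");
        // Prints the six combinations in the same order as the list above.
        forEachPair(rows::iterator, (a, b) -> System.out.println(a + ", " + b));
    }
}
```

For n rows this visits n*(n-1)/2 pairs, so the number of comparisons grows quadratically with the input size regardless of how the rows are read.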
Questions:
- does such a component already exist or can it be easily constructed, e.g. out of tLoop?
- am I overlooking some basic functionality, and could the job be done much more easily?
Thanks for your patience,
mac
But why can't I then feed the output rows directly into the tMap?
tMap is an intermediary component: you can join, merge, filter, or otherwise process data in it, and finally write the result to an output component such as tMysqlOutput or tFileOutputDelimited.
I always see that a second feed is a "Lookup" and not a "Main".
A tMap has exactly one main flow; all other input flows must be lookup flows.
Best regards
Hi Shong
Exactly... this points to my basic question.
How can I convert a "main flow" to a "lookup flow"?
(The question may sound ridiculous, sorry, but it only reflects my basic knowledge... ;-))
Hello Shong
You say to right-click the main flow and select "set this connection as lookup".
Unfortunately I cannot find that menu item, or am I looking in completely the wrong spot?
For illustration I have attached a screenshot where I right-clicked the main output flow of tMap_2, although it should
be the output of tMap_1, which I would then like to feed into tMap_3.
The screenshot attached to your last post shows an example with a database. Is this maybe limited to that scenario?
Thanks a lot, I really appreciate your patient support
Hi Shong or anyone having patience with me
I am sure there is a simple explanation for why I cannot feed the two feeds above into the tMap.
From my limited point of view it is because one data stream has to be a lookup, but how
do I get this? I even checked out some webinars, and there they do similar things but can
always input the two records into a tMap.
Am I simply missing a Talend concept, e.g. some issue with synchronicity (a guarantee that
both exist in the tMap)? I know, I am starting to come up with some adventurous
ideas, but I am quite helpless at the moment.
Hello Mac
A lookup flow only appears when more than one flow is linked into the tMap: one is the main flow and the others are lookup flows. You should right-click an input flow of the tMap, not one of its output flows.
Best regards