Anonymous
Not applicable

Iterator over all input row combinations (not memory buffered)

I am new to Talend and looking for a way to iterate over all row combinations of
input data (e.g. rows of a CSV file or records in an XML file) in order to perform a rather
complex matching (using the tMap component) and then remove the duplicates identified
that way. Because of the large amount of data, I would like to iterate over the input data multiple times
rather than buffer it in memory (possibly improving performance with caching).
I have something like the following in mind: two pointers on the input data, one iterating
from the beginning to the end, the second always starting from the position of the first pointer
(nothing really special...).
For example, with only four rows the iterator would produce the following combinations
(which could then be used as input for a tMap component):
- row1, row2 (inc. pointer1, reset pointer2)
- row1, row3 (inc. pointer2)
- row1, row4 (inc. pointer2)
- row2, row3 (inc. pointer1, reset pointer2)
- row2, row4 (inc. pointer2)
- row3, row4 (inc. pointer1, reset pointer2)
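The two-pointer scheme listed above can be sketched outside Talend in a few lines of plain Python. This is only an illustration of the iteration order, not a Talend component; the function name `row_pairs` and the row labels are made up for the example.

```python
from typing import Iterator, Sequence, Tuple, TypeVar

T = TypeVar("T")


def row_pairs(rows: Sequence[T]) -> Iterator[Tuple[T, T]]:
    """Yield every unordered pair of rows exactly once."""
    for first in range(len(rows)):                  # pointer 1 walks the whole input
        for second in range(first + 1, len(rows)):  # pointer 2 restarts just after pointer 1
            yield rows[first], rows[second]


pairs = list(row_pairs(["row1", "row2", "row3", "row4"]))
for left, right in pairs:
    print(left, right)
```

For n rows this produces n(n-1)/2 pairs, so the work grows quadratically with the input size. Python's `itertools.combinations(rows, 2)` yields the same pairs in the same order.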
Questions:
- does such a component already exist or can it be easily constructed, e.g. out of tLoop?
- am I overlooking some basic functionality, and could the job be done much more easily?
Thanks for your patience,
mac
15 Replies
Anonymous
Not applicable
Author

Hello Mac
Can you give some sample data to explain your request? What are your input data and what output do you expect?
Best regards
shong
Anonymous
Not applicable
Author

Hi Shong
I actually want to compare/process every row combination. I will make a simple example.
Let's say I have an XML file with construction pieces (here only 4):
<pieces>
  <piece>
    <id>1</id>
    <color>rouge</color>
    <height>200</height>
    <width>349</width>
  </piece>
  <piece>
    <id>2</id>
    <color>azul</color>
    <height>243</height>
    <width>299</width>
  </piece>
  <piece>
    <id>3</id>
    <color>rot</color>
    <height>1205</height>
    <width>340</width>
  </piece>
  <piece>
    <id>4</id>
    <color>bleu</color>
    <height>200</height>
    <width>39</width>
  </piece>
</pieces>
In every iteration the iterator would hand me two records to process. After iterating over all elements, I will have compared each piece with every other piece. As described before, the iterator would give me:
piece ID 1 and 2
piece ID 1 and 3
piece ID 1 and 4
piece ID 2 and 3
piece ID 2 and 4
piece ID 3 and 4
The idea behind all this is that in every iteration I can process two records and filter them out, correct them or do whatever I want.
In this case I could, for instance, translate the colors from different languages to English (e.g. with a lookup table), or decide that height or width count as equal if they differ by no more than a certain deviation. That way I can do data cleansing by defining an equality function for filtering out equal pieces.
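Such an equality function could look roughly like the sketch below, written in plain Python rather than as a Talend expression. The 5% relative tolerance, the function names, and the extra `piece5` record are assumptions made for illustration; only pieces 1 and 3 come from the sample data above.

```python
# Lookup table translating the color names from the sample data to English.
COLOR_TO_ENGLISH = {"rouge": "red", "rot": "red", "azul": "blue", "bleu": "blue"}


def normalize_color(color):
    """Translate a color name to English; unknown names pass through in lower case."""
    key = color.strip().lower()
    return COLOR_TO_ENGLISH.get(key, key)


def within_tolerance(a, b, rel_tol):
    """True if a and b differ by at most rel_tol relative to the larger value."""
    return abs(a - b) <= rel_tol * max(abs(a), abs(b))


def same_piece(p, q, rel_tol=0.05):
    """Equality function: same color after translation, similar height and width."""
    return (
        normalize_color(p["color"]) == normalize_color(q["color"])
        and within_tolerance(p["height"], q["height"], rel_tol)
        and within_tolerance(p["width"], q["width"], rel_tol)
    )


piece1 = {"id": 1, "color": "rouge", "height": 200, "width": 349}
piece3 = {"id": 3, "color": "rot", "height": 1205, "width": 340}
piece5 = {"id": 5, "color": "rot", "height": 205, "width": 340}  # hypothetical near-duplicate of piece 1

print(same_piece(piece1, piece3))  # same color, but the heights are far apart
print(same_piece(piece1, piece5))  # same color, height and width within 5%
```

Applied to every pair produced by the iterator, a function like this decides which of the two records to drop or correct.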
Thanks for your support!
mac
Anonymous
Not applicable
Author

Hello Mac
In Talend, the 'iterate' link will fit your need. Please see my screenshots.
My forum6435.xml:

<?xml version="1.0" encoding="ISO-8859-15"?>
<root>
  <pieces>
    <piece>
      <id>1</id>
      <color>rouge</color>
      <height>200</height>
      <width>349</width>
    </piece>
    <piece>
      <id>2</id>
      <color>azul</color>
      <height>243</height>
      <width>299</width>
    </piece>
    <piece>
      <id>3</id>
      <color>rot</color>
      <height>1205</height>
      <width>340</width>
    </piece>
    <piece>
      <id>4</id>
      <color>bleu</color>
      <height>200</height>
      <width>39</width>
    </piece>
  </pieces>
</root>

I hope this makes the usage of 'iterate' clear.
Let me know if you have any questions!
Best regards
shong
Anonymous
Not applicable
Author

Hi Shong
Great, this looks like what I am looking for, thank you!
One small follow-up question:
My example was unfortunately a bit imprecise with respect to the ID. In the example
it is an ascending value, but in reality it could be anything, so I can't use the ID
as a filter condition. It would have to be some sort of row counter (of the input component, e.g.
tFileInputXML), which I couldn't find as a property there. I also checked whether the schema could be
extended by some sort of "auto increment" value, but I didn't find any information on that
either. Do you suggest working with tFileRowCount, or is there a direct way?
Thanks again.
Anonymous
Not applicable
Author

I don't know if this is the right way to go or if there is a more elegant solution, but I could
get the requested behavior by adding the expression:
tos_count_tFileInputXML_2>tos_count_tFileInputXML_1
in the advanced mode section (Basic Settings) of the tFilterRow (see screen 3 in the above post)
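The effect of that filter can be sketched in plain Python as two nested passes over the same input, each with its own row counter. This is a rough model of the logic only, not of what Talend generates; `read_rows`, `pairs_by_position` and the non-ascending IDs are hypothetical.

```python
PIECE_IDS = ["A7", "Q2", "A1", "Z9"]  # hypothetical IDs that are not ascending


def read_rows():
    """Stand-in for one pass of an input component: yields (row_counter, id)."""
    return enumerate(PIECE_IDS, start=1)


def pairs_by_position():
    """Nested passes over the input, keeping a pair only if counter 2 > counter 1."""
    for outer_count, outer_id in read_rows():
        for inner_count, inner_id in read_rows():  # inner input is re-read for each outer row
            if inner_count > outer_count:          # the filter condition
                yield outer_id, inner_id


result = list(pairs_by_position())
print(result)
```

Because the comparison uses row positions rather than ID values, it works for arbitrary IDs. Note that the inner pass still reads every row for each outer row, and roughly half of the combinations are discarded by the filter.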
Anonymous
Not applicable
Author

Hello
"Do you suggest working with the tFileRowCount or is there a direct way?"

Yes, add a new column, id, that holds a sequence number for each row. Please see screenshots.
Best regards

shong
Anonymous
Not applicable
Author

Hi Shong
Thanks! Perfect help, and your solution helps me get to know Talend better and better.
However, I must bother you again with a related problem. Although I consulted the User and Developer manuals,
I can't see how to bring the two outputs (currently written out with a tLogRow) together so that I can implement (comparison)
logic based on both inputs.
As far as I understand Talend, a tMap would be a good way to do this, but it cannot handle two main inputs. The
other examples I find always use a Lookup input, and mostly in combination with database inputs.
Am I missing some important basic concept in Talend, or am I simply overlooking something? (blush)
Anonymous
Not applicable
Author

Hello
"currently written out with a tLogRow"

tLogRow prints the result on the console; it is just for debugging. In a real job you will write the result to a file or a database, so you can replace tLogRow with tFileOutputxxx/txxxSQLOutput.
"but this cannot handle two input mains. I find
other examples always with a Lookup input"

In fact, we can regard a lookup flow as a main flow. tMap is a very useful and powerful component: on tMap we can merge, filter or otherwise process multiple input flows.
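The general idea of combining a main flow with a lookup flow can be sketched in plain Python as follows. This illustrates the concept only and makes no claim about how tMap works internally; the function name `translate_colors` and the `english` column are assumptions for the example.

```python
def translate_colors(main_rows, lookup_rows):
    """Enrich each main row with the English color found in the lookup rows."""
    # The lookup flow is indexed once by its join key.
    lookup = {entry["color"]: entry["english"] for entry in lookup_rows}
    for row in main_rows:  # the main flow is processed row by row
        # Rows without a match keep all their columns and get None for the new one.
        yield {**row, "english": lookup.get(row["color"])}


main = [
    {"id": 1, "color": "rouge"},
    {"id": 2, "color": "azul"},
    {"id": 9, "color": "verde"},  # hypothetical row with no lookup match
]
lookup_table = [
    {"color": "rouge", "english": "red"},
    {"color": "azul", "english": "blue"},
]

joined = list(translate_colors(main, lookup_table))
for row in joined:
    print(row)
```

In this picture the second input plays the lookup role simply because it is the one being searched by key, while the other input drives the row-by-row processing.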
Best regards

shong
Anonymous
Not applicable
Author

But why can't I then feed the output rows directly into the tMap instead of the tLogRow? (Yes, I understood that these components are only for debugging/logging purposes.)
I can attach one main to the tMap but not a second one. In other examples I always see that a second feed is a "Lookup" and not a "Main".
In the documentation I find a note about "Lookup" but not how to handle or obtain these...