tUniqRow: Sending all duplicate rows to Duplicates destination

Anonymous · ‎2009-04-06

I am attempting to use a tUniqRow transform for its intended purpose -- row deduplication. However, tUniqRow sends the "First" duplicate row to the Uniques output path and each subsequent duplicate row to the Duplicates output path.
The requirements for my current project dictate that ALL rows which match one another (based on defined keys) are passed to the Duplicates output path and only rows which are truly unique be passed to the Uniques path.
To comply with this requirement, I am attempting to outer join (via tMap or tFuzzyMatch) the Uniques output with the Duplicates output from the same tUniqRow transform. However, Talend does not allow me to connect the two data paths from tUniqRow, even when those paths are "interrupted" by another transform. The designer refuses to connect the 2nd (lookup) row path to the tMap or tFuzzyMatch component.
The described section of this job is designed as follows:

         --> tSortRow  -->
tUniqRow                  tMap or tFuzzyMatch
         --> tSortRow  -->

Is this a bug or is there an inherent reason why the Uniques and Duplicates data streams can't be connected?

Anonymous · ‎2009-04-06

Hello
Talend doesn't allow a cyclic flow in a job. See 1468.
Best regards
shong

Anonymous · ‎2009-04-07

Talend doesn't allow a cyclic flow in a job. See 1468.

So, the solution is to manually "cache" (by creating interim file or table outputs) the data between steps?
Also, do you know of a better way to approach this problem?

Anonymous · ‎2009-04-08

Hello

Also, do you know of a better way to approach this problem?

Can you give an example of your input data and expected result?
Best regards

shong

Anonymous · ‎2009-04-10

Can you give an example of your input data and expected result?

Example input csv file:

Name, Address
John Smith, 111 Main Street
Bob Dole, 1234 Pine Street
John Smith, 111 Main Street

The tUniqRow component processes this file as follows:

--Uniques--
John Smith, 111 Main Street
Bob Dole, 1234 Pine Street

--Duplicates--
John Smith, 111 Main Street

But the project requirements dictate that we produce output as follows:

--Uniques--
Bob Dole, 1234 Pine Street

--Duplicates--
John Smith, 111 Main Street
John Smith, 111 Main Street

I suspect this can be done by joining the 2 tUniqRow output streams to each other (several times), but I am still working through the proof-of-concept. I am hoping there is a better way to approach this problem.
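
For illustration only, here is a minimal sketch of the split the requirements ask for, written in plain Python rather than as a Talend job. It assumes the rows fit in memory and that the whole row is the key; the function name `split_all_duplicates` is hypothetical and not part of any Talend component.

```python
from collections import Counter

def split_all_duplicates(rows, key=lambda row: row):
    """Return (uniques, duplicates).

    Every row whose key occurs more than once goes to duplicates,
    including the first occurrence.
    """
    # First pass: count how often each key appears.
    counts = Counter(key(row) for row in rows)
    # Second pass: route each row by the count of its key.
    uniques = [row for row in rows if counts[key(row)] == 1]
    duplicates = [row for row in rows if counts[key(row)] > 1]
    return uniques, duplicates

rows = [
    ("John Smith", "111 Main Street"),
    ("Bob Dole", "1234 Pine Street"),
    ("John Smith", "111 Main Street"),
]
uniques, duplicates = split_all_duplicates(rows)
print("Uniques:", uniques)
print("Duplicates:", duplicates)
```

The two passes are the reason a single streaming component cannot do this on its own: a row can only be called unique once every other row has been seen.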

Anonymous · ‎2009-04-12

Hello
You need to split your job into two subjobs, as Talend doesn't allow a cyclic flow in a job. Please see my scenario:
in.csv:

John Smith, 111 Main Street
Bob Dole, 1234 Pine Street
John Smith, 111 Main Street
shong, 222 main Street

Result:

Starting job forum6139 at 09:45 13/04/2009.
.--------+-----------------.
|        tLogRow_1         |
|=-------+----------------=|
|name    |address          |
|=-------+----------------=|
|Bob Dole| 1234 Pine Street|
|shong   | 222 main Street |
'--------+-----------------'
Job forum6139 ended at 09:45 13/04/2009.

Best regards

shong

Anonymous · ‎2013-09-09

Hi,
It seems a bit of a misnomer for the tUniqRow component to return non-unique rows (I've also just found out that the tMap Unique Match is not a unique match but a last match).
Perhaps an option to not return rows that have duplicates could be added to the advanced options for tUniqRow?
Cheers Andy
Edit -----------------------------
My Solution
1. Store the Uniques and Duplicates outputs in a hash map
2. Inner join these back together on the same key (in this case First/Last Name)
3. Catch the inner join rejects
These rejects are then your truly unique rows
Edit ---------------------------
See https://jira.talendforge.org/browse/TDI-28405 for the tUniqRow bug and
https://jira.talendforge.org/browse/TDI-28406 for the tMap bug
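
The three steps above can be sketched in plain Python as follows. This is only an illustration under stated assumptions: `uniq_row` emulates the tUniqRow behaviour described in this thread (first occurrence to Uniques, later ones to Duplicates), a Python set stands in for the hash map, and the key is assumed to be the name column. Both function names are hypothetical.

```python
def uniq_row(rows, key):
    """Emulate tUniqRow as described in this thread: the first row per key
    goes to uniques, every later row with the same key goes to duplicates."""
    seen = set()
    uniques, duplicates = [], []
    for row in rows:
        k = key(row)
        if k in seen:
            duplicates.append(row)
        else:
            seen.add(k)
            uniques.append(row)
    return uniques, duplicates

def truly_unique(rows, key):
    """Keep the uniques that find no match in duplicates (the inner join rejects)."""
    uniques, duplicates = uniq_row(rows, key)
    # Step 1: store the duplicate keys in a hash-based lookup.
    duplicate_keys = {key(row) for row in duplicates}
    # Steps 2 and 3: join uniques against the lookup and keep the rejects.
    return [row for row in uniques if key(row) not in duplicate_keys]

rows = [
    ("John Smith", "111 Main Street"),
    ("Bob Dole", "1234 Pine Street"),
    ("John Smith", "111 Main Street"),
    ("shong", "222 main Street"),
]
result = truly_unique(rows, key=lambda row: row[0])
print(result)
```

With the four-row input posted earlier in the thread, the rejects are the Bob Dole and shong rows, which matches the tLogRow output shown above.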

Anonymous · ‎2013-09-20

I agree with Andy; it is odd that we cannot really filter or identify the duplicates directly with tUniqRow.
On my side, I have a requirement to identify all duplicate lines. I've done a variation of the previous solution, using a tJoin between the initial file and the duplicate rows that tUniqRow gives as a result.
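
A rough Python sketch of that variation, not the actual job: the duplicate keys are collected the way the tUniqRow Duplicates output would supply them, and the initial rows are then matched against those keys. The function name `all_duplicate_rows` and the choice of the first column as key are assumptions made for the example.

```python
def all_duplicate_rows(rows, key):
    """Return every row whose key appears more than once, first occurrence included."""
    seen = set()
    duplicate_keys = set()
    # Pass 1: a key seen a second time is what tUniqRow would send to Duplicates.
    for row in rows:
        k = key(row)
        if k in seen:
            duplicate_keys.add(k)
        else:
            seen.add(k)
    # Pass 2: join the initial rows against the duplicate keys and keep the matches.
    return [row for row in rows if key(row) in duplicate_keys]

rows = [
    ("John Smith", "111 Main Street"),
    ("Bob Dole", "1234 Pine Street"),
    ("John Smith", "111 Main Street"),
]
dups = all_duplicate_rows(rows, key=lambda row: row[0])
print(dups)
```

Reading the initial file a second time is what avoids the cyclic flow: the join input comes from the source, not from the other tUniqRow output.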

Anonymous · ‎2013-09-23

Hi,
It seems a bit of a misnomer for the tUniqRow component to return non-unique rows (I've also just found out that the tMap Unique Match is not a unique match but a last match).
Perhaps an option to not return rows that have duplicates could be added to the advanced options for tUniqRow?
Cheers Andy

Hi Andy
Yes, a new feature could be added to this component to return only the rows that are really unique in the source data. This is a common request; can you please report it as a feature request in our bugtracker?
Shong

Anonymous · ‎2013-12-11

Hi, I have added bugs to the tracker (links above). I consider these bugs, as this behaviour contradicts the expected results and is not documented.
