LDPtechAFB
Contributor III

Data duplication in a flow with tHashInput and tMap

Hello,

I have parsed my data and stored it in a tHashOutput: 0693p000008vKLlAAM.png

Then I connect the job to the database and, with an OnSubjobOk trigger, start the next subjob, which retrieves the data from my output through a tHashInput.

The problem is that the data is duplicated:

0693p000008vKLqAAM.png

I don't know why the process loops and never stops writing to my database.

I configured the tMap with the option:

- Load once

0693p000008vKM5AAM.png

Maybe the structure of my database is the issue; I have some problems with foreign key assignment. But the problem seems to come from the tHashInput.

Thanks for your answers.

Regards.

Lucas


Accepted Solutions
JohnRMK
Creator II

Hello,

If there is no join between the inputs in your tMap, it performs a Cartesian product (every row of one input is paired with every row of the other), so all the data is duplicated.

You can use tUniqRow on the output to deduplicate.
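
To illustrate why a join without a key multiplies rows, here is a minimal plain-Java sketch. It is not Talend-generated code: the class `CrossJoinDemo`, its methods, and the sample rows are hypothetical, and `dedupe` only imitates the idea behind tUniqRow (keep the first occurrence of each row).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;

public class CrossJoinDemo {
    // Pairs every main row with every lookup row, as a join with no key would.
    static List<String> crossJoin(List<String> main, List<String> lookup) {
        List<String> out = new ArrayList<>();
        for (String m : main) {
            for (String l : lookup) {
                out.add(m + "|" + l);
            }
        }
        return out;
    }

    // Keeps only the main-side part of each joined row, like an output
    // table that is mapped from a single input.
    static List<String> mainSide(List<String> joined) {
        List<String> out = new ArrayList<>();
        for (String row : joined) {
            out.add(row.substring(0, row.indexOf('|')));
        }
        return out;
    }

    // Keeps the first occurrence of each row and preserves order.
    static List<String> dedupe(List<String> rows) {
        return new ArrayList<>(new LinkedHashSet<>(rows));
    }

    public static void main(String[] args) {
        List<String> main = Arrays.asList("a1", "a2");
        List<String> lookup = Arrays.asList("b1", "b2", "b3");
        List<String> joined = crossJoin(main, lookup);
        System.out.println(joined.size());            // 6 (2 x 3 rows)
        System.out.println(mainSide(joined));         // [a1, a1, a1, a2, a2, a2]
        System.out.println(dedupe(mainSide(joined))); // [a1, a2]
    }
}
```

With 2 main rows and 3 lookup rows, each main row reaches the output 3 times, which matches the duplication described in the question.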


9 Replies
manodwhb
Champion II

@LDPtechAFB, when you say the data is duplicated, do you mean that the same data is loaded into all three target tables? If so, the problem may be the tMap configuration. Can you share the criteria you use to load the target tables from the tMap?

LDPtechAFB
Contributor III
Author

Yes, the same data is loaded into the database.

What did you mean by the criteria?

Here is the configuration of my tHashInput, tMap, and tHashOutput.

0693p000008vKMeAAM.png

 

0693p000008vKMjAAM.png

 

0693p000008vKMoAAM.png

LDPtechAFB
Contributor III
Author

Yes, that seems to be it.

But I cannot join the inputs to each other because they have no columns in common.

I tried a tUniqRow, but it doesn't work. 0693p000008vKdvAAE.png

The only place I could link the data is in the outputs of the tMap, but that seems to be impossible (foreign keys in the DB). So I don't know what to do.

If you have any clues, I would appreciate them!

manodwhb
Champion II

@LDPtechAFB, have you pointed all the tHashInputs to tHashOutput_4 only? If so, that is the problem.

manodwhb
Champion II

@LDPtechAFB, how many columns do you have in each tHashInput, and what are their data types?

LDPtechAFB
Contributor III
Author

26, 33, and 70 columns, with data types String, Integer, Date, Boolean, and Float, with automatic type conversion.

By the way, the tUniqRow method actually works (I had just configured the component incorrectly).

Now I have a problem with my input:

java.lang.NumberFormatException: For input string: ""

0693p000008vLlaAAE.png

So the tUniqRow should work now, but first I have to fix this issue.
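
For reference, this exception is what Java raises when an empty string is converted to a number. The sketch below assumes the failing column is an Integer; the thread does not say which column or type is involved, and `SafeParseDemo` and `parseIntOrNull` are hypothetical names. A guard of this kind could be adapted to a tMap expression or a routine.

```java
public class SafeParseDemo {
    // Returns null for null or blank input instead of throwing.
    // Assumption: an empty field should become a null Integer.
    static Integer parseIntOrNull(String raw) {
        if (raw == null || raw.trim().isEmpty()) {
            return null;
        }
        return Integer.valueOf(raw.trim());
    }

    public static void main(String[] args) {
        try {
            Integer.parseInt(""); // an empty string is not a valid number
        } catch (NumberFormatException e) {
            System.out.println("Caught: " + e.getMessage());
        }
        System.out.println(parseIntOrNull(""));     // null
        System.out.println(parseIntOrNull(" 42 ")); // 42
    }
}
```

Whether an empty field should map to null, to 0, or cause the row to be rejected depends on the target schema, so the null choice here is only one option.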

manodwhb
Champion II

@LDPtechAFB, what is the join key? If you do not have a join column, the tMap performs a Cartesian join. Also, how are you populating the tDBOutputs in the output section? If you take the Cartesian join and map it unchanged to every output, you will have the same data across all target tables.

LDPtechAFB
Contributor III
Author

I created a column in each input with a default value and joined these columns together to avoid the Cartesian join. It seems to work well; I will see how it holds up. Thanks for your answers.
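
A rough plain-Java sketch of what a join on a constant column does, under the assumption that the lookup keeps a single row per key (a "unique match" style lookup where the last row loaded wins). The class `ConstantKeyJoinDemo`, its methods, and the sample rows are hypothetical, and the actual tMap behavior depends on the match model selected in the job.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ConstantKeyJoinDemo {
    // Builds a lookup keyed by the constant join value. Because every row
    // shares the same key, each row overwrites the previous one.
    static Map<Integer, String> buildLookup(List<String> rows, int constantKey) {
        Map<Integer, String> lookup = new HashMap<>();
        for (String row : rows) {
            lookup.put(constantKey, row);
        }
        return lookup;
    }

    // Emits exactly one output row per main row.
    static List<String> join(List<String> main, Map<Integer, String> lookup, int constantKey) {
        List<String> out = new ArrayList<>();
        for (String m : main) {
            out.add(m + "|" + lookup.get(constantKey));
        }
        return out;
    }

    public static void main(String[] args) {
        int key = 1; // the default value placed in the added column
        List<String> main = Arrays.asList("a1", "a2");
        Map<Integer, String> lookup = buildLookup(Arrays.asList("b1", "b2", "b3"), key);
        System.out.println(join(main, lookup, key)); // [a1|b3, a2|b3]
    }
}
```

Under this assumption the main rows are no longer multiplied, but every main row is paired with the same single lookup row. If the lookup instead returns all matches for the key, a constant key produces every pairing again, so it is worth checking the row counts in each target table.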