Anonymous
Not applicable

Data duplication handling - scenario

Hi,

 

Can I get a solution for the scenario below?

 

I am getting two records for a single product. The only difference between them is the value of one column, and I need both values loaded into the target in different columns, so removing duplicates won't help me.

An efficient solution would be appreciated, because the source delivers a huge number of records.

 

Talend version: Open Studio for Big Data 7.0.1

Target: Salesforce

 

source:

ProductName  ImageType  imageurl
Nokia 610    LARGE      /devices/generic-phone.png
Nokia 610    SMALL      /devices/5145.jpg

 

Required Data on Target: (based on 'Image Type' above)

ProductName  main_image_URL              thumbnail_image_URL
Nokia 610    /devices/generic-phone.png  /devices/5145.jpg

 

I tried writing an expression in tMap, but the output is not as expected (see below): it creates a duplicate record in the target.

 

ProductName  main_image                  thumbnail_image
Nokia 8110   /devices/generic-phone.png
Nokia 8110                               /devices/5145.jpg

Accepted Solutions
Anonymous
Not applicable
Author

Hi,

 

    Why don't you split the data into two data sets at the beginning and then do an inner join?

 

Dataset one: where ImageType = "LARGE"

 

ProductName  ImageType  imageurl
Nokia 610    LARGE      /devices/generic-phone.png

 

Dataset two: where ImageType = "SMALL"

 

ProductName  ImageType  imageurl
Nokia 610    SMALL      /devices/5145.jpg

 

Now do an inner join on ProductName and map the two imageurl values to the output flow in tMap as two separate columns.

 

Mapping in the tMap

 

ProductName -> ProductName

imageurl (SMALL) -> thumbnail_image_URL

imageurl (LARGE) -> main_image_URL

 

This should give the desired output.
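
For readers who want to see the join logic outside the Studio, here is a minimal sketch in plain Java of what the two filtered data sets and the inner join amount to. It is not Talend-generated code: the class name `ImagePivot`, the method `pivot`, and the use of `String[]` rows in the order {ProductName, ImageType, imageurl} are assumptions made only for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ImagePivot {

    /**
     * Input rows are {ProductName, ImageType, imageurl} (assumed layout).
     * Output rows are {ProductName, main_image_URL, thumbnail_image_URL}.
     */
    public static List<String[]> pivot(List<String[]> rows) {
        // Dataset one: ImageType = "LARGE", keyed by ProductName (lookup side).
        Map<String, String> large = new HashMap<>();
        for (String[] r : rows) {
            if ("LARGE".equals(r[1])) {
                large.put(r[0], r[2]);
            }
        }
        // Dataset two: ImageType = "SMALL" (main side).
        // Inner join: a product is emitted only if it has both image types.
        List<String[]> out = new ArrayList<>();
        for (String[] r : rows) {
            if ("SMALL".equals(r[1]) && large.containsKey(r[0])) {
                out.add(new String[] {r[0], large.get(r[0]), r[2]});
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String[]> rows = new ArrayList<>();
        rows.add(new String[] {"Nokia 610", "LARGE", "/devices/generic-phone.png"});
        rows.add(new String[] {"Nokia 610", "SMALL", "/devices/5145.jpg"});
        for (String[] r : pivot(rows)) {
            System.out.println(String.join(" | ", r));
        }
    }
}
```

Note that only the LARGE side is held in memory here, which mirrors how a lookup flow typically behaves in a join. With an inner join, a product that has only one of the two image types is dropped from the output.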

 

Warm Regards,
Nikhil Thampi

Please appreciate our Talend community members by giving Kudos for sharing their time for your query. If your query is answered, please mark the topic as resolved 🙂


6 Replies
manodwhb
Champion II

@Vibin_CT, check the job below.

[Four screenshots of the job; images not available]

Anonymous
Not applicable
Author

Have you tried using the tUniqueRow component before tMap?

 

I think what you need is something like this:

 

[Screenshot of the job; image not available]

 

The desired output should be something like this

 

Anonymous
Not applicable
Author

Hi @nthampi ,

 

Thank you very much for your solution.

To use your method, I would need to load the data into an intermediate database table, because the data I described does not come directly from the source; it is intermediate data produced after many transformations. I was unable to load it into the MySQL staging database because of MySQL table size limitations (some columns contain appended data and are very large), so I am using the tHashOutput component.

 

I am also considering tFileOutputDelimited instead of tHashOutput because of the huge number of records and the record size. Which component would you suggest, considering memory and performance?

Anonymous
Not applicable
Author

Hi,

 

    Considering your use case, park the data in an interim file: write it with tFileOutputDelimited and read it back with tFileInputDelimited.

 

     Also increase the memory parameters (Xms and Xmx) of the job for better job performance.
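
As a rough way to confirm that a changed heap setting actually reached the job, the following sketch prints the maximum heap the JVM reports. The class name `HeapCheck` is hypothetical, and the values `-Xms1024M -Xmx4096M` in the usage line are placeholders rather than recommendations; suitable values depend on the machine and the data volume.

```java
public class HeapCheck {

    /** Maximum heap the JVM will attempt to use, in megabytes. */
    public static long maxHeapMb() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        // The reported value should roughly track -Xmx; the exact figure varies by JVM.
        System.out.println("Max heap (MB): " + maxHeapMb());
    }
}
```

Run standalone, this would be started as `java -Xms1024M -Xmx4096M HeapCheck`. In a Talend job the same call to `Runtime.getRuntime().maxMemory()` could presumably be placed in a tJava component, with the JVM arguments set in the job's run settings.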

 

Warm Regards,
Nikhil Thampi


Anonymous
Not applicable
Author

Thanks @nthampi !!