We have an input file in the following format:
empno name add
1 abc address1
1 abc address2
1 abc address3
1 abc address4
.
.
.
2 mno address7
2 mno address8
2 mno address9
.
.
.
The output should be as below: since the values in the first two columns repeat, all addresses for a key should be collapsed onto a single line.
1 abc address1 address2 address3 address4....
2 mno address7 address8 address9....
.
.
.
Please suggest how to create a file with the above output.
Regards,
Vivek
Hi Vivek,
I believe the following solution will help you.
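In plain Java, the grouping that tDenormalize performs here looks roughly like the sketch below. This is only an illustration: the file name, whitespace delimiter, and column layout are assumptions taken from your sample, not from a real job.

import java.io.BufferedReader;
import java.io.FileReader;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class DenormalizeSketch {
    public static void main(String[] args) throws Exception {
        // Group addresses by the repeating key (empno + name), preserving input order.
        Map<String, StringJoiner> groups = new LinkedHashMap<>();
        try (BufferedReader in = new BufferedReader(new FileReader("input.txt"))) {
            String line = in.readLine(); // skip the header: empno name add
            while ((line = in.readLine()) != null) {
                String[] cols = line.trim().split("\\s+", 3); // empno, name, add
                String key = cols[0] + " " + cols[1];
                groups.computeIfAbsent(key, k -> new StringJoiner(" ")).add(cols[2]);
            }
        }
        // One output line per key, e.g. "1 abc address1 address2 address3 address4"
        groups.forEach((key, addrs) -> System.out.println(key + " " + addrs));
    }
}

In the job itself, you would list the add column under 'To denormalize' with a space delimiter; the remaining columns (empno, name) then act as the group key.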
Warm Regards,
Nikhil Thampi
Please appreciate our Talend community members by giving Kudos for sharing their time for your query. If your query is answered, please mark the topic as resolved 🙂
Thanks. But we have multiple columns under the tDenormalize component, and the output is a little different.
i.e.
Suppose we also have a Pincode column along with Address, so we listed both Address and Pincode under the 'To denormalize' section. But the output shows all addresses first and then all pincodes,
i.e. address1;address2;address3;address4,54678,54890,58765,52345
whereas our requirement is to output one row's complete data, then the next row's, and so on:
address1;54678;address2;54890;address3;58765;address4;52345
Please advise how to achieve this in Talend.
Regards,
Vivek
Hello Vivek,
Before passing the data to tDenormalize, concatenate the values of Address & PinCode in a tMap.
The flow will be tMap ( Address + ";" + PinCode ) -> tDenormalize.
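As a tiny self-contained illustration of why this gives the interleaved order you want (the row values are taken from your example; the tMap expression would be something like row1.Address + ";" + row1.PinCode, where row1 is whatever your input flow is named):

import java.util.List;
import java.util.StringJoiner;

public class ConcatThenDenormalize {
    public static void main(String[] args) {
        // Step 1 (tMap): combine each row's Address and PinCode into one value.
        // Step 2 (tDenormalize): join the combined values with the same delimiter.
        List<String[]> rows = List.of(
                new String[] {"address1", "54678"},
                new String[] {"address2", "54890"},
                new String[] {"address3", "58765"},
                new String[] {"address4", "52345"});
        StringJoiner joined = new StringJoiner(";");
        for (String[] r : rows) {
            joined.add(r[0] + ";" + r[1]); // the tMap expression applied per row
        }
        // Prints: address1;54678;address2;54890;address3;58765;address4;52345
        System.out.println(joined);
    }
}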
Thanks and Regards,
Subhadip
Yes. Add Pincode as the last part using a tMap, then pass the combined value to tDenormalize.
I hope we have answered your query. Could you please spare a second to mark the post as resolved? Members often skip this step once they have the solution, overlooking the contribution authors make to the Talend community amid their own busy schedules 😞
Warm Regards,
Nikhil Thampi
Please appreciate our Talend community members by giving Kudos for sharing their time for your query. If your query is answered, please mark the topic as resolved 🙂
Thanks, all.
That worked for us. However, we have more than 400 million records in total. My job flow is as below:
tOracleInput -> tFileOutput -> tMap -> tDenormalize -> tFileOutput
This job is not able to process 400 million records and throws the error below.
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:3332)
at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:124)
at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:448)
at java.lang.StringBuilder.append(StringBuilder.java:136)
at artemis_dev.test3_0_1.Test3.tOracleInput_1Process(Test3.java:4781)
at artemis_dev.test3_0_1.Test3.runJobInTOS(Test3.java:5971)
The above error came after processing around 200 million records, and the process is also very slow.
Please suggest how to fix the above error and how we can improve the execution time. We want to tune this job as much as possible and reduce the total run time.
Regards,
Vivek
Hi,
Please increase the Java memory settings in the Run tab according to your input data volume.
I would also recommend using disk space for tMap operations. Please refer to the link below for this step.
https://help.talend.com/reader/EJfmjmfWqXUp5sadUwoGBA/J4xg5kxhK1afr7i7rFA65w
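For example, in the Run tab, under Advanced settings > JVM arguments, you can raise the heap with the standard JVM flags. The values below are just a starting point; size them to your machine's available RAM:

-Xms1024M
-Xmx8192M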
Once it's resolved, I humbly request you to mark the solution as closed, since we have answered your initial and supplementary questions. If you have a new issue, please ask it in a new post instead of putting all the queries into a single post, thereby diluting the focus.
Some community members overlook this once they get the solution, ignoring the time contributors spent answering their query 😞
Warm Regards,
Nikhil Thampi
Please appreciate our Talend community members by giving Kudos for sharing their time for your query. If your query is answered, please mark the topic as resolved 🙂
Hello,
This is an out-of-memory error for tUniqRow.
First of all, for processing 400M records you need to increase the heap space.
Secondly, in tUniqRow's Advanced settings, check the option "Use of disk" and set the buffer size in memory to Medium (1 million).
This ensures only 1 million rows are held in memory at a time; the rest are buffered on disk.
Try running the job with a max heap of -Xmx5G.
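If your Oracle query returns rows already ordered by the key (add an ORDER BY empno if it does not; this ordering is an assumption), another option is to stream the data so that memory is bounded by the largest single group rather than the whole data set, which avoids the kind of StringBuilder growth shown in your stack trace. A rough plain-Java sketch of that pattern, with file names and layout assumed from the earlier sample rather than your actual job:

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.PrintWriter;

public class StreamingDenormalize {
    public static void main(String[] args) throws Exception {
        // Input must be sorted by (empno, name). Only the current group's
        // addresses live in memory; each finished group is flushed to disk.
        try (BufferedReader in = new BufferedReader(new FileReader("input.txt"));
             PrintWriter out = new PrintWriter("output.txt")) {
            String line = in.readLine(); // skip the header: empno name add
            String currentKey = null;
            StringBuilder addrs = new StringBuilder();
            while ((line = in.readLine()) != null) {
                String[] cols = line.trim().split("\\s+", 3);
                String key = cols[0] + " " + cols[1];
                if (!key.equals(currentKey)) {
                    if (currentKey != null) {
                        out.println(currentKey + addrs.toString()); // flush finished group
                    }
                    currentKey = key;
                    addrs.setLength(0);
                }
                addrs.append(' ').append(cols[2]);
            }
            if (currentKey != null) {
                out.println(currentKey + addrs.toString()); // flush the last group
            }
        }
    }
}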
Thanks and Regards,
Subhadip