Hi,
I am using the tDataShuffling component to shuffle an 8-character column, partitioned on the first 3 characters of that column, e.g.:
SELECT field_1, substr(field_1, 1, 3) FROM table_name;
shuffle column value: py13456
partition column value: py1
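To make the requirement concrete, here is a minimal plain-Java sketch of a seeded shuffle within partitions. It is not tDataShuffling's internal algorithm. It assumes that "partitioned on the first 3 characters" means values are only exchanged between rows sharing that prefix, and the names PartitionedShuffle and shuffleWithinPartitions are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Hypothetical illustration, not the tDataShuffling implementation.
public class PartitionedShuffle {

    // Shuffles values only among rows that share the same partition key
    // (the first `prefixLen` characters), using a fixed seed so runs repeat.
    static List<String> shuffleWithinPartitions(List<String> values, int prefixLen, long seed) {
        Random rnd = new Random(seed);
        // Group row positions by partition key, keeping first-seen order
        // so the sequence of Random draws is stable for a given input order.
        Map<String, List<Integer>> groups = new LinkedHashMap<>();
        for (int i = 0; i < values.size(); i++) {
            String v = values.get(i);
            String key = v.substring(0, Math.min(prefixLen, v.length()));
            groups.computeIfAbsent(key, k -> new ArrayList<>()).add(i);
        }
        List<String> out = new ArrayList<>(values);
        for (List<Integer> positions : groups.values()) {
            List<String> bucket = new ArrayList<>();
            for (int i : positions) bucket.add(values.get(i));
            Collections.shuffle(bucket, rnd);
            // Write the permuted values back into the same row positions.
            for (int j = 0; j < positions.size(); j++) out.set(positions.get(j), bucket.get(j));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> column = List.of("py13456", "py19999", "ab10000", "py12222", "ab17777");
        // Seed taken from the post; partition key = first 3 characters.
        System.out.println(shuffleWithinPartitions(column, 3, 12345678L));
    }
}
```

Using one seeded java.util.Random and a LinkedHashMap makes the result repeatable for the same input order and seed. In this sketch each partition is held in memory while it is shuffled, so a few very large partitions dominate memory use.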
This is running very slowly, at about 3 rows/s.
The table has around 6 million records. The tDataShuffling buffer size is 100000 and the seed generator is 12345678.
At the Job level, I have enabled Multi Thread execution with a Parallelize Buffer Unit Size of 25000.
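A quick back-of-the-envelope check shows what 3 rows/s means for the whole table. This sketch assumes the rate stays constant and that all ~6 million rows pass through the shuffle; the class name RuntimeEstimate is hypothetical.

```java
import java.util.Locale;

// Hypothetical helper for a rough runtime estimate at a constant rate.
public class RuntimeEstimate {

    // Wall-clock time, in seconds, to process `rows` at a steady `rowsPerSecond`.
    static double secondsFor(long rows, double rowsPerSecond) {
        return rows / rowsPerSecond;
    }

    public static void main(String[] args) {
        double seconds = secondsFor(6_000_000L, 3.0); // ~6 million rows at ~3 rows/s
        System.out.printf(Locale.ROOT, "%.0f s = %.1f hours = %.1f days%n",
                seconds, seconds / 3600, seconds / 86400);
    }
}
```

Running it prints `2000000 s = 555.6 hours = 23.1 days`.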
Could you suggest ways to improve the performance of this component?
Thanks.
Hi,
I have tried the following:
Cursor size on tDBInput: 100000
Query limited with rownum < 100000
Max heap size at Job level (Job run JVM settings): 2048M
Is there anything I can tune at the tDataShuffling component level?
Could you please reply?
Hi,
The Job flow has:
tDBInput (with cursor) --> tDataShuffling --> tDBOutput (update operation)
Db input component Cursor: 100000
Db input query Rownum: 100000
Shuffling Buffer size: 100000
Job Multi thread Parallelize Buffer Unit Size: 25000
Job Min heap space: -Xms1024M
Job Max heap space: -Xmx4096M
Db output Batch size: 50000
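The Db output batch size controls how many update statements are grouped before they are sent to the database. A minimal plain-Java sketch of that grouping follows, assuming the batch size behaves like standard JDBC batching (Statement.addBatch per row, executeBatch once per full batch and once for the remainder). BatchedWriter is a hypothetical helper, and the flush callback is a stand-in for the real database call.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper: groups rows and hands each full batch to `flush`,
// similar to calling executeBatch() every `batchSize` rows in JDBC.
public class BatchedWriter<T> {
    private final int batchSize;
    private final Consumer<List<T>> flush; // stand-in for the database round trip
    private final List<T> pending = new ArrayList<>();
    private int flushCount = 0;

    public BatchedWriter(int batchSize, Consumer<List<T>> flush) {
        if (batchSize <= 0) throw new IllegalArgumentException("batchSize must be > 0");
        this.batchSize = batchSize;
        this.flush = flush;
    }

    // Buffers one row and flushes as soon as a full batch is collected.
    public void add(T row) {
        pending.add(row);
        if (pending.size() >= batchSize) flushPending();
    }

    // Sends whatever is left over (the final, partial batch).
    public void close() {
        if (!pending.isEmpty()) flushPending();
    }

    public int flushCount() {
        return flushCount;
    }

    private void flushPending() {
        flush.accept(new ArrayList<>(pending));
        pending.clear();
        flushCount++;
    }

    public static void main(String[] args) {
        BatchedWriter<Integer> writer = new BatchedWriter<>(50_000,
                batch -> System.out.println("flushing " + batch.size() + " updates"));
        for (int i = 0; i < 99_999; i++) writer.add(i); // rownum < 100000 -> 99,999 rows
        writer.close();
        System.out.println("round trips: " + writer.flushCount());
    }
}
```

With these settings the sketch prints `flushing 50000 updates`, `flushing 49999 updates` and `round trips: 2`.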
This Job has been running for more than 30 minutes and has not completed.
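Not finishing after 30 minutes also puts an upper bound on the average throughput. This sketch assumes the rownum limit caps the input at roughly 100000 rows; the class name ThroughputBound is hypothetical.

```java
import java.util.Locale;

// Hypothetical helper: upper bound on average throughput for an unfinished run.
public class ThroughputBound {

    // If `rows` rows are still unfinished after `minutes` minutes, the average
    // throughput so far must be below rows / (minutes * 60) rows per second.
    static double maxRowsPerSecond(long rows, double minutes) {
        return rows / (minutes * 60.0);
    }

    public static void main(String[] args) {
        double bound = maxRowsPerSecond(100_000L, 30); // ~100000 rows, 30 minutes
        System.out.printf(Locale.ROOT, "average throughput < %.1f rows/s%n", bound);
    }
}
```

This prints `average throughput < 55.6 rows/s`.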
Would caching the data in a tHashOutput before tDataShuffling improve performance?
Hello,
Would you mind posting screenshots of your current Job design on the forum? They would help us understand your workflow.
Best regards
Sabrina