Anonymous

tDataShuffling - improving performance

Hi,

 

I am using the tDataShuffling component to shuffle a column that is 8 characters long, partitioned on the first 3 characters of the column. For example:

SELECT field_1, substr(field_1, 1, 3) FROM table_name;

Shuffle column value: py13456

Partition column value: py1
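To make the partitioning concrete, below is a minimal in-memory Java sketch in which values are shuffled only among rows that share the same first 3 characters, using the seed from this post. It is an illustration under stated assumptions, not the tDataShuffling implementation: the class PartitionedShuffle, its methods and the sample values are hypothetical, and the real component's use of its buffer is not modelled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Illustrative sketch only, NOT Talend's tDataShuffling implementation.
// Values are permuted only among rows sharing a partition key (assumed here
// to be the first 3 characters), with a seeded Random for reproducible output.
final class PartitionedShuffle {

    // Hypothetical helper: partition key = first 3 characters (whole value if shorter).
    static String partitionKey(String value) {
        return value.length() >= 3 ? value.substring(0, 3) : value;
    }

    static List<String> shuffleWithinPartitions(List<String> values, long seed) {
        // Group row positions by partition key, in order of first appearance.
        Map<String, List<Integer>> positionsByKey = new LinkedHashMap<>();
        for (int i = 0; i < values.size(); i++) {
            positionsByKey
                .computeIfAbsent(partitionKey(values.get(i)), k -> new ArrayList<>())
                .add(i);
        }

        List<String> result = new ArrayList<>(values);
        Random random = new Random(seed);
        for (List<Integer> positions : positionsByKey.values()) {
            // Copy this partition's values, shuffle them, and write them back
            // into the same row positions, so no value leaves its partition.
            List<String> group = new ArrayList<>(positions.size());
            for (int pos : positions) {
                group.add(values.get(pos));
            }
            Collections.shuffle(group, random);
            for (int j = 0; j < positions.size(); j++) {
                result.set(positions.get(j), group.get(j));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical sample values in the style of the post.
        List<String> rows = Arrays.asList("py13456", "py19999", "py10001", "ab20000", "ab20001");
        System.out.println(shuffleWithinPartitions(rows, 12345678L));
    }
}
```

Because each value is written back into a row position from its own partition, a row keyed py1 always receives another py1 value. Collections.shuffle runs in linear time, so the permutation itself is a linear-time step over each partition.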

 

It is running very slowly, at about 3 rows/s.

The table has around 6 million records. The tDataShuffling component uses a buffer size of 100000 and a seed of 12345678 for the random generator.
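At the reported rate, and assuming it stays constant, the full table would take roughly:

```latex
t = \frac{6\,000\,000\ \text{rows}}{3\ \text{rows/s}} = 2\,000\,000\ \text{s} \approx 555.6\ \text{h} \approx 23.1\ \text{days}
```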

At the job level, I have enabled Multi thread execution with a Parallelize Buffer Unit Size of 25000.

 

Kindly suggest ways to improve the performance of this component.

 

Thanks.

4 Replies
Anonymous
Author

Hi,

 

I have tried the following:

Cursor: 100000 on tDBInput

rownum < 100000 in the tDBInput query

Max heap size of 2048M at the job level (job run JVM settings)

Is there anything I could change at the tDataShuffling component level?

 

Could you kindly reply?

Anonymous
Author

Hi,

 

The job flow is:

tDBInput (with cursor) ----> tDataShuffling ----> tDBOutput (update operation)

 

DB input component cursor: 100000

DB input query rownum: 100000

Shuffling buffer size: 100000

Job multi-thread Parallelize Buffer Unit Size: 25000

Job min heap space: -Xms1024M

Job max heap space: -Xmx4096M
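To check that these heap settings actually reach the job's JVM, a minimal sketch can print what the JVM reports at run time through the standard Runtime API. The class HeapCheck is hypothetical; in a job, the same calls could be placed in a tJava component (an assumption, not something tried in this thread). Depending on the garbage collector, maxMemory() may report somewhat less than the -Xmx value.

```java
// Minimal sketch: reports the heap limits the running JVM actually received,
// e.g. to confirm that -Xms/-Xmx job settings took effect. HeapCheck is hypothetical.
final class HeapCheck {

    private static final long MIB = 1024L * 1024L;

    // Upper bound the JVM will attempt to use (roughly the -Xmx value).
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / MIB;
    }

    // Heap currently committed by the JVM (typically starts near -Xms and can grow up to the max).
    static long committedHeapMiB() {
        return Runtime.getRuntime().totalMemory() / MIB;
    }

    public static void main(String[] args) {
        System.out.println("Max heap (MiB): " + maxHeapMiB());
        System.out.println("Committed heap (MiB): " + committedHeapMiB());
    }
}
```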

Anonymous
Author

I have set the DB output batch size to 50000.
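The output batch size determines how many queued UPDATE statements are sent to the database in each call. A minimal JDBC sketch of that pattern, assuming the output issues one parameterized UPDATE per row; it is not the tDBOutput code, and the id key column, the class BatchedUpdate and its method are hypothetical (table_name and field_1 come from the query in the original post).

```java
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

// Illustrative JDBC batching sketch, not Talend's tDBOutput code.
// Hypothetical: the "id" key column and this class/method.
final class BatchedUpdate {

    static final String UPDATE_SQL = "UPDATE table_name SET field_1 = ? WHERE id = ?";

    // Queues one UPDATE per row and flushes every batchSize rows.
    // Returns how many executeBatch() calls (round trips) were issued.
    static int updateInBatches(PreparedStatement ps, List<String[]> rows, int batchSize)
            throws SQLException {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int pending = 0;
        int roundTrips = 0;
        for (String[] row : rows) {
            ps.setString(1, row[0]); // shuffled value for field_1
            ps.setString(2, row[1]); // key of the row to update (hypothetical id column)
            ps.addBatch();
            pending++;
            if (pending == batchSize) {
                ps.executeBatch(); // send the queued statements in one call
                roundTrips++;
                pending = 0;
            }
        }
        if (pending > 0) {
            ps.executeBatch(); // flush the final partial batch
            roundTrips++;
        }
        return roundTrips;
    }

    // Usage (requires a real java.sql.Connection named conn):
    // try (PreparedStatement ps = conn.prepareStatement(UPDATE_SQL)) {
    //     updateInBatches(ps, rows, 50000);
    // }
}
```

With about 100000 rows per run and a batch size of 50000, this loop calls executeBatch() twice. Each queued statement is generally still executed as a separate UPDATE by the database, so batching mainly reduces round trips rather than per-row work.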

This job has been running for more than 30 minutes and has not completed.

 

Would caching the data with tHashOutput before tDataShuffling improve performance?

Anonymous
Author

Hello,

Would you mind posting screenshots of your current job design on the forum? They would help us understand your workflow.

Best regards

Sabrina