Anonymous
Not applicable

Problem with batch processing and processing all rows in a custom component

Hi,

 

I have a component that needs the whole dataset before it can do its processing. My current approach is to use batch processing and set maxBatchSize to the dataset size, so that the processing runs once the buffer size matches maxBatchSize. However, I have run into a couple of problems with this approach.

 

1. The @AfterGroup method is executed twice: once when the buffer size is 100, and again when it reaches maxBatchSize (that is when the processing happens, since there is a check in the AfterGroup method). Why is the AfterGroup method executed twice? Also, can anyone shed more light on the life cycle of these methods:

a. BeforeGroup

b. ElementListener

c. AfterGroup?

 

2. Is there any way to determine the actual size of the input Dataset?

Labels (1)
  • v7.x

5 Replies
Anonymous
Not applicable
Author

Hello,
Here is the part of the documentation that explains how batch processing works in Talend applications: https://talend.github.io/component-runtime/main/1.1.2/concept-processor-and-batch-processing.html

Generally, the BeforeGroup and AfterGroup methods are executed once for every max batch size records. They can also be executed one last time to process the remaining records from the pipeline. For example, if you have a max batch size of 3 and 5 records, the methods will be executed for the first three records (AfterGroup runs after the third record) and again for the remaining two (AfterGroup runs after the fifth record).
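To make the ordering concrete, here is a minimal plain-Java sketch that only simulates the order of calls described above. It does not use the Talend Component Kit runtime; the class name `BatchLifecycleSketch` and the method `simulate` are made up for illustration, and the real runtime may behave differently depending on the version.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simulation of the group life cycle; not Talend runtime code.
final class BatchLifecycleSketch {

    // Returns the ordered life-cycle events for recordCount records
    // grouped into batches of at most maxBatchSize records.
    static List<String> simulate(int maxBatchSize, int recordCount) {
        List<String> events = new ArrayList<>();
        int inGroup = 0; // records seen in the current group
        for (int i = 1; i <= recordCount; i++) {
            if (inGroup == 0) {
                events.add("beforeGroup");   // a new group starts
            }
            events.add("element " + i);      // one call per record
            inGroup++;
            if (inGroup == maxBatchSize) {
                events.add("afterGroup(" + inGroup + ")"); // full group
                inGroup = 0;
            }
        }
        if (inGroup > 0) {
            // last, partial group holding the remaining records
            events.add("afterGroup(" + inGroup + ")");
        }
        return events;
    }

    public static void main(String[] args) {
        simulate(3, 5).forEach(System.out::println);
    }
}
```

With a max batch size of 3 and 5 records, the simulated sequence is: beforeGroup, elements 1 to 3, afterGroup(3), beforeGroup, elements 4 and 5, afterGroup(2).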

2. Normally, a processor does not need to know the real size of the dataset. If you really need this information in your component logic, for example because it is business information, you can calculate it in an input component and pass it to the processor.
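As a rough sketch of that idea, assuming the total record count has already been computed upstream and handed to the processor (the `expectedTotal` field and the class `WholeDatasetBuffer` are hypothetical), the processor could keep buffering across groups and only process once it has seen every record. In a real component, `onElement` and `afterGroup` would be the methods annotated with @ElementListener and @AfterGroup; here they are plain methods so the example runs on its own.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: buffer records across groups until the whole
// dataset (size supplied by an upstream component) has been received.
final class WholeDatasetBuffer {
    private final long expectedTotal;                 // assumed to come from upstream
    private final List<String> buffer = new ArrayList<>();
    private int processedRuns = 0;

    WholeDatasetBuffer(long expectedTotal) {
        this.expectedTotal = expectedTotal;
    }

    // Would be the @ElementListener method in a real processor.
    void onElement(String record) {
        buffer.add(record);
    }

    // Would be the @AfterGroup method. It does nothing until the buffer
    // holds the whole dataset, however often the runtime calls it.
    boolean afterGroup() {
        if (buffer.size() < expectedTotal) {
            return false;                             // not complete yet, keep buffering
        }
        processedRuns++;                              // whole-dataset processing goes here
        buffer.clear();
        return true;
    }

    int processedRuns() {
        return processedRuns;
    }

    public static void main(String[] args) {
        WholeDatasetBuffer processor = new WholeDatasetBuffer(250);
        for (int i = 1; i <= 250; i++) {
            processor.onElement("record-" + i);
            if (i % 100 == 0 || i == 250) {           // groups of 100, then the remainder
                System.out.println("afterGroup at " + i + " processed=" + processor.afterGroup());
            }
        }
    }
}
```

The check inside `afterGroup` makes the result independent of how many times the runtime closes a group, at the cost of holding the whole dataset in memory.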

Anonymous
Not applicable
Author

Yes, it would make sense to execute the AfterGroup method once for every maxBatchSize records and then for the remaining ones, but for me it executes at 100 records even when maxBatchSize > 100.

Anonymous
Not applicable
Author

What is your Studio version?

Anonymous
Not applicable
Author

V7.0.1
Anonymous
Not applicable
Author

Bulk processing was still under development in 7.0.1 and its prior versions.
You may need to upgrade your Studio to the upcoming release to get the stable features.
Or, you can get the latest milestone version from https://sourceforge.net/projects/talend-studio/files/Talend%20Open%20Studio/