Hi,
I have a component that needs the whole dataset in order to do some processing on it. The approach I have taken so far is to implement batch processing and set maxBatchSize to the dataset size, so that once the buffer size matches maxBatchSize I do the processing. However, I have been facing a couple of problems with this approach.
1. The @AfterGroup method is executed twice: once when the buffer size is 100, and again when it reaches maxBatchSize (that is when the processing happens, as there is a check in the AfterGroup method). Why is the AfterGroup method executed twice? Also, can anyone shed some more light on the life cycle of these methods:
a. BeforeGroup
b. ElementListener
c. AfterGroup?
2. Is there any way to determine the actual size of the input Dataset?
Hello,
Here is the part of the documentation that explains how batch processing works in Talend applications: https://talend.github.io/component-runtime/main/1.1.2/concept-processor-and-batch-processing.html
Generally, the BeforeGroup and AfterGroup methods are executed once per group of max batch size records. They can also be executed one last time to process the remaining records from the pipeline. For example, if you have a max batch size of 3 and 5 records, AfterGroup is executed after the third record and again after the fifth record.
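To make the example with 3 and 5 concrete, here is a small plain-Java simulation of the grouping described above. It is only a sketch of the behaviour as explained in this reply, not the Talend runtime itself; the class name BatchLifecycleDemo and the method simulate are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java simulation of the grouping described above (hypothetical names,
// no Talend dependency). It only records the order in which the hooks would fire.
public class BatchLifecycleDemo {

    // Returns the ordered lifecycle events for recordCount records
    // grouped by maxBatchSize.
    public static List<String> simulate(int recordCount, int maxBatchSize) {
        if (maxBatchSize <= 0) {
            throw new IllegalArgumentException("maxBatchSize must be positive");
        }
        List<String> events = new ArrayList<>();
        int inGroup = 0; // records seen in the currently open group
        for (int i = 1; i <= recordCount; i++) {
            if (inGroup == 0) {
                events.add("beforeGroup"); // a new group opens
            }
            events.add("element " + i);
            inGroup++;
            if (inGroup == maxBatchSize) {
                events.add("afterGroup"); // group is full
                inGroup = 0;
            }
        }
        if (inGroup > 0) {
            events.add("afterGroup"); // last, partial group with the remaining records
        }
        return events;
    }

    public static void main(String[] args) {
        // 5 records with a max batch size of 3: two groups, of 3 and 2 records.
        simulate(5, 3).forEach(System.out::println);
    }
}
```

With 5 records and a max batch size of 3, the simulated afterGroup event appears twice: once after the third element and once after the fifth.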
2. Normally, a processor doesn't have to be aware of the real size of the dataset. If you really need this information in your component logic, for example because it is business information, you can calculate it in an input component and pass it to the processor.
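As a rough sketch of that idea, the processor could keep its own buffer across groups and only run the processing once the expected number of records has arrived, no matter how often the after-group hook fires. The code below is plain Java with hypothetical names (WholeDatasetBuffer, expectedTotal); how the total actually reaches your processor depends on your components and is assumed here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of any Talend API: buffers records across
// groups and processes them only once the expected total has been reached.
public class WholeDatasetBuffer {
    private final int expectedTotal; // assumed to be computed upstream, e.g. by an input component
    private final List<String> buffer = new ArrayList<>();
    private int processedRuns = 0;

    public WholeDatasetBuffer(int expectedTotal) {
        if (expectedTotal <= 0) {
            throw new IllegalArgumentException("expectedTotal must be positive");
        }
        this.expectedTotal = expectedTotal;
    }

    // Would be called from the element listener of the processor.
    public void onElement(String record) {
        buffer.add(record);
    }

    // Would be called from the after-group hook. Returns true only when the
    // whole dataset is buffered and the processing actually ran.
    public boolean afterGroup() {
        if (buffer.size() < expectedTotal) {
            return false; // early call (for example at 100 records): keep buffering
        }
        processedRuns++; // placeholder for the real whole-dataset processing
        buffer.clear();
        return true;
    }

    public int processedRuns() {
        return processedRuns;
    }

    public int buffered() {
        return buffer.size();
    }

    public static void main(String[] args) {
        WholeDatasetBuffer b = new WholeDatasetBuffer(250);
        for (int i = 1; i <= 250; i++) {
            b.onElement("record-" + i);
            if (i % 100 == 0 || i == 250) {
                System.out.println("afterGroup at " + i + " processed: " + b.afterGroup());
            }
        }
    }
}
```

The trade-off is that the whole dataset is held in memory, so this only makes sense when the dataset is known to be small enough.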
Yeah, it would make sense to execute the AfterGroup method for every maxBatchSize records and then for the remaining bunch, but for me it is executed at 100 records even when maxBatchSize > 100.
What is your Studio version?
Bulk processing was still under development in 7.0.1 and earlier versions.
You may need to upgrade your Studio to the upcoming release to get the stable features.
Or, you can also get the latest milestone version from https://sourceforge.net/projects/talend-studio/files/Talend%20Open%20Studio/