Shalafi
Contributor II

Process rows in batch in tJavaFlex

Hello community.
I'm new to Talend and have spent about a month learning it on my own.
I have reached a point where I can't make progress, and reading dozens of topics has not helped.


I read a batch of rows from a tDBInput (PostgreSQL); then I have to process these rows "in one shot" to calculate some extra columns, and finally store the data with a tDBOutput.
By "in one shot" I mean that the extra columns have to be calculated taking into account the whole set of data read by tDBInput, not just row by row. I don't know at design time how many rows the tDBInput component will read.

 

In detail, as component (schema):
tDBInput (col1, col2) ----------> tJavaFlex (col1, col2, extraCol) ---------> tDBOutput (col1, col2, extraCol)

 

Let's say my data is (the values themselves are not relevant):
col1 col2
A 10
B 15
C 17
D 14

 

The extraCol has to be calculated from the whole set of data; it is the result of a complex business rule. The requirement is that I need access to all rows (A, B, C and D) before the extraCol is calculated.

 

My approach was to build a List in the tJavaFlex containing all the data (all the rows) coming from the tDBInput. Then (after having all rows in the list), I need to execute my business code to calculate the extraCol for each row, and finally propagate the data to the next component (tDBOutput).

I am not able to do that.
First, I cannot "stop" propagating the data from tJavaFlex to tDBOutput until the whole set of rows has been read. Talend passes the data to tDBOutput row by row, which is not what I want.
Second, assuming the first problem is solved, I don't know how to trigger sending the data to tDBOutput once the extraCol has been calculated.

 

Can anyone help me? Thank you so much in advance!


Accepted Solutions
Shalafi
Contributor II
Author

Thank you for your answer.

 

The quick answer is no. The extraCol is the result of a very complex set of business rules, so it has to be calculated in Java code; SQL is not an option here. As for the size of the data, sadly we are talking about several million rows in the database table. In fact, I'm already querying the database in batches (split along divisions that I can make without affecting my business rules) to minimize the impact on the process.

 

I have found a workaround to accomplish my needs, but I'm not sure whether it is the best approach. What I have done is this:

 

tDBInput (col1, col2) ---- Main ----> tJavaFlex_1 (col1, col2, extraCol) ---- On Subjob Ok -----> tJavaFlex_2 (col1, col2, extraCol) ---- Main ----> tDBOutput (col1, col2, extraCol)

 

The tJavaFlex_1 component is the one described in my original post. In it, I build a list containing all the rows, and in the End part I apply my code to calculate the extraCol. Finally, the whole list is stored in a globalMap variable.

The tJavaFlex_2 component just reads the globalMap variable and passes the data row by row to the tDBOutput.

Using this approach, I can "stop" the process between tJavaFlex_1 and tJavaFlex_2 until all rows are processed (thanks to the On Subjob Ok link), and only then send them to tDBOutput row by row.
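
For anyone who wants to see the shape of this in code, here is a minimal standalone Java sketch of the two-step pattern. It is not the actual job code: the `Row` class, the `processedRows` key, the method names and especially the rule in `computeExtraCol` (difference from the batch mean) are hypothetical placeholders, and a plain `HashMap` stands in for Talend's globalMap. The comments indicate which part would roughly correspond to the Start, Main and End sections of each tJavaFlex.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class BatchFlexSketch {

    /** Hypothetical row type mirroring the schema (col1, col2, extraCol). */
    static final class Row {
        final String col1;
        final int col2;
        double extraCol;

        Row(String col1, int col2) {
            this.col1 = col1;
            this.col2 = col2;
        }
    }

    /** Placeholder for the real business rule: it needs the whole batch. */
    static void computeExtraCol(List<Row> rows) {
        double sum = 0;
        for (Row r : rows) {
            sum += r.col2;
        }
        double mean = rows.isEmpty() ? 0 : sum / rows.size();
        for (Row r : rows) {
            r.extraCol = r.col2 - mean;
        }
    }

    /** Step 1 (tJavaFlex_1): buffer every row, then compute once at the end. */
    static void collectAndCompute(List<Row> input, Map<String, Object> globalMap) {
        List<Row> buffer = new ArrayList<>();    // Start section: create the list
        for (Row r : input) {
            buffer.add(r);                       // Main section: runs once per row
        }
        computeExtraCol(buffer);                 // End section: all rows are known
        globalMap.put("processedRows", buffer);  // hand over to the next subjob
    }

    /** Step 2 (tJavaFlex_2): read the list back and emit it row by row. */
    @SuppressWarnings("unchecked")
    static List<Row> emitRows(Map<String, Object> globalMap) {
        List<Row> rows = (List<Row>) globalMap.get("processedRows");
        List<Row> emitted = new ArrayList<>();
        for (Row r : rows) {                     // in a job, the loop would open in Start
            emitted.add(r);                      // Main: copy fields to the output row
        }                                        // ...and close in End
        return emitted;
    }

    public static void main(String[] args) {
        List<Row> input = new ArrayList<>();
        input.add(new Row("A", 10));
        input.add(new Row("B", 15));
        input.add(new Row("C", 17));
        input.add(new Row("D", 14));

        Map<String, Object> globalMap = new HashMap<>();
        collectAndCompute(input, globalMap);     // first subjob
        for (Row r : emitRows(globalMap)) {      // second subjob, after On Subjob Ok
            System.out.println(r.col1 + " " + r.col2 + " " + r.extraCol);
        }
    }
}
```

Note that the whole batch sits in memory between the two steps, so this only works as long as each batch is small enough to fit in the heap.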

As I said, I don't know if this is the best option, but it satisfies my requirements. I'm sharing it in the hope that it helps others with the same problem, and that somebody may suggest a better approach.

 

Thank you so much!


3 Replies
billimmer
Creator III

Thoughts: Can you do this in SQL with a more complex query? Or, if the number of input rows is small, you could try a tMemorizeRows. Or you could connect your input to a tMap and then connect a "lookup" query to get all the records for each row read.

 

The first one would be my preference as it would have the best performance.

Anonymous
Not applicable

Interesting stuff to read. Keep it up.