Skip to main content
Announcements
Introducing a new Enhanced File Management feature in Qlik Cloud! GET THE DETAILS!
cancel
Showing results for 
Search instead for 
Did you mean: 
Anonymous
Not applicable

Create multple workseets in excel dynamically - Performace

I am trying to create multiple worksheets in an excel workbook and the number of worksheets is not known at the start of the job. This is dependent on the values of  a field from my Oracle input. I followed the post to design the same and this works wonders when there is very less data. Even with 18k records from my toracleinput, the iterations completed when run for whole night are just about 500.

https://community.talend.com/t5/Design-and-Development/Create-Multiple-Tabs-with-single-excel-file-b...

I tried to modify the approach using toraclerow (Prepared statement setting) and tparserecordset and couldn't see anything better. I have attached both the approaches design.

The rest of the job completes in about 10 minutes and just the part of final excel creation is the problem now. How can I improve the performance. My toracleinput has a simple SELECT query that return 18k records.

 

Labels (3)
1 Solution

Accepted Solutions
Anonymous
Not applicable
Author

You only iterate over the groups. The main bulk of your data rows should be supplied by the queries that are triggered by each iteration. For example, given the dataset I gave you earlier.....

 

Iteration 1: You will have a query for class A, which will return rows for every student in class A.

Iteration 2: You will have a query for class B, which will return rows for every student in class B.

Iteration 3: You will have a query for class C, which will return rows for every student in class C.

Iteration 4: You will have a query for class D, which will return rows for every student in class D.

 

Iterating is a little slower, but you will only be iterating over the classes (in this example). All of the data will be returned per class as a normal flow. You are unlikely to notice any performance issues.

View solution in original post

5 Replies
Anonymous
Not applicable
Author

Anonymous
Not applicable
Author

There may be a component in the Exchange which will do this for you. I believe @lli has built a couple of Excel components. Maybe he can point you in the right direction. 

 

Alternatively, if you want to use standard Talend components, you can do this by grouping your data by prospective "sheet" in your query and then running a query for each group and loading the Excel worksheet in an iterative fashion. For example, lets say I have some data that looks like this....

 

Class Name Age
A Ben 12
A John 12
A Jane 12
A Gemma 12
C Mohammad 13
D Abi 13
B Mike 11
B Safron 13
B Jade 14
B Abdul 12
C Henry 13
C Sam 13
C Mikela 12
D Seb 13
D Giles 12
D Jess 13
D Rachel 12

 

Now, if I want a sheet per group, I can do it this way.

 

tDBInput_1-----> tFlowToIterate_1--iterate--->tDBInput_2------>tFileOutputExcel

 

The tDBInput just performs a query on the data above and returns a distinct list of the classes. That is it. It passes that to the tFlowToIterate.

The tFlowToIterate allows us to iterate over the classes for the next step AND saves the current class to the globalMap. This can be used in our where clause for the tDBInput_2 which will return all records for that class. It can also be used in the Excel file sheet name. 

 

When this jobs runs, it will go through an iteration for each class and it is this which will allow you to dynanically add your sheets.

su66
Contributor
Contributor

Thank you for your response.

For very smaller data sets, tflowtoiterate is working absolutely great. But even with 18k records, tDBinput1-->tflowtoiterate is doing around 1-2 rows/second. Is this the expected performance? I have used "Use the default (Key,value) in global variables" in tflowtoiterate basic settings.

 

Anonymous
Not applicable
Author

You only iterate over the groups. The main bulk of your data rows should be supplied by the queries that are triggered by each iteration. For example, given the dataset I gave you earlier.....

 

Iteration 1: You will have a query for class A, which will return rows for every student in class A.

Iteration 2: You will have a query for class B, which will return rows for every student in class B.

Iteration 3: You will have a query for class C, which will return rows for every student in class C.

Iteration 4: You will have a query for class D, which will return rows for every student in class D.

 

Iterating is a little slower, but you will only be iterating over the classes (in this example). All of the data will be returned per class as a normal flow. You are unlikely to notice any performance issues.

Anonymous
Not applicable
Author

This worked as a charm. Thank you so much 0683p000009MACn.png
Kudos!!