[resolved] Writing to two sheets in the same excel file from one job
Hi all, I have a problem writing to two different sheets in the same excel file from the same job, below is the scenario : tFileInputExcel - > tMap -> tReplicate .. replicate1 .. -> tFileOuputExcel(sheet1) .. replicate2 .. -> tMap -> tFileOuputExcel(sheet2) tFileOuputExcel is the same file. If I change one of the tFileOutputExcel file name to a different file name, then both files are created with the respective sheet names(sheet1, sheet2), however they are in different files which does not suite my purpose. Is this a bug in Talend ?
This is one of the reason I build my own Excel components to:
write multiple sheets in one job
read templates and write into
reuse formattings
write formulas
and a lot more...
Take a look at the tFileExcelSheetOutput component (in conjunction with tFileExcelWorkbookOpen and tFileExcelWorkbookSave).
These components are available at Talend Exchange (with documentation)
http://www.talendforge.org/exchange/index.php and search for tFileExcel
Hello, You can write to multiple sheets in the same spreadsheet, but they need to be done in different subjobs. From what you describe, you could take the tMap output and push it into a tHashOutput (instead of the tFileOutputExcel), then in another subjob have a tHashInput go to the second sheet in the spreadsheet. Hope that helps.
This is one of the reason I build my own Excel components to:
write multiple sheets in one job
read templates and write into
reuse formattings
write formulas
and a lot more...
Take a look at the tFileExcelSheetOutput component (in conjunction with tFileExcelWorkbookOpen and tFileExcelWorkbookSave).
These components are available at Talend Exchange (with documentation)
http://www.talendforge.org/exchange/index.php and search for tFileExcel
Thank you for the responses. I would prefer to do it in the same job as it is already a subjob. I will try the tFileExcelSheetOutput combo first ! Let you folks know how it goes shortly.
Perfect !! Thanks folks, used the exchange fileexcel components and they work much better.
Hi Jan,
I am currently using your excel components to write multiple sheets to the same spreadsheet. I have subjobs each writing to one sheet but I am usig a tparallelize to run all the subjobs. At then end of the job, I dont see all the sheets. Its not consistent. Even though I'm saving the workbook and opening in every subjob it doesnt seem to save some sheets. do you have any suggestions?
Thanks
Shuba
Hi, you cannot write into one excel workbook with multiple threads. My component does not recognise such situation yet but it impossible because you write actually into one huge XML file and the Apache POI API also does not allow multi threading - it is NOT thread save! I cannot predict what will happen if you try this, but to get an inkonsistent document is for sure a worst case scenario.
Thanks for your reply. That's correct. I get inconsistent document every time. I guess have to end up adding the sheets sequentially and think of other ways to speedup.
Sorry for not having a better solution. If you have to create multiple documents, you can run in parallel these processing but not within one document.
Hi Jan,
I took your suggestion and I am using your custom components. tFileExcelWorkbookOpen, Save and tfileExcelSheetOutput. It was working great as long as I was creating .xls files. Yesterday we had a requirement to switch to .xlsx instead. and when I changed it in the jobs, it started giving me several errors.
1.
# A fatal error has been detected by the Java Runtime Environment:
# SIGBUS (0x7) at pc=0x00007f542f9b3732, pid=34686, tid=139998362167040
# JRE version: Java(TM) SE Runtime Environment (7.0_97-b02) (build 1.7.0_97-b02)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (24.95-b01 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# C newEntry+0x62
# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
2. It started throwing error in my second job that the file written and saved in the first job was not found and I could not see the file in the server either.
I have attached some screenshots of my jobs. I have a set of jobs running one after another appending sheets to the same workbook and this works great with .xls. Not sure why .xlsx would have such issues.
Appreciate your help a lot!
Thanks
Shuba
screenshots.zip.zip
Whow ... this error should not happen. The problem here is not your job or my component, the problem here is an error in the JVM.
It is a error in the JVM using the zip library. I can tell you for sure these components works in a huge number of projects and it is not a problems caused by the component, instead I would say it happens because of the typical memory expensive kind of jobs working with excel files.
Can you reproduce this error?
I suggest install another java runtime with a different version. Do you use the Oracle JDK or OpenJDK?