I have a job that reads an Excel file using tFileInputExcel, reads some information from it, enriches that information and writes it back to the same Excel file. Everything worked as expected while the Excel file had 30 lines. Now that I have an Excel file with 400 lines, I am getting the following exception:
Exception in component tFileInputExcel_1 (myjob)
org.apache.poi.POIXMLException: java.lang.reflect.InvocationTargetException
at org.apache.poi.POIXMLFactory.createDocumentPart(POIXMLFactory.java:63)
at org.apache.poi.POIXMLDocumentPart.read(POIXMLDocumentPart.java:604)
at org.apache.poi.POIXMLDocument.load(POIXMLDocument.java:186)
at org.apache.poi.xssf.usermodel.XSSFWorkbook.<init>(XSSFWorkbook.java:266)
at org.apache.poi.xssf.usermodel.XSSFWorkbook.<init>(XSSFWorkbook.java:336)
at jobName.myjob.tFileInputExcel_1Process(myjob.java:15007)
at jobName.myjob.tFileInputExcel_2Process(myjob.java:7404)
at jobName.myjob.tFileExcelWorkbookOpen_1Process(myjob.java:640)
at jobName.myjob.runJobInTOS(myjob.java:19176)
at jobName.myjob.main(myjob.java:19011)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.poi.xssf.usermodel.XSSFFactory.createDocumentPart(XSSFFactory.java:56)
at org.apache.poi.POIXMLFactory.createDocumentPart(POIXMLFactory.java:60)
... 9 more
Caused by: java.io.IOException: Zip bomb detected! The file would exceed the max. ratio of compressed file size to the size of the expanded data. This may indicate that the file is used to inflate memory usage and thus could pose a security risk. You can adjust this limit via ZipSecureFile.setMinInflateRatio() if you need to work with files which exceed this limit. Counter: 827247, cis.counter: 8192, ratio: 0.009902725546299956
Limits: MIN_INFLATE_RATIO: 0.01
at org.apache.poi.openxml4j.util.ZipSecureFile$ThresholdInputStream.advance(ZipSecureFile.java:270)
at org.apache.poi.openxml4j.util.ZipSecureFile$ThresholdInputStream.read(ZipSecureFile.java:221)
at com.sun.org.apache.xerces.internal.impl.XMLEntityManager$RewindableInputStream.read(XMLEntityManager.java:2919)
at com.sun.org.apache.xerces.internal.impl.io.UTF8Reader.read(UTF8Reader.java:302)
at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.load(XMLEntityScanner.java:1895)
at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.scanQName(XMLEntityScanner.java:843)
at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.scanStartElement(XMLNSDocumentScannerImpl.java:193)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(XMLDocumentFragmentScannerImpl.java:2784)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:602)
at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(XMLNSDocumentScannerImpl.java:112)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:505)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:841)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:770)
at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:141)
at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(DOMParser.java:243)
at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:339)
at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:121)
at org.apache.poi.util.DocumentHelper.readDocument(DocumentHelper.java:140)
at org.apache.poi.POIXMLTypeLoader.parse(POIXMLTypeLoader.java:143)
at org.openxmlformats.schemas.spreadsheetml.x2006.main.StyleSheetDocument$Factory.parse(Unknown Source)
at org.apache.poi.xssf.model.StylesTable.readFrom(StylesTable.java:194)
at org.apache.poi.xssf.model.StylesTable.<init>(StylesTable.java:145)
... 15 more
The file is not that big, around 350 KB. According to the exception message itself, the fix would be to call the Apache POI API ZipSecureFile.setMinInflateRatio().
Does anyone know how to solve this? Is there a way to make the Talend component set this limit in the Apache POI library?
More information about this error:
https://stackoverflow.com/questions/44897500/using-apache-poi-zip-bomb-detected
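The counters in the exception show why a 350 KB file can trip the check: the limit is on the compression ratio of the zip entry being read (the stack trace points at the styles part, via StylesTable), not on the size of the .xlsx file. A minimal Java sketch of that arithmetic, assuming "Counter" is the number of expanded bytes and "cis.counter" the number of compressed bytes, as the message labels them:

```java
// Reproduces the ratio printed in the "Zip bomb detected!" message above.
// Assumption: "Counter" = expanded (uncompressed) bytes read from the entry,
// "cis.counter" = compressed bytes read from the entry.
final class InflateRatioCheck {

    /** Compressed bytes divided by expanded bytes: the value POI prints as "ratio". */
    static double inflateRatio(long compressedBytes, long expandedBytes) {
        return (double) compressedBytes / (double) expandedBytes;
    }

    /** How many times larger the expanded data is than the compressed data. */
    static double expansionFactor(long compressedBytes, long expandedBytes) {
        return (double) expandedBytes / (double) compressedBytes;
    }

    public static void main(String[] args) {
        long expanded = 827_247L; // "Counter" in the exception
        long compressed = 8_192L; // "cis.counter" in the exception

        System.out.println("ratio = " + inflateRatio(compressed, expanded));
        System.out.printf("expansion factor = %.1fx%n", expansionFactor(compressed, expanded));
    }
}
```

With these numbers the ratio comes out at about 0.0099, just under the 0.01 limit: about 8 KB of compressed data had expanded to roughly 101 times that size.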
According to the Stack Overflow post you linked, this can happen when the actual Excel data is so similar that it compresses to almost nothing. If you only need to handle 65,536 rows or fewer (the row limit of the .xls format), my first thought is to use an .xls file instead, i.e. the old Excel 2003 format, which isn't zip-compressed.
If that's not possible, open the spreadsheet in Excel, copy the data to a new sheet, delete the old sheet, and save the file. This resets Excel's internal counters so that they point only to your data, rather than to cells where data was previously stored. Excel sometimes "remembers" where data has been, which is why a spreadsheet that once contained 500k rows may still scroll down to row 500k after most of the data in those rows has been deleted. If Excel treats these extra rows as containing nulls or empty strings, the compression used to create the .xlsx file will be compressing data that is almost entirely identical, which would cause the error you are seeing.
Hope this helps!
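The compression effect described in the answer above can be reproduced with the JDK's Deflater, which implements DEFLATE, the usual compression method for entries inside an .xlsx zip file. The fragment of 100,000 identical empty cells `<c s="1"/>` is a hypothetical stand-in for a sheet padded with styled but empty cells; real sheet XML usually also carries per-cell references such as `r="A1"`, so actual ratios will differ.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

// Shows how near-identical XML deflates to a tiny fraction of its size.
final class RepetitiveXmlCompression {

    /** Deflates the whole input and returns the compressed size in bytes. */
    static int deflatedSize(byte[] input) {
        Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION);
        try {
            deflater.setInput(input);
            deflater.finish();
            byte[] buffer = new byte[4096];
            int total = 0;
            // Drain the compressor until all input has been written out.
            while (!deflater.finished()) {
                total += deflater.deflate(buffer);
            }
            return total;
        } finally {
            deflater.end(); // release native zlib resources
        }
    }

    /** Builds a hypothetical XML fragment of `cells` identical empty, styled cells. */
    static byte[] repeatedEmptyCells(int cells) {
        StringBuilder sb = new StringBuilder(cells * 10);
        for (int i = 0; i < cells; i++) {
            sb.append("<c s=\"1\"/>");
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] xml = repeatedEmptyCells(100_000);
        int compressed = deflatedSize(xml);
        double ratio = (double) compressed / xml.length;
        System.out.println(xml.length + " bytes -> " + compressed + " bytes, ratio " + ratio);
    }
}
```

Data this repetitive compresses to well under 1% of its expanded size, so it can fall below POI's default 0.01 limit even when the .xlsx file itself is small.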
Thank you, DVSCHWAB.
As a workaround, I removed some empty columns from the file to reduce Excel's internal counters. It is still not a valid solution in my scenario, because the files are uploaded by the end users of a data integration solution.
I will try the third-party tFileExcelSheetInput component, as it seems to set the ratio to 0 in Apache POI.
You can add a tJava component before the Excel components, so that it runs before any workbook is opened, and set the ratio there:
org.apache.poi.openxml4j.util.ZipSecureFile.setMinInflateRatio(-1.0d);
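Why `-1.0d` works: going by the exception text, POI rejects an entry when the ratio of compressed to expanded bytes is below MIN_INFLATE_RATIO, and that ratio can never be negative, so a negative limit effectively turns the check off. The sketch below is a plain-Java model of that comparison only, not POI's implementation; the 0.005 value is an illustrative assumption for a lowered limit that would still admit this particular file.

```java
// Simplified model of the comparison reported in the "Zip bomb detected!" message.
// This is NOT Apache POI's code; it only mirrors "ratio < MIN_INFLATE_RATIO".
final class MinInflateRatioModel {

    /** True when an entry with this compressed/expanded ratio would be rejected. */
    static boolean rejected(double ratio, double minInflateRatio) {
        return ratio < minInflateRatio;
    }

    public static void main(String[] args) {
        double observed = 8_192d / 827_247d; // ratio from the exception above
        // Default limit, a hypothetical lowered limit, and the "disabled" value from tJava.
        double[] limits = {0.01, 0.005, -1.0};
        for (double limit : limits) {
            System.out.println("MIN_INFLATE_RATIO " + limit
                    + " -> rejected: " + rejected(observed, limit));
        }
    }
}
```

If turning the check off entirely feels too permissive for user-uploaded files, the same tJava line can pass a smaller positive value instead, e.g. `ZipSecureFile.setMinInflateRatio(0.005d)`. Either way, the limit is a static setting on ZipSecureFile, so it applies to every POI read in the same JVM for the rest of the job, not just to one component.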