Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
Hello,
I'm extracting 260 column data from SQL Server database and trying to write to an existing Excel file with headers.
I am using tFileOutputExcel component.
My Run Parameters are like this:
-xms256M
-xmx2048M
and in Windows -> Preferences ->Talend -> Run/Debug, I gave:
-XX: useGCOverheadlimit.
What else can I do to make this work? I have to write to an existing file since other tabs are filled with data from other processes.
Thanks,
Bee
The job fails with the error message:
Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
at com.sun.org.apache.xerces.internal.dom.AttributeMap.setNamedItem(AttributeMap.java:121)
at com.sun.org.apache.xerces.internal.dom.DeferredElementNSImpl.synchronizeData(DeferredElementNSImpl.java:138)
at com.sun.org.apache.xerces.internal.dom.ElementNSImpl.getNamespaceURI(ElementNSImpl.java:250)
at org.apache.xmlbeans.impl.store.Locale.loadNode(Locale.java:1420)
at org.apache.xmlbeans.impl.store.Locale.loadNodeChildren(Locale.java:1403)
at org.apache.xmlbeans.impl.store.Locale.loadNode(Locale.java:1445)
at org.apache.xmlbeans.impl.store.Locale.loadNodeChildren(Locale.java:1403)
at org.apache.xmlbeans.impl.store.Locale.loadNode(Locale.java:1445)
at org.apache.xmlbeans.impl.store.Locale.loadNodeChildren(Locale.java:1403)
at org.apache.xmlbeans.impl.store.Locale.loadNode(Locale.java:1445)
at org.apache.xmlbeans.impl.store.Locale.parseToXmlObject(Locale.java:1385)
at org.apache.xmlbeans.impl.store.Locale.parseToXmlObject(Locale.java:1370)
[statistics] disconnected
[statistics] disconnected
[statistics] disconnected
[statistics] disconnected
[statistics] disconnected
at org.apache.xmlbeans.impl.schema.SchemaTypeLoaderBase.parse(SchemaTypeLoaderBase.java:370)
Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit
What part of the job is taking the most time and loading most data in memory? Post a screenshot of you job
The job is very simple. Has a tMSSQLInput component which does a select <columns> from table and outputs to Excel.
Does it run out of memory if you overwrite the spreadsheet file instead of appending to an existing one?
Pretty certain this will be down to the Excel output component loading the entire existing spreadsheet into memory in order to append the new rows.
The database input component will only be operating on relatively small batches of records (probably 1,000) at a time, so that wont be the problem, but whilst the basic file output components can just write their output directly to the file on disk, again requiring very little memory, Excel files, especially multi sheet ones, are much more structured than this, and lines can't just be appended to the end of the file, they need to be added within the body of the file, and that will be done in memory.
At a basic level, in the short term, increasing the memory may well help, but if this Excel file is going to keep getting bigger, then the problem is just going to rear its head again soon, and you'll then be out of options.
Excel files aren't particularly well suited to massive volumes of data, and if you're exhausting the 2Gb+ of memory your Java config is allowing, then this suggests a lot of data and/or a big spreadsheet.
If I were you, I'd be looking for an alternative to embedding the data in this spreadsheet. I'm sure if you were to provide a little more background to your requirements, the community members will be able to offer suggestions.
Thank you. The requirement is pretty straightforward and simple, except it is not.
I am executing a stored procedure which generates a very wide (260+ columns) dataset. This needs to be written to a pre-formatted excel sheet (containing 3 other sheets) template file which also contains the header.
A direct tMSSQL to tFileOutputExcel fails with memory issues. So trying to write to a csv hoping to push the data to excel.
But looks like there is no luck at all.
Any idea how I can do this? Number of columns is 260. Number of rows is #2000.
If you're only outputting around 2,000 rows, then I'm very surprised that you're running out of memory.
How big (both in terms of overall file size, and number of rows/columns per sheet etc.) is the template spreadsheet which you're adding the data to?
Also, is it an XLSX file, as opposed to an old XLS format one?
Yes, it is an old xlsx file.
The # of columns is 260 and around 3000 rows right now. But can grow.
Are there any other alternatives?
File with the empty template is around 10MB
There are some Excel related components in the Talend Exchange. (https://exchange.talend.com)
Jan Lolling created a set of Excel components and will often respond here if you have questions about their use. If you can't get the job to work with tFileOutputExcel, maybe you'll have better luck with another one?