Hi Team,
I am trying to import an XML file of about 2.0 GB through Talend version 7.3.1.
Below is a sample of the XML format:
<xml>
  <local namespace="Talend" name="Community">
    <field name="FName">Sunit</field>
    <field name="LName">S</field>
    <field name="MobNo">9999888877</field>
    <field name="Area">XYSX</field>
    <field name="State">Maharashtra</field>
    <field name="Pincode">421302</field>
  </local>
</xml>
I am using a simple tFileInputXML to read the XML input and a tDBOutput to write to the database.
Refer to image 1.
My XPath query and mapping work perfectly fine with input files from about 200 MB up to a maximum of 500 MB.
XPath loop: "/xml/local", mapping: "*[contains(@name,'FName')]", etc., with the settings below.
Refer to image 2.
I also configured the advanced runtime settings to handle huge input.
Refer to image 3.
With the above configuration we can only load small files, up to about 500 MB.
Any file larger than that throws an error such as:
OpenJDK 64-Bit Server VM warning: INFO:
os::commit_memory(0x0000000555380000, 3223322624, 0) failed; error='Cannot
allocate memory' (errno=12)
or
Exception in thread "main" java.lang.OutOfMemoryError:
GC overhead limit exceeded
or
Exception in thread "main" java.lang.OutOfMemoryError:
Java heap space
etc.
The job is currently deployed on a Linux system with 16 GB of RAM.
Kindly check and assist me in solving this issue as soon as possible.
I would also like to use the SAX parser, but it does not work with my current XPath.
Dom4J won't work well unless you have a LOT of memory. It's great for processing smaller files, but terrible for huge ones. You could try Xerces, although I suspect it will have the same issues. The most memory-efficient parser is SAX, but it also gives you the least control. A way to think about this is "divide and conquer": use SAX to split the full XML into loop sections, then process each of those sections independently. Select each loop section as a "Node" and then process the nodes with another component using Dom4J.
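To illustrate the streaming part of that idea outside of Talend: below is a minimal Java sketch (using StAX, a pull-parser cousin of SAX from the standard library) that walks the sample format above one event at a time, so memory use stays constant regardless of file size. The class and method names are my own for illustration; in a real job each completed row would be handed to the output component instead of collected in a list.

```java
import javax.xml.stream.*;
import java.io.*;
import java.util.*;

public class LocalNodeSplitter {
    // Stream through the XML and collect the <field> values of each <local>
    // element, without ever loading the whole document into memory.
    public static List<Map<String, String>> parse(Reader xml) throws XMLStreamException {
        List<Map<String, String>> rows = new ArrayList<>();
        XMLStreamReader r = XMLInputFactory.newInstance().createXMLStreamReader(xml);
        Map<String, String> row = null;     // current <local> being filled
        String fieldName = null;            // current <field name="..."> if any
        StringBuilder text = new StringBuilder();
        while (r.hasNext()) {
            switch (r.next()) {
                case XMLStreamConstants.START_ELEMENT:
                    if ("local".equals(r.getLocalName())) {
                        row = new LinkedHashMap<>();
                    } else if ("field".equals(r.getLocalName()) && row != null) {
                        fieldName = r.getAttributeValue(null, "name");
                        text.setLength(0);
                    }
                    break;
                case XMLStreamConstants.CHARACTERS:
                    if (fieldName != null) text.append(r.getText());
                    break;
                case XMLStreamConstants.END_ELEMENT:
                    if ("field".equals(r.getLocalName()) && fieldName != null) {
                        row.put(fieldName, text.toString());
                        fieldName = null;
                    } else if ("local".equals(r.getLocalName()) && row != null) {
                        rows.add(row);  // in a real job: emit this row downstream
                        row = null;
                    }
                    break;
            }
        }
        return rows;
    }
}
```

Note that this sidesteps XPath entirely: the "loop" is simply every `<local>` element, and each field is matched by its `name` attribute, which is all the `contains(@name, ...)` mapping above was doing anyway.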
Thanks. But how can I test whether the SAX parser works with my XPath query and mapping?
Whenever I create a generic format by creating the XML schema through metadata, it uses the Dom4J parser by default. So it works there, but fails with SAX.
Please suggest...
It is a terrible idea to have such large XML files. This is surely a big design flaw. The first thing you must do is cut the huge file into smaller pieces! It does not matter which tool you use to process these large files; it is always a nightmare.
I would try to change the generation process to build more than one file (e.g. 100 instead of one), and if this is not possible, I would use a SAX parser to cut the large file into smaller files without any complex XPath expressions. Simply cut the file at one of the near-root tags; there must be a lot of them!
You will see that the performance is far better, and you also have the option to process these smaller files in parallel!
Creating XML and reading XML are completely different things. I explained how you might go about this in my original post, and @Jan Lolling has essentially said the same.