SunitS
Contributor

XML file import error

Hi Team,

I am trying to import a 2.0 GB XML file in Talend version 7.3.1.

Below is a sample of the XML format:

<xml>
  <local namespace="Talend" name="Community">
    <field name="FName">Sunit</field>
    <field name="LName">S</field>
    <field name="MobNo">9999888877</field>
    <field name="Area">XYSX</field>
    <field name="State">Maharashtra</field>
    <field name="Pincode">421302</field>
  </local>
</xml>

 

I am using a simple tFileInputXML to read the XML input and a tDBOutput to export it to the database.

 

Refer to image 1. My XPath query and mapping work perfectly fine with input files of up to 200 MB, or at most 500 MB.

 

XPath loop: "/xml/local", and mapping: "*[contains(@name,'FName')]", and so on for the other fields.

 

I also use the settings shown in image 2.

 

 

I also configured the advanced runtime settings to handle the huge input; refer to image 3.
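To be specific, the advanced runtime settings are the JVM heap arguments under the job's Run tab > Advanced settings > "Use specific JVM arguments", along these lines (the values here are only illustrative; the exact ones are in image 3):

-Xms1024M
-Xmx3072M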

 

 

Using the above configuration, we are able to load only smaller files, up to about 500 MB. If I try any larger file, it throws an error such as:

 

OpenJDK 64-Bit Server VM warning: INFO: os::commit_memory(0x0000000555380000, 3223322624, 0) failed; error='Cannot allocate memory' (errno=12)

or

Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded

or

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space

etc.

 

Currently the code is deployed on a Linux system with 16 GB of RAM.

 

Kindly check and assist me in solving this issue as soon as possible.

 

I would also like to use the SAX parser, but it does not work with my current XPath.

4 Replies
Anonymous
Not applicable

Dom4J won't work well unless you have A LOT of memory. It's great for processing smaller files, but terrible for huge files. Maybe try Xerces... although I suspect that this will also have issues. The best parser to use is SAX... but you have the least control with that. A way to approach this is "divide and conquer": use SAX to split the full XML into loop sections, then process each of those sections independently. Select each loop section as a "NODE" and then process the nodes with another component using Dom4J.
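For instance, here is a bare-bones sketch of that streaming step in plain Java (the file name big.xml and the <local> record tag are assumptions taken from the sample in the question; each record is processed directly rather than handed to Dom4J, but the idea is the same: only one record is ever held in memory):

import java.io.File;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Streams the XML with SAX so that only one <local> record is in memory at a time.
public class RecordStreamer extends DefaultHandler {

    private Map<String, String> record;  // field name -> value for the current <local>
    private String currentField;         // "name" attribute of the currently open <field>
    private StringBuilder text;          // text buffer for the current <field>

    public static void main(String[] args) throws Exception {
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new File("big.xml"), new RecordStreamer());
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if ("local".equals(qName)) {
            record = new LinkedHashMap<>();
        } else if ("field".equals(qName) && record != null) {
            currentField = atts.getValue("name");
            text = new StringBuilder();
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (text != null) {
            text.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("field".equals(qName) && currentField != null) {
            record.put(currentField, text.toString());
            currentField = null;
            text = null;
        } else if ("local".equals(qName) && record != null) {
            process(record);  // e.g. buffer rows here and batch-insert into the database
            record = null;
        }
    }

    private void process(Map<String, String> rec) {
        System.out.println(rec.get("FName") + " " + rec.get("LName"));
    }
}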

SunitS
Contributor
Author

Thanks. But how can I test whether the SAX parser works with my XPath query and mapping?

Whenever I create a generic format by creating the XML through metadata, it uses the Dom4J parser by default. So it works there, but it fails with SAX.

Please suggest...

jlolling
Creator III

It is a terrible idea to have such large XML files. This is for sure a big design flaw. The first thing you MUST do is cut the huge file into smaller pieces! It does not matter which tool you use to process such a large file; it is always a nightmare.

I would try to change the generation process to build more than one file (e.g. 100 instead of one), and if that is not possible I would use a SAX parser to cut the large file into smaller files, without trying to use complex XPath expressions. Simply cut the file at one of the near-root tags; there must be a lot of them!

You will see the performance is far better, and you get the option to process these smaller files in parallel!
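As a bare-bones sketch of that cutting step in plain Java (the file names, the chunk size, and the <local> record tag are assumptions based on the sample in the question):

import java.io.BufferedWriter;
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Streams big.xml with SAX and rewrites it as many small files, each wrapped
// in <xml>...</xml> and holding up to RECORDS_PER_FILE <local> blocks.
public class XmlFileSplitter extends DefaultHandler {

    private static final int RECORDS_PER_FILE = 100000;  // chunk size (assumption)

    private StringBuilder record;  // serialized form of the current <local> block
    private BufferedWriter out;
    private int recordCount = 0;
    private int fileCount = 0;

    public static void main(String[] args) throws Exception {
        XmlFileSplitter splitter = new XmlFileSplitter();
        SAXParserFactory.newInstance().newSAXParser().parse(new File("big.xml"), splitter);
        splitter.closeCurrentFile();  // finish the last chunk
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if ("local".equals(qName)) {
            record = new StringBuilder();
        }
        if (record != null) {
            record.append('<').append(qName);
            for (int i = 0; i < atts.getLength(); i++) {
                record.append(' ').append(atts.getQName(i))
                      .append("=\"").append(atts.getValue(i)).append('"');
            }
            record.append('>');
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (record != null) {
            record.append(ch, start, length);  // NOTE: entity escaping omitted for brevity
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        if (record != null) {
            record.append("</").append(qName).append('>');
        }
        if ("local".equals(qName) && record != null) {
            try {
                writeRecord(record.toString());
            } catch (IOException e) {
                throw new SAXException(e);
            }
            record = null;
        }
    }

    private void writeRecord(String xml) throws IOException {
        if (recordCount % RECORDS_PER_FILE == 0) {
            closeCurrentFile();
            out = new BufferedWriter(new FileWriter("chunk_" + fileCount++ + ".xml"));
            out.write("<xml>");
        }
        out.write(xml);
        out.newLine();
        recordCount++;
    }

    void closeCurrentFile() throws IOException {
        if (out != null) {
            out.write("</xml>");
            out.close();
            out = null;
        }
    }
}

Each chunk_N.xml then stays well under the size the existing tFileInputXML job already handles, and several chunks can be processed in parallel.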

Anonymous
Not applicable

Creating XML and reading XML are completely different things. I explained how you might go about this in my original post, and @Jan Lolling has essentially said the same.