Anonymous
Not applicable

Read huge xml

Hi,
I have a huge XML file that I want to read. As it is an SDMX file, I wanted to import it as-is, because I don't know how to specify it in the metadata otherwise. Obviously, that didn't work very well: since the file is more than 4 GB, it crashes TOS. What would you have done in this case? Is there any example of how to specify SDMX files in the XML metadata?
Thanks in advance
7 Replies
vapukov
Master II

What do you mean by "I wanted to import it as-is"? Import to where? How?
Anonymous
Not applicable
Author

You could try tFileInputXML and select SAX parsing in the Advanced settings. SAX is much quicker than DOM and doesn't need to load the whole file into memory, but you will not be able to use look-ahead or look-back XPath functions.
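
To illustrate the difference, here is a minimal sketch of SAX-style streaming in plain Java, outside Talend: the parser fires callbacks as it reads, so the document is never built as an in-memory tree. The file name and the "Obs" element test are assumptions, not details from this thread.

import java.io.File;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class SaxCount {
    public static void main(String[] args) throws Exception {
        DefaultHandler handler = new DefaultHandler() {
            long rows = 0;
            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                // Called once per element while streaming; nothing else is kept in memory.
                if (qName.endsWith("Obs")) {   // "Obs" is a placeholder for the repeating SDMX element
                    rows++;
                }
            }
            @Override
            public void endDocument() {
                System.out.println("Observations seen: " + rows);
            }
        };
        SAXParserFactory.newInstance().newSAXParser().parse(new File("huge.xml"), handler);
    }
}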
Anonymous
Not applicable
Author

What do you mean by "I wanted to import it as-is"? Import to where? How?

Hi Vapukov,
I wanted to create the XML metadata. I used a sample XML that contained only one row in the loop, at the end.
I have tried everything, but nothing works, not even SAX, so I don't know what approach I could use in this case...
Anonymous
Not applicable
Author

You could try tFileInputXML and select SAX parsing in the Advanced settings. SAX is much quicker than DOM and doesn't need to load the whole file into memory, but you will not be able to use look-ahead or look-back XPath functions.

Hi rhall,
I did try tFileInputXML, selecting SAX in the Advanced settings. The output is a tFileOutputDelimited that I split every 1,000 lines.
Nothing happens; the job gets stuck in "Starting".
What would you recommend?
Thanks in advance
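
As a way to narrow this down, the 1,000-line chunking could be reproduced in plain Java and fed from a SAX handler like the one sketched above; if that also hangs, the bottleneck is the parse itself rather than the Talend job. The file prefix, the .csv extension and the semicolon delimiter below are assumptions:

import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;

public class ChunkedWriter implements AutoCloseable {
    private static final int CHUNK = 1000;  // rows per output file
    private final String prefix;
    private BufferedWriter out;
    private long rows = 0;
    private int part = 0;

    public ChunkedWriter(String prefix) {
        this.prefix = prefix;
    }

    public void writeRow(String... fields) throws IOException {
        if (rows % CHUNK == 0) {
            roll();  // start prefix_0.csv, prefix_1.csv, ... every 1,000 rows
        }
        out.write(String.join(";", fields));
        out.newLine();
        rows++;
    }

    private void roll() throws IOException {
        if (out != null) {
            out.close();
        }
        out = new BufferedWriter(new FileWriter(prefix + "_" + (part++) + ".csv"));
    }

    @Override
    public void close() throws IOException {
        if (out != null) {
            out.close();
        }
    }
}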
vapukov
Master II

What do you mean by "I wanted to import it as-is"? Import to where? How?

Hi Vapukov,
I wanted to create the XML metadata. I used a sample XML that contained only one row in the loop, at the end.
I have tried everything, but nothing works, not even SAX, so I don't know what approach I could use in this case...
Sorry, it is hard to understand - what are you trying to achieve?
In one post you say you want to write an XML file, and in the next you write a CSV file from the XML.
So, what is the overall task? What are the steps? Maybe some screenshots from the Studio, etc.
What is the structure of your XML file? As it is huge, why not try to split it into several files?
Anonymous
Not applicable
Author

Hi Vapukov,
My main issue is reading the huge XML. Even if I want to split it, Talend will have to read it first, and this step is the bottleneck. I have tried to change the .ini file to increase the Java arguments to -Xms1024m and -Xmx9208m, and I have also tried to increase the JVM settings of the job runner using specific JVM arguments (-Xms1024m and -Xmx9208m). I have tried with Talend Open Studio 5.6.2 MDM edition and 6.3.0 Big Data edition. The computer I use has an SSD and 16 GB of RAM in total. After 6 hours of running the job, it is still in the "Starting" status; the CPU usage is 100% and the memory usage is 14.6 GB.
It is important to mention that I use the generation mode "fast, with low memory consumption SAX".
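
For reference, the Studio is Eclipse-based, so memory flags of that kind normally sit at the end of the Studio .ini file, after the -vmargs marker (the exact .ini file name depends on the edition); something like:

-vmargs
-Xms1024m
-Xmx9208m

Keep in mind that these flags only size the Studio's own JVM; a job launched from the Run view gets its memory from the specific JVM arguments in the Run view's Advanced settings, as already mentioned above.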
This is the XML structure that I have used to create the structure in the metadata:
<?xml version='1.0' encoding='UTF-8'?>
<m:GenericData xmlns:footer="http://www.sdmx.org/resources/sdmxml/schemas/v2_1/message/footer"

To see the whole post, download it here
OriginalPost.pdf
vapukov
Master II

Hi!
When I wrote "split", I meant a real split, using one of the command-line utilities, such as:

http://xponentsoftware.com/xmlSplit.aspx
https://github.com/acfr/comma/wiki/XML-Utilities
https://gist.github.com/benallard/8042835

Then process the folder with all the XML files one by one. Talend is an excellent tool, but that does not mean we must rely on a single tool only; no tool will ever do everything that every user wants.
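
If an external utility is not an option, the same split can be sketched in a few lines of plain Java with StAX, which also streams the input. The element name "Series" and the file names are only placeholders, and namespace declarations made on ancestor elements are not copied into the fragments, so they may need a small wrapper to stay well-formed:

import java.io.FileInputStream;
import java.io.FileOutputStream;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.events.XMLEvent;

public class XmlSplitter {
    public static void main(String[] args) throws Exception {
        XMLEventReader reader = XMLInputFactory.newInstance()
                .createXMLEventReader(new FileInputStream("huge.xml"));
        XMLOutputFactory outFactory = XMLOutputFactory.newInstance();
        XMLEventWriter writer = null;
        int part = 0;
        int depth = 0;
        while (reader.hasNext()) {
            XMLEvent event = reader.nextEvent();
            // Open a new output file each time a top-level <Series> element starts.
            if (event.isStartElement()
                    && event.asStartElement().getName().getLocalPart().equals("Series")) {
                if (depth == 0) {
                    writer = outFactory.createXMLEventWriter(
                            new FileOutputStream("part_" + (part++) + ".xml"));
                }
                depth++;
            }
            if (writer != null) {
                writer.add(event);  // copy every event that falls inside the current <Series>
            }
            if (event.isEndElement()
                    && event.asEndElement().getName().getLocalPart().equals("Series")) {
                depth--;
                if (depth == 0) {
                    writer.close();
                    writer = null;
                }
            }
        }
        reader.close();
    }
}

The resulting part_*.xml files could then be iterated with a tFileList and parsed one by one by the existing job.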