Anonymous
Not applicable

[resolved] Split a large XML file into small files with Talend

Hi,
I am trying to integrate data from a large XML file (300 MB). Is there a way to do it with Talend?

Accepted Solutions
willm1
Creator

Seiif - Can you do a simple job where you use a tFileInputFullRow to read the XML file and spit out to a tLogRow? If that works - which means your job will run, you can parse it using the 'cruder' Talend-specific solution that I mentioned above.
Let me know if you can do this...


12 Replies
Anonymous
Not applicable
Author

What is the problem that you are facing in doing this?
Vaibhav
Anonymous
Not applicable
Author

The problem is that I can't load the XML file (300 MB) into the XML metadata.
Every time I try to do this, Talend crashes.
Anonymous
Not applicable
Author

I have done this using the Perl library XML::Twig: I just used a tSystem to call Perl/Twig and split the XML.
Anonymous
Not applicable
Author

Jholman, could you give me more details about this, please?
willm1
Creator

Hi Seiif - Before I suggest alternatives (below): have you changed your XML parser to SAX in tFileInput, increased the heap size for the job, and tried it again? The DOM parser is very memory-intensive because it loads the whole document into memory, whereas SAX reads it as a stream.
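To illustrate why a streaming parser copes with a 300 MB file where a DOM parser does not, here is a minimal sketch in Python (not Talend code, and not what Talend generates internally). It uses the standard library's `xml.etree.ElementTree.iterparse` to read records one at a time and group them into chunks. The function name `split_xml_stream`, the `msg` record tag, and the sample data are hypothetical, and the sketch assumes the records are direct children of the root element.

```python
import io
import xml.etree.ElementTree as ET


def split_xml_stream(source, record_tag, records_per_chunk):
    """Yield lists of serialized <record_tag> elements, read incrementally.

    Assumption: records are direct children of the root element.
    """
    chunk = []
    # iterparse emits events while reading, so the full tree is never kept.
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element
    for event, elem in context:
        if event == "end" and elem.tag == record_tag:
            elem.tail = None  # ignore whitespace between records
            chunk.append(ET.tostring(elem, encoding="unicode"))
            root.clear()  # release records that were already handled
            if len(chunk) == records_per_chunk:
                yield chunk
                chunk = []
    if chunk:
        yield chunk  # last, possibly shorter, chunk


if __name__ == "__main__":
    # Hypothetical sample data standing in for the large file.
    sample = b"<queue><msg id='1'>a</msg><msg id='2'>b</msg><msg id='3'>c</msg></queue>"
    for number, records in enumerate(split_xml_stream(io.BytesIO(sample), "msg", 2), 1):
        # In a real split, each chunk would be written to its own file,
        # wrapped in a root element so that it stays well-formed XML.
        print(number, "<queue>" + "".join(records) + "</queue>")
```

Memory use then depends on the chunk size rather than on the size of the input file, which is the same reason the SAX setting is worth trying before anything else.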

Like jholman, I've done this outside Talend: I used the sed utility in a shell script (.sh) on the filesystem, called from a tSystem. Using sed, I looked for a particular tag (the opening tag of each XML record) and, wherever I found it, extracted the text between the opening and closing tags.
Another, cruder method I used recently was to read the file as plain text (tFileInputFullRow), look for these markers in the XML, mark each one with an incrementing counter (sequence), and then split the file using tMap. This was for queue data that needed to be processed for each 'row'.
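The counter idea can be sketched outside Talend as follows. This is only an illustration in Python under stated assumptions, not the actual job: the function names, the `<msg>` markers, and the sample lines are hypothetical, and it assumes pretty-printed XML with at most one record boundary per line. Reading line by line plays the role of tFileInputFullRow, and grouping by sequence number plays the role of the split done in tMap.

```python
def tag_lines_with_sequence(lines, open_marker, close_marker):
    """Pair each line inside a record with that record's sequence number."""
    sequence = 0
    inside = False
    for line in lines:
        if not inside and open_marker in line:
            sequence += 1  # a new record starts on this line
            inside = True
        if inside:
            yield sequence, line
        if inside and close_marker in line:
            inside = False  # the record ends on this line


def group_records(lines, open_marker, close_marker):
    """Collect the lines that share a sequence number into one record each."""
    records = {}
    for sequence, line in tag_lines_with_sequence(lines, open_marker, close_marker):
        records.setdefault(sequence, []).append(line)
    return records


if __name__ == "__main__":
    # Hypothetical sample standing in for lines read from the large file.
    sample_lines = [
        "<queue>",
        "  <msg>",
        "    <body>a</body>",
        "  </msg>",
        "  <msg>",
        "    <body>b</body>",
        "  </msg>",
        "</queue>",
    ]
    for sequence, record in group_records(sample_lines, "<msg>", "</msg>").items():
        print(sequence, "".join(part.strip() for part in record))
```

Because the text is never parsed as XML, this only works when the markers are predictable; lines outside any record (such as the root tags) are simply dropped.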

Anonymous
Not applicable
Author

Hi Willm, I have changed my XML parser to SAX in tFileInput and increased the heap size for the job, but I still have the same problem.
Thanks for your valuable suggestions.
willm1
Creator

Seiif - Can you do a simple job where you use a tFileInputFullRow to read the XML file and spit out to a tLogRow? If that works - which means your job will run, you can parse it using the 'cruder' Talend-specific solution that I mentioned above.
Let me know if you can do this...
Anonymous
Not applicable
Author

It works with tFileInputFullRow. I will try the cruder method and let you know the results. Thanks, Willm.
Anonymous
Not applicable
Author

Please see the relevant documentation for Twig's xml_split tool here: http://search.cpan.org/dist/XML-Twig/tools/xml_split/xml_split
It also provides a mechanism for merging the split files back together again.