Hi,
Has anyone read/parsed deeply nested XML stored in the Hortonworks HDFS file system using Talend? Our requirement is that a deeply nested raw XML file has already landed in HDFS via a different process. We need to read it with Talend and process it further. The volume is very high: there are 2000+ files of 50 MB each.
I heard that in 5.4.1 we can generate native MapReduce code (not Pig or Hive code) that runs inside Hadoop. Please share your experiences if anyone has worked on this type of problem.
Thanks
Subra
Hi,
From your description, you can use tHDFSConnection-->OnSubjobOk-->tHDFSList-->tHDFSGet to copy your files from HDFS onto the local machine's disk and then process them further. For your high-volume files, tHDFSList can retrieve a list of files or folders based on a filemask pattern and iterate over each entry.
Best regards,
Sabrina
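For readers who want to see the filemask-iteration idea in code: the sketch below mimics, on a local directory, what tHDFSList does against HDFS — match a filemask, then hand each matched path to the next step. This is only an illustration in plain Python; the directory and mask are hypothetical, and the real job would of course run the listing against HDFS, not the local filesystem.

```python
import fnmatch
import os

def list_by_filemask(directory, filemask):
    """Return the paths in `directory` whose names match `filemask`,
    analogous to tHDFSList iterating over a filemask pattern."""
    return sorted(
        os.path.join(directory, name)
        for name in os.listdir(directory)
        if fnmatch.fnmatch(name, filemask)
    )

# Hypothetical usage: iterate over each matched file, the way
# tHDFSList feeds tHDFSGet one file at a time.
# for path in list_by_filemask("/tmp/landing", "*.xml"):
#     process(path)
```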
Hi,
The requirement is to parse the file in Hadoop itself, using Talend's MapReduce capabilities, without pulling the file to the local machine. We use 5.3.1 and will move to 5.4.1 soon. Please let us know if there is any such feature that we can use.
Thanks,
Swami.
Hi,
So far, Talend Open Studio for Big Data cannot achieve your goal.
The Talend Enterprise Subscription version can meet your needs. Feel free
to contact us.
Best regards
Hi,
We are trying to achieve this with the Talend Enterprise version for Big Data. We are trying to parse a complex XML file with multiple nested tags. Please let us know how this can be done with TIS 5.3.1 MapReduce, as it is one of our requirements; if Talend can help us leverage Hadoop MR to parse the XML, this will make our job simpler.
Thanks,
Swami.
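Whatever component ends up doing the work, the record-level logic it needs — flattening one deeply nested XML record into flat key/value fields — can be sketched in plain Python. This is only an illustration of the parsing step a mapper would perform; the element names below are hypothetical, not taken from the actual landed files.

```python
import xml.etree.ElementTree as ET

def flatten(element, prefix=""):
    """Recursively flatten a nested XML element into dotted-path keys.
    Leaf text becomes the value; nested tags extend the key path."""
    rows = {}
    children = list(element)
    if not children:
        rows[prefix or element.tag] = (element.text or "").strip()
        return rows
    for child in children:
        key = f"{prefix}.{child.tag}" if prefix else child.tag
        rows.update(flatten(child, key))
    return rows

# Hypothetical record shape; the real tags depend on the source files.
record = ET.fromstring(
    "<order><id>42</id><customer><name>Acme</name>"
    "<address><city>Oslo</city></address></customer></order>"
)
print(flatten(record))
# → {'id': '42', 'customer.name': 'Acme', 'customer.address.city': 'Oslo'}
```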
Hi,
With the Enterprise Subscription version of the product, please open a JIRA issue in the DI project on the
Talend Bug Tracker for MapReduce XML components; our developers will build a custom one for you.
Best regards
Sabrina
We are using Talend Enterprise Big Data version 5.5 and are facing a similar issue. We cannot write a MapReduce job in Talend BD to parse the XML data in HDFS. Is there a way to do this using custom code?
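One common custom-code fallback, when no ready-made component fits, is a record reader that splits the input stream on a start/end tag pair so each map call receives one complete XML record (Hadoop Streaming's StreamXmlRecordReader is built on this idea). The sketch below shows only that splitting logic in plain Python, with hypothetical tag names; a real implementation would live in a Hadoop InputFormat and also handle records spanning block boundaries.

```python
def split_records(stream_text, start_tag, end_tag):
    """Yield one complete XML record at a time from raw text by scanning
    for a start/end tag pair -- the record-boundary idea used by custom
    XML input formats for MapReduce. Tag names are supplied by the caller."""
    pos = 0
    while True:
        begin = stream_text.find(start_tag, pos)
        if begin == -1:
            return
        finish = stream_text.find(end_tag, begin)
        if finish == -1:
            return
        finish += len(end_tag)
        yield stream_text[begin:finish]
        pos = finish

# Hypothetical input: records interleaved with other content.
raw = "<junk/><rec><v>1</v></rec>\n<rec><v>2</v></rec>"
print(list(split_records(raw, "<rec>", "</rec>")))
# → ['<rec><v>1</v></rec>', '<rec><v>2</v></rec>']
```

Each yielded record can then be parsed with an ordinary XML parser inside the map function.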