<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: TFileInputXML or TFileInputMSXML with a large complex xml file in Talend Studio</title>
    <link>https://community.qlik.com/t5/Talend-Studio/TFileInputXML-or-TFileInputMSXML-with-a-large-complex-xml-file/m-p/2199800#M2562</link>
    <description>Hello, 
&lt;BR /&gt;Did you resolve the issue? 
&lt;BR /&gt;I also have huge files (&amp;gt;500 MB) to process. 
&lt;BR /&gt;Thanks, 
&lt;BR /&gt;Sairam</description>
    <pubDate>Tue, 20 Aug 2013 18:08:16 GMT</pubDate>
    <dc:creator>Anonymous</dc:creator>
    <dc:date>2013-08-20T18:08:16Z</dc:date>
    <item>
      <title>TFileInputXML or TFileInputMSXML with a large complex xml file</title>
      <link>https://community.qlik.com/t5/Talend-Studio/TFileInputXML-or-TFileInputMSXML-with-a-large-complex-xml-file/m-p/2199799#M2561</link>
      <description>Hello, 
&lt;BR /&gt;I'm currently extracting data from a large XML file (~600 MB) with a complex, recursive structure. The DTD of the XML file is as follows: 
&lt;BR /&gt; &amp;lt;!ELEMENT address ( city | country | province | street | zipcode )* &amp;gt; 
&lt;BR /&gt; &amp;lt;!ELEMENT africa ( item+ ) &amp;gt; 
&lt;BR /&gt; &amp;lt;!ELEMENT age ( #PCDATA ) &amp;gt; 
&lt;BR /&gt; &amp;lt;!ELEMENT annotation ( author, description, happiness ) &amp;gt; 
&lt;BR /&gt; &amp;lt;!ELEMENT asia ( item+ ) &amp;gt; 
&lt;BR /&gt; &amp;lt;!ELEMENT australia ( item+ ) &amp;gt; 
&lt;BR /&gt; &amp;lt;!ELEMENT author EMPTY &amp;gt; 
&lt;BR /&gt; &amp;lt;!ATTLIST author person NMTOKEN #REQUIRED &amp;gt; 
&lt;BR /&gt; &amp;lt;!ELEMENT bidder ( date, time, personref, increase ) &amp;gt; 
&lt;BR /&gt; &amp;lt;!ELEMENT bold ( #PCDATA | emph | keyword )* &amp;gt; 
&lt;BR /&gt; &amp;lt;!ELEMENT business ( #PCDATA ) &amp;gt; 
&lt;BR /&gt; &amp;lt;!ELEMENT buyer EMPTY &amp;gt; 
&lt;BR /&gt; &amp;lt;!ATTLIST buyer person NMTOKEN #REQUIRED &amp;gt; 
&lt;BR /&gt; &amp;lt;!ELEMENT categories ( category+ ) &amp;gt; 
&lt;BR /&gt; &amp;lt;!ELEMENT category ( name, description ) &amp;gt; 
&lt;BR /&gt; &amp;lt;!ATTLIST category id ID #REQUIRED &amp;gt; 
&lt;BR /&gt; &amp;lt;!ELEMENT catgraph ( edge+ ) &amp;gt; 
&lt;BR /&gt; &amp;lt;!ELEMENT city ( #PCDATA ) &amp;gt; 
&lt;BR /&gt; &amp;lt;!ELEMENT closed_auction ( seller, buyer, itemref, price, date, quantity, type, annotation ) &amp;gt; 
&lt;BR /&gt; &amp;lt;!ELEMENT closed_auctions ( closed_auction+ ) &amp;gt; 
&lt;BR /&gt; &amp;lt;!ELEMENT country ( #PCDATA ) &amp;gt; 
&lt;BR /&gt; &amp;lt;!ELEMENT creditcard ( #PCDATA ) &amp;gt; 
&lt;BR /&gt; &amp;lt;!ELEMENT current ( #PCDATA ) &amp;gt; 
&lt;BR /&gt; &amp;lt;!ELEMENT date ( #PCDATA ) &amp;gt; 
&lt;BR /&gt; &amp;lt;!ELEMENT description ( parlist | text )* &amp;gt; 
&lt;BR /&gt; &amp;lt;!ELEMENT edge EMPTY &amp;gt; 
&lt;BR /&gt; &amp;lt;!ATTLIST edge from NMTOKEN #REQUIRED &amp;gt; 
&lt;BR /&gt; &amp;lt;!ATTLIST edge to NMTOKEN #REQUIRED &amp;gt; 
&lt;BR /&gt; &amp;lt;!ELEMENT education ( #PCDATA ) &amp;gt; 
&lt;BR /&gt; &amp;lt;!ELEMENT emailaddress ( #PCDATA ) &amp;gt; 
&lt;BR /&gt; &amp;lt;!ELEMENT emph ( #PCDATA | bold | keyword )* &amp;gt; 
&lt;BR /&gt; &amp;lt;!ELEMENT end ( #PCDATA ) &amp;gt; 
&lt;BR /&gt; &amp;lt;!ELEMENT europe ( item+ ) &amp;gt; 
&lt;BR /&gt; &amp;lt;!ELEMENT from ( #PCDATA ) &amp;gt; 
&lt;BR /&gt; &amp;lt;!ELEMENT gender ( #PCDATA ) &amp;gt; 
&lt;BR /&gt; &amp;lt;!ELEMENT happiness ( #PCDATA ) &amp;gt; 
&lt;BR /&gt; &amp;lt;!ELEMENT homepage ( #PCDATA ) &amp;gt; 
&lt;BR /&gt; &amp;lt;!ELEMENT incategory EMPTY &amp;gt; 
&lt;BR /&gt; &amp;lt;!ATTLIST incategory category NMTOKEN #REQUIRED &amp;gt; 
&lt;BR /&gt; &amp;lt;!ELEMENT increase ( #PCDATA ) &amp;gt; 
&lt;BR /&gt; &amp;lt;!ELEMENT initial ( #PCDATA ) &amp;gt; 
&lt;BR /&gt; &amp;lt;!ELEMENT interest EMPTY &amp;gt; 
&lt;BR /&gt; &amp;lt;!ATTLIST interest category NMTOKEN #REQUIRED &amp;gt; 
&lt;BR /&gt; &amp;lt;!ELEMENT interval ( start, end ) &amp;gt; 
&lt;BR /&gt; &amp;lt;!ELEMENT item ( location, quantity, name, payment, description, shipping, incategory+, mailbox ) &amp;gt; 
&lt;BR /&gt; &amp;lt;!ATTLIST item featured ( yes ) #IMPLIED &amp;gt; 
&lt;BR /&gt; &amp;lt;!ATTLIST item id ID #REQUIRED &amp;gt; 
&lt;BR /&gt; &amp;lt;!ELEMENT itemref EMPTY &amp;gt; 
&lt;BR /&gt; &amp;lt;!ATTLIST itemref item ID #REQUIRED &amp;gt; 
&lt;BR /&gt; &amp;lt;!ELEMENT keyword ( #PCDATA | bold | emph )* &amp;gt; 
&lt;BR /&gt; 
&lt;BR /&gt; &amp;lt;!ELEMENT listitem ( parlist | text )* &amp;gt; 
&lt;BR /&gt; &amp;lt;!ELEMENT location ( #PCDATA ) &amp;gt; 
&lt;BR /&gt; &amp;lt;!ELEMENT mail ( from, to, date, text ) &amp;gt; 
&lt;BR /&gt; &amp;lt;!ELEMENT mailbox ( mail* ) &amp;gt; 
&lt;BR /&gt; &amp;lt;!ELEMENT name ( #PCDATA ) &amp;gt; 
&lt;BR /&gt; 
&lt;BR /&gt; &amp;lt;!ELEMENT namerica ( item+ ) &amp;gt; 
&lt;BR /&gt; &amp;lt;!ELEMENT open_auction ( annotation | bidder | current | initial | interval | itemref | privacy | quantity | reserve | seller | type )* &amp;gt; 
&lt;BR /&gt; &amp;lt;!ATTLIST open_auction id ID #REQUIRED &amp;gt; 
&lt;BR /&gt; &amp;lt;!ELEMENT open_auctions ( open_auction+ ) &amp;gt; 
&lt;BR /&gt; &amp;lt;!ELEMENT parlist ( listitem+ ) &amp;gt; 
&lt;BR /&gt; &amp;lt;!ELEMENT payment ( #PCDATA ) &amp;gt; 
&lt;BR /&gt; &amp;lt;!ELEMENT people ( person+ ) &amp;gt; 
&lt;BR /&gt; &amp;lt;!ELEMENT person ( address | creditcard | emailaddress | homepage | name | phone | profile | watches )* &amp;gt; 
&lt;BR /&gt; &amp;lt;!ATTLIST person id ID #REQUIRED &amp;gt; 
&lt;BR /&gt; &amp;lt;!ELEMENT personref EMPTY &amp;gt; 
&lt;BR /&gt; &amp;lt;!ATTLIST personref person NMTOKEN #REQUIRED &amp;gt; 
&lt;BR /&gt; &amp;lt;!ELEMENT phone ( #PCDATA ) &amp;gt; 
&lt;BR /&gt; &amp;lt;!ELEMENT price ( #PCDATA ) &amp;gt; 
&lt;BR /&gt; &amp;lt;!ELEMENT privacy ( #PCDATA ) &amp;gt; 
&lt;BR /&gt; &amp;lt;!ELEMENT profile ( age | business | education | gender | interest )* &amp;gt; 
&lt;BR /&gt; &amp;lt;!ATTLIST profile income NMTOKEN #REQUIRED &amp;gt; 
&lt;BR /&gt; &amp;lt;!ELEMENT province ( #PCDATA ) &amp;gt; 
&lt;BR /&gt; &amp;lt;!ELEMENT quantity ( #PCDATA ) &amp;gt; 
&lt;BR /&gt; &amp;lt;!ELEMENT regions ( africa, asia, australia, europe, namerica, samerica ) &amp;gt; 
&lt;BR /&gt; &amp;lt;!ELEMENT reserve ( #PCDATA ) &amp;gt; 
&lt;BR /&gt; &amp;lt;!ELEMENT samerica ( item+ ) &amp;gt; 
&lt;BR /&gt; &amp;lt;!ELEMENT seller EMPTY &amp;gt; 
&lt;BR /&gt; &amp;lt;!ATTLIST seller person NMTOKEN #REQUIRED &amp;gt; 
&lt;BR /&gt; &amp;lt;!ELEMENT shipping ( #PCDATA ) &amp;gt; 
&lt;BR /&gt; &amp;lt;!ELEMENT site ( regions, categories, catgraph, people, open_auctions, closed_auctions ) &amp;gt; 
&lt;BR /&gt; &amp;lt;!ELEMENT start ( #PCDATA ) &amp;gt; 
&lt;BR /&gt; &amp;lt;!ELEMENT street ( #PCDATA ) &amp;gt; 
&lt;BR /&gt; &amp;lt;!ELEMENT text ( #PCDATA | bold | emph | keyword )* &amp;gt; 
&lt;BR /&gt; &amp;lt;!ELEMENT time ( #PCDATA ) &amp;gt; 
&lt;BR /&gt; &amp;lt;!ELEMENT to ( #PCDATA ) &amp;gt; 
&lt;BR /&gt; &amp;lt;!ELEMENT type ( #PCDATA ) &amp;gt; 
&lt;BR /&gt; &amp;lt;!ELEMENT watch EMPTY &amp;gt; 
&lt;BR /&gt; &amp;lt;!ATTLIST watch open_auction NMTOKEN #REQUIRED &amp;gt; 
&lt;BR /&gt; &amp;lt;!ELEMENT watches ( watch* ) &amp;gt; 
&lt;BR /&gt; &amp;lt;!ELEMENT zipcode ( #PCDATA ) &amp;gt; 
&lt;BR /&gt;My problem is that when I try to create XML file metadata for this file, a Java heap space error is thrown while the "schema viewer" is being built. I also tried the tFileInputXML and TFileInputMSXML components with the SAX generator; that works for simple structures but not for recursive ones. 
&lt;BR /&gt;Do you know whether Talend offers a different, simpler way to extract data from such an XML file than extracting everything with XPath queries? 
&lt;BR /&gt;Thank you.</description>
      <pubDate>Fri, 14 Dec 2012 08:23:03 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/TFileInputXML-or-TFileInputMSXML-with-a-large-complex-xml-file/m-p/2199799#M2561</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2012-12-14T08:23:03Z</dc:date>
    </item>
    <item>
      <title>Re: TFileInputXML or TFileInputMSXML with a large complex xml file</title>
      <link>https://community.qlik.com/t5/Talend-Studio/TFileInputXML-or-TFileInputMSXML-with-a-large-complex-xml-file/m-p/2199800#M2562</link>
      <description>Hello, 
&lt;BR /&gt;Did you resolve the issue? 
&lt;BR /&gt;I also have huge files (&amp;gt;500 MB) to process. 
&lt;BR /&gt;Thanks, 
&lt;BR /&gt;Sairam</description>
      <pubDate>Tue, 20 Aug 2013 18:08:16 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/TFileInputXML-or-TFileInputMSXML-with-a-large-complex-xml-file/m-p/2199800#M2562</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2013-08-20T18:08:16Z</dc:date>
    </item>
  </channel>
</rss>

