<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Sax and Xpath Expressions in Talend Studio</title>
    <link>https://community.qlik.com/t5/Talend-Studio/Sax-and-Xpath-Expressions/m-p/2202680#M4282</link>
    <description>Hi 
&lt;BR /&gt;I have passed your issue along to the Dev team. However, the 
&lt;A href="https://help.talend.com/search/all?query=tFileInputXML&amp;amp;content-lang=en" target="_blank" rel="nofollow noopener noreferrer"&gt;documentation&lt;/A&gt; says that SAX generation mode has a limitation with the "Get Nodes" option, as this mode doesn't support namespaces. I am not sure whether this is related.</description>
    <pubDate>Fri, 05 Apr 2013 16:12:33 GMT</pubDate>
    <dc:creator>Anonymous</dc:creator>
    <dc:date>2013-04-05T16:12:33Z</dc:date>
    <item>
      <title>Sax and Xpath Expressions</title>
      <link>https://community.qlik.com/t5/Talend-Studio/Sax-and-Xpath-Expressions/m-p/2202677#M4279</link>
      <description>I am extracting fields from huge XML files (1.5 GB) using the tFileInputXML component. To parse such a large file, we had to set the XML generation mode to SAX: DOM4J mode allows a maximum heap size of 1500 MB, and our job crashes with insufficient heap memory. 
&lt;BR /&gt;The problem with SAX mode is that it does not seem to recognize our XPath queries. Example XML input segment: 
&lt;BR /&gt;&amp;lt;analysis_result analysis="peptideprophet"&amp;gt; 
&lt;BR /&gt;&amp;lt;peptideprophet_result probability="0.3920" all_ntt_prob="(0.0000,0.0000,0.3920)"&amp;gt; 
&lt;BR /&gt;&amp;lt;search_score_summary&amp;gt; 
&lt;BR /&gt;&amp;lt;parameter name="fval" value="0.1900"/&amp;gt; 
&lt;BR /&gt;&amp;lt;parameter name="ntt" value="2"/&amp;gt; 
&lt;BR /&gt;&amp;lt;parameter name="nmc" value="0"/&amp;gt; 
&lt;BR /&gt;&amp;lt;parameter name="massd" value="-0.242"/&amp;gt; 
&lt;BR /&gt;&amp;lt;/search_score_summary&amp;gt; 
&lt;BR /&gt;&amp;lt;/peptideprophet_result&amp;gt; 
&lt;BR /&gt;&amp;lt;/analysis_result&amp;gt; 
&lt;BR /&gt;To extract the nmc parameter value, we were previously using the XPath query: 
&lt;BR /&gt;search_score_summary/parameter/@value 
&lt;BR /&gt;This works in DOM4J mode. SAX mode returns null values. 
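&lt;BR /&gt;As a point of comparison outside Talend, the intended extraction can be sketched with a streaming parser. The Python sketch below is illustrative only: it relies on the standard-library xml.etree.ElementTree.iterparse, the helper names build_sample and stream_parameter_values are hypothetical, and the sample document is rebuilt from the segment above. It does not show how tFileInputXML works internally.

```python
import io
import xml.etree.ElementTree as ET


def build_sample():
    # Rebuild the example segment from this post as bytes (illustrative data only).
    root = ET.Element("analysis_result", {"analysis": "peptideprophet"})
    result = ET.SubElement(
        root,
        "peptideprophet_result",
        {"probability": "0.3920", "all_ntt_prob": "(0.0000,0.0000,0.3920)"},
    )
    summary = ET.SubElement(result, "search_score_summary")
    for name, value in [("fval", "0.1900"), ("ntt", "2"), ("nmc", "0"), ("massd", "-0.242")]:
        ET.SubElement(summary, "parameter", {"name": name, "value": value})
    return ET.tostring(root)


def stream_parameter_values(source, wanted_name):
    # Hypothetical helper: yield the value attribute of every parameter element
    # whose name attribute matches, reading the input incrementally.
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "parameter" and elem.get("name") == wanted_name:
            yield elem.get("value")
        if elem.tag == "analysis_result":
            # Drop the content of each finished record to keep memory roughly bounded.
            elem.clear()


values = list(stream_parameter_values(io.BytesIO(build_sample()), "nmc"))
print(values)
```

&lt;BR /&gt;Filtering on the name attribute is what separates nmc from the other parameter elements; the path search_score_summary/parameter/@value on its own matches all four of them.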
&lt;BR /&gt;QUESTIONS: 
&lt;BR /&gt;1. Is there another method of extracting data from huge xml files in Talend other than tFileInputXML in SAX mode? 
&lt;BR /&gt;2. How can we get the values for each of such separate parameters? 
&lt;BR /&gt;Any suggestions pointing me in the right direction are very much welcome. Thank You.</description>
      <pubDate>Sat, 16 Nov 2024 13:09:40 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/Sax-and-Xpath-Expressions/m-p/2202677#M4279</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2024-11-16T13:09:40Z</dc:date>
    </item>
    <item>
      <title>Re: Sax and Xpath Expressions</title>
      <link>https://community.qlik.com/t5/Talend-Studio/Sax-and-Xpath-Expressions/m-p/2202678#M4280</link>
      <description>This is an excellent question; we are having similar issues. Can someone from Talend please shed some light on this?&lt;BR /&gt;Thanks</description>
      <pubDate>Wed, 28 Sep 2011 22:10:32 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/Sax-and-Xpath-Expressions/m-p/2202678#M4280</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2011-09-28T22:10:32Z</dc:date>
    </item>
    <item>
      <title>Re: Sax and Xpath Expressions</title>
      <link>https://community.qlik.com/t5/Talend-Studio/Sax-and-Xpath-Expressions/m-p/2202679#M4281</link>
      <description>I have the same problem. Has any feedback been received? I'm stuck in my development.</description>
      <pubDate>Fri, 05 Apr 2013 14:11:56 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/Sax-and-Xpath-Expressions/m-p/2202679#M4281</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2013-04-05T14:11:56Z</dc:date>
    </item>
    <item>
      <title>Re: Sax and Xpath Expressions</title>
      <link>https://community.qlik.com/t5/Talend-Studio/Sax-and-Xpath-Expressions/m-p/2202680#M4282</link>
      <description>Hi 
&lt;BR /&gt;I have passed your issue along to the Dev team. However, the 
&lt;A href="https://help.talend.com/search/all?query=tFileInputXML&amp;amp;content-lang=en" target="_blank" rel="nofollow noopener noreferrer"&gt;documentation&lt;/A&gt; says that SAX generation mode has a limitation with the "Get Nodes" option, as this mode doesn't support namespaces. I am not sure whether this is related.</description>
      <pubDate>Fri, 05 Apr 2013 16:12:33 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/Sax-and-Xpath-Expressions/m-p/2202680#M4282</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2013-04-05T16:12:33Z</dc:date>
    </item>
    <item>
      <title>Re: Sax and Xpath Expressions</title>
      <link>https://community.qlik.com/t5/Talend-Studio/Sax-and-Xpath-Expressions/m-p/2202681#M4283</link>
      <description>Thanks for your feedback.&lt;BR /&gt;If required, I can send you some data to help the developers troubleshoot this issue (screenshot, XML file, XPath queries, ...).&lt;BR /&gt;For info:&lt;BR /&gt;- I use the tFileInputMSXML component, and "Enable XPath in column 'Schema XPath loop'" is not ticked. To be honest, I don't see any difference whether it's ticked or not, which is strange because I do use the "Schema XPath loop" column.&lt;BR /&gt;- If tFileInputMSXML cannot be used to stream huge XML files using the "Schema XPath loop" column, what is the alternative for such files? I don't want to read the file several times.</description>
      <pubDate>Wed, 10 Apr 2013 07:48:39 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/Sax-and-Xpath-Expressions/m-p/2202681#M4283</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2013-04-10T07:48:39Z</dc:date>
    </item>
    <item>
      <title>Re: Sax and Xpath Expressions</title>
      <link>https://community.qlik.com/t5/Talend-Studio/Sax-and-Xpath-Expressions/m-p/2202682#M4284</link>
      <description>I also find the same thing. I think that xpath expressions aren't allowed in "Xpath Query". This is documented for tFileInputMSXML, but not for tFileInputXML. 
&lt;BR /&gt;On the other hand, I saw this: 
&lt;A href="https://jira.talendforge.org/browse/TDI-547" rel="nofollow noopener noreferrer"&gt;https://jira.talendforge.org/browse/TDI-547&lt;/A&gt; 
&lt;BR /&gt;That sounds as if this feature was added in 2007. 
&lt;BR /&gt;I have a job that runs fine and returns the expected results with an XPath like "/a/b" in DOM mode, but when I switch to SAX it finds no results and reports no error. 
&lt;BR /&gt;Is the component supposed to work if you set it to use SAX mode and have an xpath query? If not, I'd suggest that Talend clarify that in the UI. 
&lt;BR /&gt;Levin</description>
      <pubDate>Thu, 09 May 2013 16:18:19 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/Sax-and-Xpath-Expressions/m-p/2202682#M4284</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2013-05-09T16:18:19Z</dc:date>
    </item>
    <item>
      <title>Re: Sax and Xpath Expressions</title>
      <link>https://community.qlik.com/t5/Talend-Studio/Sax-and-Xpath-Expressions/m-p/2202683#M4285</link>
      <description>One way to solve this is to split the large file into smaller ones (up to roughly 200 to 290 files), each not exceeding 6 MB, using tJava to call a routine you create (I found that this 6 MB size gives the fastest parsing time). Then use tFileList to iterate over them in DOM4J mode, which is faster than SAX and handles XPath queries better. 
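&lt;BR /&gt;The splitting step described above can be sketched roughly as follows. This is only an illustration under several assumptions: it is written in Python with the standard-library xml.etree.ElementTree rather than as a Talend Java routine, it splits by record count instead of by the file size mentioned above, and the names split_records, write_chunk, chunk and results are hypothetical.

```python
import os
import tempfile
import xml.etree.ElementTree as ET


def write_chunk(batch, out_dir, index):
    # Write one batch of records to its own small XML file.
    path = os.path.join(out_dir, "chunk_{:04d}.xml".format(index))
    ET.ElementTree(batch).write(path, encoding="utf-8", xml_declaration=True)
    for record in batch:
        record.clear()  # release the record's content once it is on disk
    return path


def split_records(source_path, record_tag, per_file, out_dir):
    # Hypothetical helper: stream through source_path and write every
    # per_file records named record_tag into a separate file.
    # Assumes record elements are not nested inside one another.
    paths = []
    batch = ET.Element("chunk")  # assumed wrapper element name
    for _event, elem in ET.iterparse(source_path, events=("end",)):
        if elem.tag != record_tag:
            continue
        batch.append(elem)
        if len(batch) == per_file:
            paths.append(write_chunk(batch, out_dir, len(paths)))
            batch = ET.Element("chunk")
    if len(batch) != 0:
        # Flush the final, partially filled batch.
        paths.append(write_chunk(batch, out_dir, len(paths)))
    return paths


with tempfile.TemporaryDirectory() as work_dir:
    # Build a tiny stand-in for the large input: five empty records.
    root = ET.Element("results")
    for i in range(5):
        ET.SubElement(root, "analysis_result", {"id": str(i)})
    big_path = os.path.join(work_dir, "big.xml")
    ET.ElementTree(root).write(big_path)

    chunk_paths = split_records(big_path, "analysis_result", 2, work_dir)
    chunk_sizes = [len(ET.parse(p).getroot()) for p in chunk_paths]

print(chunk_sizes)
```

&lt;BR /&gt;Each resulting file could then be picked up by tFileList and parsed in DOM4J mode, as described above. 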
&lt;BR /&gt;see 
&lt;A href="https://community.qlik.com/s/feed/0D53p00007vCq6pCAC" rel="nofollow noopener noreferrer"&gt;https://community.talend.com/t5/Design-and-Development/OutOfMemoryError-GC-overhead-limit-exceeded-on-large-XML-files/td-p/106620&lt;/A&gt;</description>
      <pubDate>Thu, 09 May 2013 16:50:30 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/Sax-and-Xpath-Expressions/m-p/2202683#M4285</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2013-05-09T16:50:30Z</dc:date>
    </item>
    <item>
      <title>Re: Sax and Xpath Expressions</title>
      <link>https://community.qlik.com/t5/Talend-Studio/Sax-and-Xpath-Expressions/m-p/2202684#M4286</link>
      <description>Thanks asouini, I will consider that alternative. I wonder whether an XSLT transform could be used instead of the Java code. 
&lt;BR /&gt;Can anyone confirm that Talend's designers don't intend to support XPath expressions when using SAX mode? 
&lt;BR /&gt;Thanks</description>
      <pubDate>Fri, 10 May 2013 19:53:34 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/Sax-and-Xpath-Expressions/m-p/2202684#M4286</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2013-05-10T19:53:34Z</dc:date>
    </item>
  </channel>
</rss>

