<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: XML parsing with unknown structure in Talend Studio</title>
    <link>https://community.qlik.com/t5/Talend-Studio/XML-parsing-with-unknown-structure/m-p/2325689#M95263</link>
    <description>What you need is something like a normalizer for XML that returns every tag with its value and the path to the tag. With such a component (it should use a SAX parser, which does not hold the whole DOM in memory), you should be able to retrieve everything - even unknown structures. I will check whether I can create such a component.</description>
    <pubDate>Mon, 15 Jul 2013 12:22:40 GMT</pubDate>
    <dc:creator>Anonymous</dc:creator>
    <dc:date>2013-07-15T12:22:40Z</dc:date>
    <item>
      <title>XML parsing with unknown structure</title>
      <link>https://community.qlik.com/t5/Talend-Studio/XML-parsing-with-unknown-structure/m-p/2325688#M95262</link>
      <description>Please have a look at the structure of the &lt;A href="https://dl.dropboxusercontent.com/u/1757832/exampleXML.xml" target="_blank" rel="nofollow noopener noreferrer"&gt;XML file&lt;/A&gt; below. We are looking for a way to parse this document without having to specify its exact structure. Is there a way to do this with the standard Talend components?&lt;BR /&gt;Basically, what I would like to do is the following:&lt;BR /&gt;# Start parsing the document.&lt;BR /&gt;# Loop through the &amp;lt;product&amp;gt; elements.&lt;BR /&gt;# For each element inside a product, find the next (new) element and save its value, its parent ID, and so on.&lt;BR /&gt;EXAMPLE XML STRUCTURE&lt;BR /&gt;&lt;PRE&gt;&amp;lt;?xml version="1.0" encoding="utf-8"?&amp;gt;&lt;BR /&gt;&amp;lt;ONIXmessage&amp;gt;&lt;BR /&gt;  &amp;lt;product&amp;gt;&lt;BR /&gt;    &amp;lt;a001&amp;gt;A14528039&amp;lt;/a001&amp;gt;&lt;BR /&gt;    &amp;lt;a002&amp;gt;01&amp;lt;/a002&amp;gt;&lt;BR /&gt;    &amp;lt;productidentifier&amp;gt;&lt;BR /&gt;      &amp;lt;b221&amp;gt;02&amp;lt;/b221&amp;gt;&lt;BR /&gt;      &amp;lt;b244&amp;gt;3790827584&amp;lt;/b244&amp;gt;&lt;BR /&gt;    &amp;lt;/productidentifier&amp;gt;&lt;BR /&gt;    &amp;lt;productidentifier&amp;gt;&lt;BR /&gt;      &amp;lt;b221&amp;gt;03&amp;lt;/b221&amp;gt;&lt;BR /&gt;      &amp;lt;b244&amp;gt;9783191072551&amp;lt;/b244&amp;gt;&lt;BR /&gt;    &amp;lt;/productidentifier&amp;gt;&lt;BR /&gt;    &amp;lt;b246&amp;gt;01&amp;lt;/b246&amp;gt;&lt;BR /&gt;    &amp;lt;b012&amp;gt;BB&amp;lt;/b012&amp;gt;&lt;BR /&gt;    &amp;lt;series&amp;gt;&lt;BR /&gt;      &amp;lt;seriesidentifier&amp;gt;&lt;BR /&gt;        &amp;lt;b273&amp;gt;01&amp;lt;/b273&amp;gt;&lt;BR /&gt;        &amp;lt;b233&amp;gt;Set-ID&amp;lt;/b233&amp;gt;&lt;BR /&gt;        &amp;lt;b244&amp;gt;C181&amp;lt;/b244&amp;gt;&lt;BR /&gt;      &amp;lt;/seriesidentifier&amp;gt;&lt;BR /&gt;      &amp;lt;b018&amp;gt;Contributions to Management Science&amp;lt;/b018&amp;gt;&lt;BR /&gt;      &amp;lt;b019&amp;gt;1386&amp;lt;/b019&amp;gt;&lt;BR /&gt;      &amp;lt;b020&amp;gt;1236&amp;lt;/b020&amp;gt;&lt;BR /&gt;    &amp;lt;/series&amp;gt;&lt;BR /&gt;  &amp;lt;/product&amp;gt;&lt;BR /&gt;&amp;lt;/ONIXmessage&amp;gt;&lt;/PRE&gt;&lt;BR /&gt;It would be great if we had a way to parse this and get the following output.&lt;BR /&gt;EXAMPLE OUTPUT&lt;BR /&gt;&lt;PRE&gt;ID   Parent  Key                Value&lt;BR /&gt;1            a001               A14528039&lt;BR /&gt;2            a002               01&lt;BR /&gt;3            productidentifier&lt;BR /&gt;4    3       b221               02&lt;BR /&gt;5    3       b244               3790827584&lt;BR /&gt;6            productidentifier&lt;BR /&gt;7    6       b221               03&lt;BR /&gt;8    6       b244               9783191072551&lt;BR /&gt;9            b246               01&lt;BR /&gt;10           b012               BB&lt;BR /&gt;11           series&lt;BR /&gt;12   11      seriesidentifier&lt;BR /&gt;13   12      b273               01&lt;BR /&gt;14   12      b233               Set-ID&lt;BR /&gt;15   12      b244               C181&lt;BR /&gt;16   11      b018               Contributions to Management Science&lt;BR /&gt;17   11      b019               1386&lt;BR /&gt;18   11      b020               1236&lt;/PRE&gt;&lt;BR /&gt;In other words: we don't know which elements to expect in the XML structure. The component should just create a table listing each (sub)element with its key, value, and parent ID. I think a lot of people want a component like this, and in my opinion it is very strange that an ETL tool like Talend does not have one.</description>
      <pubDate>Mon, 15 Jul 2013 11:54:32 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/XML-parsing-with-unknown-structure/m-p/2325688#M95262</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2013-07-15T11:54:32Z</dc:date>
    </item>
    <item>
      <title>Re: XML parsing with unknown structure</title>
      <link>https://community.qlik.com/t5/Talend-Studio/XML-parsing-with-unknown-structure/m-p/2325689#M95263</link>
      <description>What you need is something like a normalizer for XML that returns every tag with its value and the path to the tag. With such a component (it should use a SAX parser, which does not hold the whole DOM in memory), you should be able to retrieve everything - even unknown structures. I will check whether I can create such a component.</description>
      <pubDate>Mon, 15 Jul 2013 12:22:40 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/XML-parsing-with-unknown-structure/m-p/2325689#M95263</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2013-07-15T12:22:40Z</dc:date>
    </item>
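    <!-- Editor's sketch, not part of the original thread. The reply above suggests a SAX-based normalizer that returns every tag with its value and path. One possible shape for that idea in plain Java is shown below. The names XmlNormalizer, Row, normalize and recordDepth are hypothetical, and nothing here uses a Talend API. It assumes the first two levels (ONIXmessage and product in the question's example) are wrappers that produce no rows, that IDs keep counting across products, and that elements do not mix text with child elements.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: flattens an XML document of unknown structure into rows.
public class XmlNormalizer {

    /** One output row; parentId 0 means "no parent row". */
    public static final class Row {
        public final int id;
        public final int parentId;
        public final String path;
        public final String key;
        public String value = "";

        Row(int id, int parentId, String path, String key) {
            this.id = id;
            this.parentId = parentId;
            this.path = path;
            this.key = key;
        }
    }

    private static final class Handler extends DefaultHandler {
        final List<Row> rows = new ArrayList<>();
        private final Deque<String> names = new ArrayDeque<>(); // all open element names
        private final Deque<Row> open = new ArrayDeque<>();     // open elements that have a row
        private final StringBuilder text = new StringBuilder();
        private final int recordDepth;                          // levels that produce no rows

        Handler(int recordDepth) {
            this.recordDepth = recordDepth;
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            names.addLast(qName);
            text.setLength(0);
            if (names.size() > recordDepth) {
                Row parent = open.peekLast();
                Row row = new Row(rows.size() + 1, parent == null ? 0 : parent.id,
                        "/" + String.join("/", names), qName);
                rows.add(row);
                open.addLast(row);
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length); // may be called several times per text node
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if (names.size() > recordDepth) {
                // Container elements only see whitespace here, so their value is empty.
                open.removeLast().value = text.toString().trim();
            }
            names.removeLast();
            text.setLength(0);
        }
    }

    /** Streams the input through SAX and returns one row per element below recordDepth. */
    public static List<Row> normalize(InputStream in, int recordDepth) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
        Handler handler = new Handler(recordDepth);
        factory.newSAXParser().parse(in, handler);
        return handler.rows;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<ONIXmessage><product><a001>A14528039</a001>"
                + "<productidentifier><b221>02</b221></productidentifier>"
                + "</product></ONIXmessage>";
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        for (Row r : normalize(in, 2)) {
            String parent = r.parentId == 0 ? "" : String.valueOf(r.parentId);
            System.out.println(r.id + "\t" + parent + "\t" + r.key + "\t" + r.value);
        }
    }
}
```

    Calling normalize(stream, 2) on the question's example should give rows in the ID / Parent / Key / Value layout requested there, with the path as an extra column. Only the stack of open elements is kept in memory, which is the point of using SAX instead of a DOM. -->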
    <item>
      <title>Re: XML parsing with unknown structure</title>
      <link>https://community.qlik.com/t5/Talend-Studio/XML-parsing-with-unknown-structure/m-p/2325690#M95264</link>
      <description>We have already written some Java code that does the parsing. The problem is that we want to wrap this Java code in a Talend component. Ideally, the component would take the XML file to be parsed as an input parameter and output rows of key/value pairs. How do we create such a component?</description>
      <pubDate>Tue, 16 Jul 2013 06:18:40 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/XML-parsing-with-unknown-structure/m-p/2325690#M95264</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2013-07-16T06:18:40Z</dc:date>
    </item>
    <item>
      <title>Re: XML parsing with unknown structure</title>
      <link>https://community.qlik.com/t5/Talend-Studio/XML-parsing-with-unknown-structure/m-p/2325691#M95265</link>
      <description>Hi jlolling, can you please provide an update? What steps do we need to take to convert our Java code into a component?</description>
      <pubDate>Mon, 05 Aug 2013 08:36:20 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/XML-parsing-with-unknown-structure/m-p/2325691#M95265</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2013-08-05T08:36:20Z</dc:date>
    </item>
  </channel>
</rss>

