<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Replacing a tag in a big XML file efficiently in Talend Studio</title>
    <link>https://community.qlik.com/t5/Talend-Studio/Replacing-a-tag-in-a-big-XML-file-efficiently/m-p/2217906#M13316</link>
    <description>Replacing a tag in a big XML file efficiently</description>
    <pubDate>Sat, 16 Nov 2024 02:09:11 GMT</pubDate>
    <dc:creator>menorah84</dc:creator>
    <dc:date>2024-11-16T02:09:11Z</dc:date>
    <item>
      <title>Replacing a tag in a big XML file efficiently</title>
      <link>https://community.qlik.com/t5/Talend-Studio/Replacing-a-tag-in-a-big-XML-file-efficiently/m-p/2217906#M13316</link>
      <description>&lt;P&gt;I am using &lt;STRONG&gt;&lt;EM&gt;tAdvancedFileOutputXML&lt;/EM&gt;&lt;/STRONG&gt; component to generate an XML from some DB table data.&lt;/P&gt; 
&lt;P&gt;&amp;nbsp;&lt;/P&gt; 
&lt;P&gt;The XML node &lt;EM&gt;Addresses&lt;/EM&gt; encloses a repeated &lt;EM&gt;Address&lt;/EM&gt; entry.&lt;/P&gt; 
&lt;PRE&gt;&amp;lt;Addresses&amp;gt;
&amp;nbsp; &amp;nbsp; &amp;lt;Address&amp;gt;
      ...
&amp;nbsp; &amp;nbsp; &amp;lt;/Address&amp;gt;
&amp;nbsp; &amp;nbsp; &amp;lt;Address&amp;gt;
      ...
&amp;nbsp; &amp;nbsp; &amp;lt;/Address&amp;gt;
&amp;lt;/Addresses&amp;gt;&lt;/PRE&gt; 
&lt;P&gt;&amp;nbsp;&lt;/P&gt; 
&lt;P&gt;Each row of table data has a maximum of two column groups that pertain to the addresses, like this:&lt;/P&gt; 
&lt;P&gt;&lt;SPAN class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="addresses.png" style="width: 999px;"&gt;&lt;span class="lia-inline-image-display-wrapper" image-alt="0683p000009MZn1.png"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/143280i3C1C798A9C70AB1A/image-size/large?v=v2&amp;amp;px=999" role="button" title="0683p000009MZn1.png" alt="0683p000009MZn1.png" /&gt;&lt;/span&gt;&lt;/SPAN&gt;&lt;/P&gt; 
&lt;P&gt;&amp;nbsp;&lt;/P&gt; 
&lt;P&gt;To avoid more complex "pivoting", I mapped the two column groups this way in the &lt;STRONG&gt;&lt;EM&gt;tAdvancedFileOutputXML&lt;/EM&gt;&lt;/STRONG&gt;&amp;nbsp;component:&lt;/P&gt; 
&lt;PRE&gt;&amp;lt;Addresses&amp;gt;
&amp;nbsp; &amp;nbsp; &amp;lt;Address&amp;gt;
       [ad1_* columns go here]
&amp;nbsp; &amp;nbsp; &amp;lt;/Address&amp;gt;
&amp;nbsp; &amp;nbsp; &amp;lt;Address2&amp;gt;
       [ad2_* columns go here]
&amp;nbsp; &amp;nbsp; &amp;lt;/Address2&amp;gt;
&amp;lt;/Addresses&amp;gt;&lt;/PRE&gt; 
&lt;P&gt;So, the output is an XML file that has an inner node that looks like the above.&lt;/P&gt; 
&lt;P&gt;&amp;nbsp;&lt;/P&gt; 
&lt;P&gt;My next step is to replace the tag &lt;EM&gt;Address2&lt;/EM&gt; in the file with&amp;nbsp;&lt;EM&gt;Address&lt;/EM&gt;, using the &lt;STRONG&gt;&lt;EM&gt;tFileInputRaw&lt;/EM&gt;&lt;/STRONG&gt; and &lt;STRONG&gt;&lt;EM&gt;tMap&lt;/EM&gt;&lt;/STRONG&gt; components.&lt;/P&gt; 
&lt;P&gt;&amp;nbsp;&lt;/P&gt; 
&lt;P&gt;tFileInputRaw&lt;/P&gt; 
&lt;P&gt;&lt;SPAN class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="fileinput.png" style="width: 534px;"&gt;&lt;span class="lia-inline-image-display-wrapper" image-alt="0683p000009MZwv.png"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/154763i7E1EA780F175D172/image-size/large?v=v2&amp;amp;px=999" role="button" title="0683p000009MZwv.png" alt="0683p000009MZwv.png" /&gt;&lt;/span&gt;&lt;/SPAN&gt;&amp;nbsp;&lt;/P&gt; 
&lt;P&gt;&lt;BR /&gt;tMap&lt;/P&gt; 
&lt;P&gt;&lt;SPAN class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="mapping.png" style="width: 999px;"&gt;&lt;span class="lia-inline-image-display-wrapper" image-alt="0683p000009MZx0.png"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/141566i959FC0438FE44ABC/image-size/large?v=v2&amp;amp;px=999" role="button" title="0683p000009MZx0.png" alt="0683p000009MZx0.png" /&gt;&lt;/span&gt;&lt;/SPAN&gt;&lt;/P&gt; 
&lt;P&gt;&amp;nbsp;&lt;/P&gt; 
&lt;P&gt;My job would look like this:&lt;BR /&gt;&lt;SPAN class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="design.png" style="width: 530px;"&gt;&lt;span class="lia-inline-image-display-wrapper" image-alt="0683p000009MZx5.png"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/151268i15CA7E858C55C621/image-size/large?v=v2&amp;amp;px=999" role="button" title="0683p000009MZx5.png" alt="0683p000009MZx5.png" /&gt;&lt;/span&gt;&lt;/SPAN&gt;&lt;/P&gt; 
&lt;P&gt;&amp;nbsp;&lt;/P&gt; 
&lt;P&gt;However, when I run this job, I get an &lt;EM&gt;OutOfMemoryError&lt;/EM&gt; on &lt;STRONG&gt;&lt;EM&gt;tFileInputRaw&lt;/EM&gt;&lt;/STRONG&gt;, because the XML output from &lt;STRONG&gt;&lt;EM&gt;tAdvancedFileOutputXML&lt;/EM&gt;&lt;/STRONG&gt; is pretty big (300 MB to 1.5 GB).&lt;/P&gt; 
&lt;P&gt;&lt;SPAN class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="outofmemory.png" style="width: 666px;"&gt;&lt;span class="lia-inline-image-display-wrapper" image-alt="0683p000009MZxA.png"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/154693iD7060F20B6C54B0B/image-size/large?v=v2&amp;amp;px=999" role="button" title="0683p000009MZxA.png" alt="0683p000009MZxA.png" /&gt;&lt;/span&gt;&lt;/SPAN&gt;&lt;/P&gt; 
&lt;P&gt;&amp;nbsp;&lt;/P&gt; 
&lt;P&gt;My question is: how do I replace those&amp;nbsp;&lt;EM&gt;Address2&lt;/EM&gt; tags without getting this error? Do I need to parallelize the replace operation, and if so, how?&lt;/P&gt; 
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Sat, 16 Nov 2024 02:09:11 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/Replacing-a-tag-in-a-big-XML-file-efficiently/m-p/2217906#M13316</guid>
      <dc:creator>menorah84</dc:creator>
      <dc:date>2024-11-16T02:09:11Z</dc:date>
    </item>
    <item>
      <title>Re: Replacing a tag in a big XML file efficiently</title>
      <link>https://community.qlik.com/t5/Talend-Studio/Replacing-a-tag-in-a-big-XML-file-efficiently/m-p/2217907#M13317</link>
      <description>&lt;P&gt;Hi, there are several possible solutions:&lt;/P&gt; 
&lt;P&gt;&amp;nbsp;&lt;/P&gt; 
&lt;OL&gt; 
 &lt;LI&gt;Increase the Java memory for the Job (Advanced settings on the Run Job tab), or see&amp;nbsp;&lt;A href="https://community.qlik.com/s/article/ka03p0000006EZuAAM" target="_self"&gt;https://community.talend.com/t5/Migration-Configuration-and/OutOfMemory-Exception/ta-p/21669&lt;/A&gt;&amp;nbsp;&lt;/LI&gt; 
 &lt;LI&gt;Because your column groups are simple, change your query from a single SELECT to&amp;nbsp;&lt;BR /&gt;&lt;PRE&gt;SELECT id, ad1_unit as ad_unit, ad1_st_name as ad_st_name, ..
UNION ALL
SELECT id, ad2_unit as ad_unit, ad2_st_name as ad_st_name, ..&lt;/PRE&gt;so that both addresses come through the same loop.&lt;/LI&gt; 
 &lt;LI&gt;If the second option is not a solution and you expect file sizes bigger than the available memory, use command-line tools such as perl or sed (call the command with the tSystem component).&lt;/LI&gt; 
&lt;/OL&gt;</description>
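The third option above comes down to streaming the file rather than loading it whole. As a rough illustration only, and not something taken from the thread, the sketch below does the same rename with a SAX filter from Python's standard library instead of sed or perl. The names RenameElement and rename_element are hypothetical, and running such a script from tSystem is an assumption. Note that XMLGenerator rewrites the XML declaration and does not copy comments or a DOCTYPE, so this only fits files where those do not matter.

```python
import xml.sax
from xml.sax.saxutils import XMLFilterBase, XMLGenerator


class RenameElement(XMLFilterBase):
    """SAX filter (hypothetical helper) that renames one element.

    All other events are passed through unchanged, so only element
    names are touched, never text content that happens to match.
    """

    def __init__(self, parent, old_name, new_name):
        super().__init__(parent)
        self.old_name = old_name
        self.new_name = new_name

    def _map(self, name):
        return self.new_name if name == self.old_name else name

    def startElement(self, name, attrs):
        super().startElement(self._map(name), attrs)

    def endElement(self, name):
        super().endElement(self._map(name))


def rename_element(src_path, dst_path, old_name="Address2", new_name="Address"):
    """Copy src_path to dst_path, renaming old_name elements to new_name.

    The parser reads the input in chunks and the generator writes each
    event as it arrives, so the whole document is never held in memory.
    """
    with open(dst_path, "w", encoding="utf-8") as out:
        reader = RenameElement(xml.sax.make_parser(), old_name, new_name)
        reader.setContentHandler(XMLGenerator(out, encoding="utf-8"))
        reader.parse(src_path)
```

Calling rename_element("in.xml", "out.xml") with placeholder paths writes a copy in which every Address2 element has become Address. Because the input is processed event by event, memory use should stay roughly constant regardless of file size.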
      <pubDate>Mon, 22 Jun 2020 12:30:40 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/Replacing-a-tag-in-a-big-XML-file-efficiently/m-p/2217907#M13317</guid>
      <dc:creator>vapukov</dc:creator>
      <dc:date>2020-06-22T12:30:40Z</dc:date>
    </item>
  </channel>
</rss>

