menorah84
Contributor

Replacing a tag in a big XML file efficiently

I am using the tAdvancedFileOutputXML component to generate an XML file from DB table data.

The XML node Addresses encloses a repeated Address entry:

<Addresses>
    <Address>
      ...
    </Address>
    <Address>
      ...
    </Address>
</Addresses>

Each row of table data has at most two column groups that pertain to addresses, like this:

[screenshot of the table's address columns]

To avoid more complex "pivoting", I simply mapped the two column groups this way in the tAdvancedFileOutputXML component:

<Addresses>
    <Address>
       [ad1_* columns go here]
    </Address>
    <Address2>
       [ad2_* columns go here]
    </Address2>
</Addresses>

The output is therefore an XML file whose inner node looks like the above.

My next step is to rename the Address2 tags in the file to Address, using the tFileInputRaw and tMap components.

tFileInputRaw:

[screenshot of the tFileInputRaw settings]

tMap:

[screenshot of the tMap settings]

My job looks like this:

[screenshot of the job design]

However, when I run this job, I get an OutOfMemoryError in tFileInputRaw, because the XML file written by tAdvancedFileOutputXML is quite big (300 MB to 1.5 GB).

[screenshot of the OutOfMemoryError]

My question is: how do I replace those Address2 tags without getting this error? Do I need to parallelize the replace operation, and if so, how?

vapukov
Master II

Hi, there are several possible solutions:

  1. Increase the Java memory for the Job (Advanced settings on the Run Job tab), or see https://community.talend.com/t5/Migration-Configuration-and/OutOfMemory-Exception/ta-p/21669 
  2. Because your column groups are simple, change your query to
    SELECT id, ad1_unit as ad_unit, ad1_st_name as ad_st_name, ..
    UNION ALL
    SELECT id, ad2_unit as ad_unit, ad2_st_name as ad_st_name, ..
    so that each address arrives as its own row and all addresses can be mapped in the same Address loop.
  3. If option 2 is not a solution and you expect files larger than the available memory, use command-line tools like perl or sed (call the command with the tSystem component).
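The key point of option 3 is streaming: the file is processed piece by piece instead of being loaded whole, as tFileInputRaw does here. The same idea can be sketched in plain Java with the JDK's StAX API (javax.xml.stream), for example run as a separate program. This is a minimal sketch under assumptions, not a Talend component: the class name RenameAddress2 and the input/output path arguments are hypothetical, and it assumes the file is UTF-8. It reads one XML event at a time and writes it straight out, renaming Address2 start and end tags to Address, so memory use does not grow with file size.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import javax.xml.namespace.QName;
import javax.xml.stream.XMLEventFactory;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.EndElement;
import javax.xml.stream.events.StartElement;
import javax.xml.stream.events.XMLEvent;

// Hypothetical helper: streams an XML file and renames <Address2> to <Address>.
public class RenameAddress2 {

    private static final String OLD_NAME = "Address2";
    private static final String NEW_NAME = "Address";

    // Copies XML from in to out one event at a time; only OLD_NAME elements change.
    public static void rename(InputStream in, OutputStream out) throws XMLStreamException {
        XMLEventReader reader = XMLInputFactory.newInstance().createXMLEventReader(in);
        // Assumption: the output is written as UTF-8.
        XMLEventWriter writer = XMLOutputFactory.newInstance().createXMLEventWriter(out, "UTF-8");
        XMLEventFactory events = XMLEventFactory.newInstance();
        try {
            while (reader.hasNext()) {
                XMLEvent event = reader.nextEvent();
                if (event.isStartDocument()) {
                    // Declare the encoding actually used by the writer.
                    writer.add(events.createStartDocument("UTF-8", "1.0"));
                } else if (event.isStartElement()
                        && event.asStartElement().getName().getLocalPart().equals(OLD_NAME)) {
                    StartElement start = event.asStartElement();
                    QName name = start.getName();
                    // Same prefix, namespace, attributes and namespace declarations; new local name.
                    writer.add(events.createStartElement(name.getPrefix(), name.getNamespaceURI(),
                            NEW_NAME, start.getAttributes(), start.getNamespaces()));
                } else if (event.isEndElement()
                        && event.asEndElement().getName().getLocalPart().equals(OLD_NAME)) {
                    EndElement end = event.asEndElement();
                    QName name = end.getName();
                    writer.add(events.createEndElement(name.getPrefix(), name.getNamespaceURI(),
                            NEW_NAME, end.getNamespaces()));
                } else {
                    // Everything else (text, other elements, comments) is copied unchanged.
                    writer.add(event);
                }
            }
            writer.flush();
        } finally {
            // Closing the StAX reader/writer does not close the underlying streams.
            writer.close();
            reader.close();
        }
    }

    // Usage (hypothetical paths): java RenameAddress2 input.xml output.xml
    public static void main(String[] args) throws IOException, XMLStreamException {
        try (InputStream in = new BufferedInputStream(Files.newInputStream(Paths.get(args[0])));
             OutputStream out = new BufferedOutputStream(Files.newOutputStream(Paths.get(args[1])))) {
            rename(in, out);
        }
    }
}
```

Unlike a plain text substitution with sed, this only touches element names, so the string "Address2" inside text content is left alone. The XML is re-serialized, so details such as the XML declaration may differ slightly from the input. No parallelization is needed: a single sequential pass keeps memory flat.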