sivaa_m
Contributor

Shredding XML file

Hi,

I went through various threads in the community and I couldn't find any relevant link or solution.

We have XML files with multiple structures. We want to build a generic job that takes the different XML files as input and provides us with the id, parentid, depth, name, value and xpath values from the input XML.

Is it possible to get these values using Talend?

 

6 Replies
Jesperrekuh
Creator III

You can use tExtractXMLFields with an XPath query, but there may be some limitations around looping, for example when one file contains orders and another contains customers. By design I would NEVER implement this in one generic job. Keep your XPath and 'generic job design' limited to specific files (and structures).

However, you could create a job that reads a folder with 10 different XML files and, based on an IF filename.equals check, sets a context variable holding the name of the job you need to trigger to read that file.
Take a look at tRunJob and its dynamic checkbox, where you can select multiple jobs and trigger them dynamically.

sivaa_m
Contributor
Author

Hi,

Thanks for the reply.

We have 100+ XML files with different structures, and we want to avoid creating 100+ jobs to handle each of them. So we are looking to design a generic job that can handle all the files in a loop and give us the shredded XML values.

Also, the XML documents are stored in an Oracle table in a CLOB column.

We are looking into the feasibility of the above solution using Talend.

 

Jesperrekuh
Creator III

Why not 'prepare' your input data when querying the database, so that you have some indicators/parameters which you can use in your generic job (if needed)? Oracle has some nice built-in functionality to handle XML.

So I don't get the whole generic thing... because one way or another you need output...
so xpath //customer/id -> output customerId
if //id is the generic thing you still need to put it in -> output id ... but you're losing the context of which id it is when there are multiple file types that contain an id element.
If you want to redirect specific output generically, say customer data into a customer.xml file and orders into an order file, you need to know the XML structure of the source.

You could iterate over a list of XPath queries and, based on each query, do whatever you need to do.
If you have 100 files with different structures but still want to extract, say, a customer name... it's just XPath.
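
To make the "list of XPath queries" idea concrete, here is a minimal sketch using the standard javax.xml.xpath API. The class name XPathListExtractor, the method extract and the sample XML are illustrative assumptions, not part of any Talend component; in a job, the query list would probably come from a lookup table or context variable.

```java
import java.io.StringReader;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Hypothetical helper: evaluates a list of XPath queries against one XML string.
class XPathListExtractor {

    /** Returns query -> string value of the first match ("" when nothing matches). */
    static Map<String, String> extract(String xml, List<String> queries)
            throws XPathExpressionException {
        XPath xpath = XPathFactory.newInstance().newXPath();
        Map<String, String> results = new LinkedHashMap<>();
        for (String query : queries) {
            // A fresh InputSource per query, because the reader is consumed by each evaluation
            String value = xpath.evaluate(query, new InputSource(new StringReader(xml)));
            results.put(query, value);
        }
        return results;
    }

    public static void main(String[] args) throws Exception {
        // Sample document invented for this example
        String xml = "<order><customer><id>42</id><name>Ann</name></customer></order>";
        Map<String, String> out =
            extract(xml, Arrays.asList("//customer/id", "//customer/name"));
        out.forEach((query, value) -> System.out.println(query + " -> " + value));
    }
}
```

Re-parsing the XML for every query keeps the sketch short; for large documents it would likely be better to parse once into a DOM and evaluate each query against that.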

In terms of a generic job... please elaborate... but even for generic processing you need some constant variables to direct the processes.
sivaa_m
Contributor
Author

In our current application we use Oracle XML functions to shred the XML files dynamically. We are now planning to migrate from the Oracle platform to a Big Data platform, and we plan to use Talend for the data ingestion.

So instead of using the Oracle XML functions again, we are exploring options to implement the same functionality that the Oracle XML functions provide using Talend components.

 

Anonymous
Not applicable

OK, here is some code to do this. It is a combination of XSLT and Java. I use it for something I was working on which sounds very similar to what you need. I hope it helps....

This is a routine used to enable you to run XSLT against XML in memory and output the result to a String. The "inData" is your XML as a String. The "xslFileData" is your XSLT.

    public static String process(String inData, String xslFileData) {
        try {
            // Create the transformer factory
            TransformerFactory factory = TransformerFactory.newInstance();

            // Compile the XSLT (passed in as a String) into a template
            Templates template = factory.newTemplates(new StreamSource(
                new StringReader(xslFileData)));

            // Use the template to create a transformer
            Transformer xformer = template.newTransformer();

            // Prepare the in-memory input and output
            Source source = new StreamSource(new StringReader(inData));
            StringWriter outWriter = new StringWriter();
            Result result = new StreamResult(outWriter);

            // Apply the XSLT to the source XML and capture the result
            xformer.transform(source, result);

            return outWriter.toString();
        } catch (TransformerConfigurationException e) {
            // An error occurred in the XSLT itself
            System.err.println("Invalid XSLT: " + e.getMessageAndLocation());
        } catch (TransformerException e) {
            // An error occurred while applying the XSLT.
            // The locator can be null, so check it before reading the position.
            SourceLocator locator = e.getLocator();
            if (locator != null) {
                System.err.println("Transform failed at line " + locator.getLineNumber()
                    + ", column " + locator.getColumnNumber() + ": " + e.getMessage());
            } else {
                System.err.println("Transform failed: " + e.getMessage());
            }
        }

        // Reached only when the transformation failed
        return "";
    }

You will need the following imports for the above Java....

import java.io.*;
import javax.xml.transform.*;
import javax.xml.transform.stream.*;

The XSLT you can use (you may wish to tweak this) is below (I borrowed this XSLT from here)....

<xsl:stylesheet version="1.0"  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output omit-xml-declaration="yes" indent="yes"/>
    <xsl:strip-space elements="*"/>

    <xsl:variable name="vApos">'</xsl:variable>

    <xsl:template match="*[@* or not(*)] ">
      <xsl:if test="not(*)">
         <xsl:apply-templates select="ancestor-or-self::*" mode="path"/>
         <xsl:value-of select="concat('=',$vApos,.,$vApos)"/>
         <xsl:text>&#xA;</xsl:text>
        </xsl:if>
        <xsl:apply-templates select="@*|*"/>
    </xsl:template>

    <xsl:template match="*" mode="path">
        <xsl:value-of select="concat('/',name())"/>
        <xsl:variable name="vnumPrecSiblings" select=
         "count(preceding-sibling::*[name()=name(current())])"/>
        <xsl:if test="$vnumPrecSiblings">
            <xsl:value-of select="concat('[', $vnumPrecSiblings +1, ']')"/>
        </xsl:if>
    </xsl:template>

    <xsl:template match="@*">
        <xsl:apply-templates select="../ancestor-or-self::*" mode="path"/>
        <xsl:value-of select="concat('[@',name(), '=',$vApos,.,$vApos,']')"/>
        <xsl:text>&#xA;</xsl:text>
    </xsl:template>
</xsl:stylesheet>

If you create a job that reads in your XML and the above XSLT as Strings and passes those Strings to a routine with the method I have given you, it will return the majority of what you require.
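
The XSLT output gives one path and value per line, so id, parentid and depth still have to be derived. As an alternative illustration, here is a minimal sketch that swaps XSLT for a plain DOM walk and emits all six columns directly. It is not the routine above: the class XmlShredder, the Row fields and the numbering scheme (ids assigned in document order, root parentId of 0, every path step indexed) are assumptions made for the example, and attributes are ignored.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper: flattens any XML string into one row per element.
class XmlShredder {

    /** One output row; value is null for elements that contain child elements. */
    static final class Row {
        final int id, parentId, depth;
        final String name, value, xpath;

        Row(int id, int parentId, int depth, String name, String value, String xpath) {
            this.id = id; this.parentId = parentId; this.depth = depth;
            this.name = name; this.value = value; this.xpath = xpath;
        }

        @Override public String toString() {
            return id + "|" + parentId + "|" + depth + "|" + name + "|" + value + "|" + xpath;
        }
    }

    static List<Row> shred(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        List<Row> rows = new ArrayList<>();
        walk(doc.getDocumentElement(), 0, 1, "", rows);
        return rows;
    }

    private static void walk(Element el, int parentId, int depth,
                             String parentPath, List<Row> rows) {
        // 1-based position among preceding siblings with the same name
        int position = 1;
        for (Node s = el.getPreviousSibling(); s != null; s = s.getPreviousSibling()) {
            if (s.getNodeType() == Node.ELEMENT_NODE
                    && s.getNodeName().equals(el.getNodeName())) {
                position++;
            }
        }
        String path = parentPath + "/" + el.getNodeName() + "[" + position + "]";

        // An element is a leaf when it has no child elements
        boolean leaf = true;
        for (Node c = el.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c.getNodeType() == Node.ELEMENT_NODE) { leaf = false; break; }
        }

        int id = rows.size() + 1; // ids follow document order
        String value = leaf ? el.getTextContent().trim() : null;
        rows.add(new Row(id, parentId, depth, el.getNodeName(), value, path));

        // Recurse into child elements, passing this element's id as their parent
        for (Node c = el.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c.getNodeType() == Node.ELEMENT_NODE) {
                walk((Element) c, id, depth + 1, path, rows);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        // Sample document invented for this example
        for (Row row : shred("<a><b>1</b><b>2</b><c><d>x</d></c></a>")) {
            System.out.println(row);
        }
    }
}
```

In a Talend job, a method like shred could sit in a routine and be called from a tJavaFlex or tJavaRow that reads the CLOB as a String, though that wiring is not shown here.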

 

 

 

 

sivaa_m
Contributor
Author

Hi,

Sorry for the delayed response.

Thanks for the code, but I couldn't make it work as I am not a Java guy. I am from an ETL and SQL background.

 

@rhall_2_0, I have prepared the XSLT for my XML structure, but couldn't work out how to use the XSLT in the code given above. Is it possible to provide the full code that needs to be used?

Thanks in advance.