Anonymous
Not applicable

XPath query not working in tFileInputXML and other XML components

Hello Talend Team,
First, thanks for your efforts on Talend Studio development!
The concept and most components are great and well thought out.
As with any software product, there is of course room for improvement, and I'm sure you are aiming for it.
So, if you want, I can share my impressions and the issues I've run into during my work with Talend.
1. XPath
The biggest issue I've run into is the lack of XPath support in the XML-related components (tFileInputXML, tXMLMap, etc.).
I know tFileInputXML has an "XPath query" column where you can enter XPath, but it does not work when the SAX parser is chosen
(which will be the case in real-world usage, where you have big documents and loading/parsing them in memory is just not possible).
I also know that XPath does not fit natively with SAX, but there is an easy and elegant solution to that (please read below).
Here is a basic example illustrating the problem.
Imagine you have a very simple XML file:
<?xml version="1.0" encoding="UTF-8"?>
<products>
<product>
<ID>1</ID>
<name>product 1</name>
<attribute id="color">red</attribute>
<attribute id="size">S</attribute>
</product>
<product>
<ID>2</ID>
<name>product 2</name>
<attribute id="color">green</attribute>
<attribute id="size">M</attribute>
</product>
<product>
<ID>3</ID>
<name>product 3</name>
<attribute id="color">blue</attribute>
<attribute id="size">L</attribute>
</product>
</products>

You'd likely want to extract the data using the following simple job:
[Screenshot of the job: 0683p000009MDqm.png]
And expect something like:
|=-+-----+---=|
|ID|color|size|
|=-+-----+---=|
|1 |red  |S   |
|2 |green|M   |
|3 |blue |L   |
'--+-----+----'
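
For reference, the expected table corresponds to evaluating one loop query and three column queries against the fully loaded document. The sketch below uses the JDK's built-in DOM and XPath APIs rather than Dom4J, and the query strings (`/products/product`, `ID`, `attribute[@id='color']`, `attribute[@id='size']`) are an assumption about what the job in the screenshot uses; the class name `InMemoryXPath` is hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: loads the whole document into memory, then runs XPath.
final class InMemoryXPath {

    // Returns one {ID, color, size} row per <product> element.
    static List<String[]> extract(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        XPath xpath = XPathFactory.newInstance().newXPath();

        // Assumed loop query: one iteration per product.
        NodeList products = (NodeList) xpath.evaluate(
                "/products/product", doc, XPathConstants.NODESET);

        List<String[]> rows = new ArrayList<>();
        for (int i = 0; i < products.getLength(); i++) {
            Node product = products.item(i);
            // Assumed column queries, evaluated relative to the loop node.
            rows.add(new String[] {
                xpath.evaluate("ID", product),
                xpath.evaluate("attribute[@id='color']", product),
                xpath.evaluate("attribute[@id='size']", product)
            });
        }
        return rows;
    }
}
```

This works only as long as the whole document fits in memory, which is exactly the limitation that forces the switch to SAX for large files.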

Well, unfortunately this simple task is not possible!
If the XML file is big and you switch to the SAX parser, you get the following
(with Dom4J you get the expected result):
|=-+-----+---=|
|ID|color|size|
|=-+-----+---=|
|1 |null |null|
|2 |null |null|
|3 |null |null|
'--+-----+----'

The job:
Here I'm attaching the job for convenience:  TestProduct.zip

The solution:
(One alternative would be to fetch all attributes as separate records and then apply filter and aggregate functions, but that would be bad for both convenience and performance, so I'm skipping this option.)
An elegant solution, which I'd implement as Talend's native processing algorithm, would be to:
Parse with SAX (I'm not even sure that supporting the other parsers is worthwhile), and
when you reach the document fragment matched by the "Loop XPath query", run the user's "XPath query" expressions against that fragment.

This way you get the benefits of both SAX and XPath while keeping performance excellent (the difference is negligible).
That's also what I currently do, but with custom code: I configure the component to fetch just the whole loop document, then parse it and perform the needed XPath queries in a subsequent tJavaFlex component.
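
The approach described above can be sketched outside Talend roughly as follows. This is a minimal illustration, not the author's actual tJavaFlex code: it uses the JDK's StAX pull parser instead of SAX (swapped in because it makes the streaming loop shorter), copies each `<product>` subtree into a small DOM with an identity `Transformer`, and runs the column queries against that fragment only. The class name `StreamingXPath` and the hard-coded element name and query strings are assumptions for the example.

```java
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.stax.StAXSource;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

// Hypothetical helper: streams the file and applies XPath per loop element.
final class StreamingXPath {

    // Returns one {ID, color, size} row per <product> element.
    static List<String[]> extract(Reader input) throws Exception {
        XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(input);
        // A Transformer with no stylesheet performs an identity copy.
        Transformer copier = TransformerFactory.newInstance().newTransformer();
        XPath xpath = XPathFactory.newInstance().newXPath();

        List<String[]> rows = new ArrayList<>();
        while (reader.hasNext()) {
            boolean atLoopElement =
                    reader.getEventType() == XMLStreamConstants.START_ELEMENT
                    && "product".equals(reader.getLocalName());
            if (atLoopElement) {
                // Materialize only this subtree; the rest of the file stays unread.
                DOMResult fragment = new DOMResult();
                copier.transform(new StAXSource(reader), fragment);
                Node product = ((Document) fragment.getNode()).getDocumentElement();
                rows.add(new String[] {
                    xpath.evaluate("ID", product),
                    xpath.evaluate("attribute[@id='color']", product),
                    xpath.evaluate("attribute[@id='size']", product)
                });
                // Do not advance here: the copy has already consumed the
                // subtree, so the loop re-checks the current event.
            } else {
                reader.next();
            }
        }
        reader.close();
        return rows;
    }
}
```

Memory use is bounded by the size of the largest loop element rather than by the size of the file, which is the trade-off the proposal relies on. A call such as `StreamingXPath.extract(new java.io.FileReader("products.xml"))` (file name hypothetical) would yield one row per product.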

I'd be glad to hear your thoughts.
Best Regards,
Mirko
13 Replies
Anonymous
Not applicable
Author

Hi rhall,
I have 15 years - now what?
(and I'm not a Java developer)
You obviously still haven't read my initial post ... : )
There's no point in commenting on the rest.

Best Regards,
Mirko
Anonymous
Not applicable
Author

15 years and you can't see the flaw in building XML functionality based around a single loop? 15 years and you think that issues with diverging flows can arbitrarily be solved by saving to a local file system (you have obviously not worked in any secure locations)? 15 years and you think that a loosely coupled, component-based system can be improved by tightly coupling pieces of functionality that already exist, in order to serve a very small number of issues (which can be solved using the existing components)? OK chap, I think we are done here 🙂
Anonymous
Not applicable
Author

We were done a long time ago, rhall ...
I'm sorry you didn't get the sarcasm in the "solutions" (or in the ad hominem argument about years).
Anonymous
Not applicable
Author

PS
I'm sorry if you saw the topic as personal criticism or a challenge rather than an attempt to help.
(That was not my intention; my apologies if it came across that way.)
Best Regards,
Mirko