
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Need help converting XML to CSV
Hi all,
First time making a post here and was wondering if anyone would be able to advise on how I can convert an XML file to a csv file. Currently, I have a simple job where I use tfileinputMSXML and connect that to a tfileoutputdelimited component.
I think where I am most stuck is the xml component. I tried making a schema based on the below XML but am not able to retrieve all the fields. Would anyone have suggestions on what I could do? If there is other bits of information you need please let me know!
XML:
<?xml version="1.0" encoding="ISO-8859-9"?>
<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"
xmlns:o="urn:schemas-microsoft-com:office:office"
xmlns:x="urn:schemas-microsoft-com:office:excel"
xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet"
xmlns:html="http://www.w3.org/TR/REC-html40">
<ExcelWorkbook xmlns="urn:schemas-microsoft-com:office:excel">
<WindowHeight>8580</WindowHeight>
<WindowWidth>15180</WindowWidth>
<WindowTopX>120</WindowTopX>
<WindowTopY>45</WindowTopY>
<ProtectStructure>False</ProtectStructure>
<ProtectWindows>False</ProtectWindows>
</ExcelWorkbook>
<Styles>
<Style ss:ID="Default" ss:Name="Normal">
<Alignment ss:Vertical="Bottom"/>
<Borders/>
<Font/>
<Interior/>
<NumberFormat/>
<Protection/>
</Style>
<Style ss:ID="s22">
<Borders>
<Border ss:Position="Bottom" ss:LineStyle="Continuous" ss:Weight="1"/>
<Border ss:Position="Left" ss:LineStyle="Continuous" ss:Weight="1"/>
<Border ss:Position="Right" ss:LineStyle="Continuous" ss:Weight="1"/>
<Border ss:Position="Top" ss:LineStyle="Continuous" ss:Weight="1"/>
</Borders>
<Font ss:FontName="Arial" x:Family="Swiss"/>
</Style>
<Style ss:ID="s62">
<Alignment ss:Horizontal="Center" ss:Vertical="Center"/>
<Borders>
<Border ss:Position="Bottom" ss:LineStyle="Continuous" ss:Weight="1"/>
<Border ss:Position="Left" ss:LineStyle="Continuous" ss:Weight="1"/>
<Border ss:Position="Right" ss:LineStyle="Continuous" ss:Weight="1"/>
<Border ss:Position="Top" ss:LineStyle="Continuous" ss:Weight="1"/>
</Borders>
<Font ss:FontName="Arial" x:Family="Swiss" ss:Bold="1"/>
<Interior ss:Color="#C0C0C0" ss:Pattern="Solid"/>
</Style>
</Styles><Worksheet ss:Name="wms_storage"><Table><Row><Cell ss:StyleID="s62"><Data ss:Type="String"> DC </Data></Cell><Cell ss:StyleID="s62"><Data ss:Type="String"> Article Number </Data></Cell><Cell ss:StyleID="s62"><Data ss:Type="String"> Artclie Description </Data></Cell><Cell ss:StyleID="s62"><Data ss:Type="String"> PO# </Data></Cell><Cell ss:StyleID="s62"><Data ss:Type="String"> Purchasing Group </Data></Cell><Cell ss:StyleID="s62"><Data ss:Type="String"> QTY Available </Data></Cell><Cell ss:StyleID="s62"><Data ss:Type="String"> Case Pack </Data></Cell><Cell ss:StyleID="s62"><Data ss:Type="String"> Multi SKU </Data></Cell><Cell ss:StyleID="s62"><Data ss:Type="String"> Receipt Date </Data></Cell><Cell ss:StyleID="s62"><Data ss:Type="String"> FIFO </Data></Cell></Row>
<Row><Cell ss:StyleID="s22"><Data ss:Type="String"> 101 </Data></Cell><Cell ss:StyleID="s22"><Data ss:Type="String"> 99999 </Data></Cell><Cell ss:StyleID="s22"><Data ss:Type="String"> prodOne </Data></Cell><Cell ss:StyleID="s22"><Data ss:Type="String"> 222222 </Data></Cell><Cell ss:StyleID="s22"><Data ss:Type="String"> M99 </Data></Cell><Cell ss:StyleID="s22"><Data ss:Type="String"> 3333 </Data></Cell><Cell ss:StyleID="s22"><Data ss:Type="String"> 64 </Data></Cell><Cell ss:StyleID="s22"><Data ss:Type="String"> N </Data></Cell><Cell ss:StyleID="s22"><Data ss:Type="String"> 08-SEP-20 </Data></Cell><Cell ss:StyleID="s22"><Data ss:Type="String"> </Data></Cell></Row>
</Table></Worksheet></Workbook>
Accepted Solutions

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
With Xpath, you can retrieve the nth element with brackets []
If I understand correcty, you would like to retrieve the data in columns?
I think you may be able to achieve this by creating multiple Talend columns and using correct Xpath to get correct value.
You may need to change your loop to the Row elemen : "/Workbook/Worksheet/Table/Row"
And then, use Xpath like the ones below for each column:
- DC > "Cell[1]/Data"
- Article Number > "Cell[2]/Data"
- Artclie Description > "Cell[3]/Data"
and so on...
Btw: do you really need a tFileInputMSXML? This component is useful when you need to retrieve complex XML files with multiple loops for example.

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
@Jimmy Nguyen , you need to check the xpath the field which your not getting data.
Thanks,
Manohar

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Thanks @Manohar B for the reply! I tried entering an xpath (screenshot below) and even went as far as to grab all attributes and put them into "fields to extract". I do not see the expected fields. I want to see if I can parse this XML file to see row 2 in excel in my original post.
Did I do something incorrect here?

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
You have to setup the loop xpath on the XML node you want to loop over, not the parent node.

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Thanks Lenneli, appreciate the help on this! I was stuck on this for days!
Went ahead and followed your advice and put the cells as my xpath to loop over. Then pulled the values (screenshot below for reference in case anyone is using this thread as a resource).
Not sure if there is a way to format this so the headers are shown then the values underneath. Was only able to find a way to get all the values in my xml file under one column in the schema screenshot below Either way I think this is a good step in the right direction when extracting data.

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
With Xpath, you can retrieve the nth element with brackets []
If I understand correcty, you would like to retrieve the data in columns?
I think you may be able to achieve this by creating multiple Talend columns and using correct Xpath to get correct value.
You may need to change your loop to the Row elemen : "/Workbook/Worksheet/Table/Row"
And then, use Xpath like the ones below for each column:
- DC > "Cell[1]/Data"
- Article Number > "Cell[2]/Data"
- Artclie Description > "Cell[3]/Data"
and so on...
Btw: do you really need a tFileInputMSXML? This component is useful when you need to retrieve complex XML files with multiple loops for example.

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
That's exactly what I'm looking for! I tried your suggestion and it worked as I wanted to look! And you're right, I guess I didn't need to us "...MSXML" component (used tFileInputXML instead) and it works just as fine. I didn't even know this was possible, you're real life saver 🙂 marking this as the updated best answer for those who need an answer.
