Skip to main content
Announcements
July 15, NEW Customer Portal: Initial launch will improve how you submit Support Cases. IMPORTANT DETAILS
cancel
Showing results for 
Search instead for 
Did you mean: 
jwing
Contributor
Contributor

Need help converting XML to CSV

Hi all,

First time making a post here and was wondering if anyone would be able to advise on how I can convert an XML file to a csv file. Currently, I have a simple job where I use tfileinputMSXML and connect that to a tfileoutputdelimited component.

I think where I am most stuck is the xml component. I tried making a schema based on the below XML but am not able to retrieve all the fields. Would anyone have suggestions on what I could do? If there is other bits of information you need please let me know!

0693p000009pKcUAAU.png

XML:

<?xml version="1.0" encoding="ISO-8859-9"?>

<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"

xmlns:o="urn:schemas-microsoft-com:office:office"

xmlns:x="urn:schemas-microsoft-com:office:excel"

xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet"

xmlns:html="http://www.w3.org/TR/REC-html40">

<ExcelWorkbook xmlns="urn:schemas-microsoft-com:office:excel">

<WindowHeight>8580</WindowHeight>

<WindowWidth>15180</WindowWidth>

<WindowTopX>120</WindowTopX>

<WindowTopY>45</WindowTopY>

<ProtectStructure>False</ProtectStructure>

<ProtectWindows>False</ProtectWindows>

</ExcelWorkbook>

<Styles>

<Style ss:ID="Default" ss:Name="Normal">

<Alignment ss:Vertical="Bottom"/>

<Borders/>

<Font/>

<Interior/>

<NumberFormat/>

<Protection/>

</Style>

<Style ss:ID="s22">

<Borders>

<Border ss:Position="Bottom" ss:LineStyle="Continuous" ss:Weight="1"/>

<Border ss:Position="Left" ss:LineStyle="Continuous" ss:Weight="1"/>

<Border ss:Position="Right" ss:LineStyle="Continuous" ss:Weight="1"/>

<Border ss:Position="Top" ss:LineStyle="Continuous" ss:Weight="1"/>

</Borders>

<Font ss:FontName="Arial" x:Family="Swiss"/>

</Style>

<Style ss:ID="s62">

<Alignment ss:Horizontal="Center" ss:Vertical="Center"/>

<Borders>

<Border ss:Position="Bottom" ss:LineStyle="Continuous" ss:Weight="1"/>

<Border ss:Position="Left" ss:LineStyle="Continuous" ss:Weight="1"/>

<Border ss:Position="Right" ss:LineStyle="Continuous" ss:Weight="1"/>

<Border ss:Position="Top" ss:LineStyle="Continuous" ss:Weight="1"/>

</Borders>

<Font ss:FontName="Arial" x:Family="Swiss" ss:Bold="1"/>

<Interior ss:Color="#C0C0C0" ss:Pattern="Solid"/>

</Style>

</Styles><Worksheet ss:Name="wms_storage"><Table><Row><Cell ss:StyleID="s62"><Data ss:Type="String"> DC </Data></Cell><Cell ss:StyleID="s62"><Data ss:Type="String"> Article Number </Data></Cell><Cell ss:StyleID="s62"><Data ss:Type="String"> Artclie Description </Data></Cell><Cell ss:StyleID="s62"><Data ss:Type="String"> PO# </Data></Cell><Cell ss:StyleID="s62"><Data ss:Type="String"> Purchasing Group </Data></Cell><Cell ss:StyleID="s62"><Data ss:Type="String"> QTY Available </Data></Cell><Cell ss:StyleID="s62"><Data ss:Type="String"> Case Pack </Data></Cell><Cell ss:StyleID="s62"><Data ss:Type="String"> Multi SKU </Data></Cell><Cell ss:StyleID="s62"><Data ss:Type="String"> Receipt Date </Data></Cell><Cell ss:StyleID="s62"><Data ss:Type="String"> FIFO </Data></Cell></Row>

<Row><Cell ss:StyleID="s22"><Data ss:Type="String"> 101 </Data></Cell><Cell ss:StyleID="s22"><Data ss:Type="String"> 99999 </Data></Cell><Cell ss:StyleID="s22"><Data ss:Type="String"> prodOne </Data></Cell><Cell ss:StyleID="s22"><Data ss:Type="String"> 222222 </Data></Cell><Cell ss:StyleID="s22"><Data ss:Type="String"> M99 </Data></Cell><Cell ss:StyleID="s22"><Data ss:Type="String"> 3333 </Data></Cell><Cell ss:StyleID="s22"><Data ss:Type="String"> 64 </Data></Cell><Cell ss:StyleID="s22"><Data ss:Type="String"> N </Data></Cell><Cell ss:StyleID="s22"><Data ss:Type="String"> 08-SEP-20 </Data></Cell><Cell ss:StyleID="s22"><Data ss:Type="String"> </Data></Cell></Row>

</Table></Worksheet></Workbook>

0693p000009pKaiAAE.png

Labels (2)
1 Solution

Accepted Solutions
lennelei
Creator III
Creator III

With Xpath, you can retrieve the nth element with brackets []

If I understand correcty, you would like to retrieve the data in columns?

 

I think you may be able to achieve this by creating multiple Talend columns and using correct Xpath to get correct value.

 

You may need to change your loop to the Row elemen : "/Workbook/Worksheet/Table/Row"

And then, use Xpath like the ones below for each column:

  • DC > "Cell[1]/Data"
  • Article Number > "Cell[2]/Data"
  • Artclie Description > "Cell[3]/Data"

 

and so on...

 

Btw: do you really need a tFileInputMSXML? This component is useful when you need to retrieve complex XML files with multiple loops for example.

View solution in original post

7 Replies
manodwhb
Creator III
Creator III

@Jimmy Nguyen​ , you need to check the xpath the field which your not getting data.

 

Thanks,

Manohar

jwing
Contributor
Contributor
Author

Thanks @Manohar B​  for the reply! I tried entering an xpath (screenshot below) and even went as far as to grab all attributes and put them into "fields to extract". I do not see the expected fields. I want to see if I can parse this XML file to see row 2 in excel in my original post.

 

Did I do something incorrect here?

lennelei
Creator III
Creator III

You have to setup the loop xpath on the XML node you want to loop over, not the parent node.

jwing
Contributor
Contributor
Author

Thanks Lenneli, appreciate the help on this! I was stuck on this for days!

 

Went ahead and followed your advice and put the cells as my xpath to loop over. Then pulled the values (screenshot below for reference in case anyone is using this thread as a resource).

 

Not sure if there is a way to format this so the headers are shown then the values underneath. Was only able to find a way to get all the values in my xml file under one column in the schema screenshot below Either way I think this is a good step in the right direction when extracting data.

jwing
Contributor
Contributor
Author

0693p000009potBAAQ.png

lennelei
Creator III
Creator III

With Xpath, you can retrieve the nth element with brackets []

If I understand correcty, you would like to retrieve the data in columns?

 

I think you may be able to achieve this by creating multiple Talend columns and using correct Xpath to get correct value.

 

You may need to change your loop to the Row elemen : "/Workbook/Worksheet/Table/Row"

And then, use Xpath like the ones below for each column:

  • DC > "Cell[1]/Data"
  • Article Number > "Cell[2]/Data"
  • Artclie Description > "Cell[3]/Data"

 

and so on...

 

Btw: do you really need a tFileInputMSXML? This component is useful when you need to retrieve complex XML files with multiple loops for example.

jwing
Contributor
Contributor
Author

That's exactly what I'm looking for! I tried your suggestion and it worked as I wanted to look! And you're right, I guess I didn't need to us "...MSXML" component (used tFileInputXML instead) and it works just as fine. I didn't even know this was possible, you're real life saver 🙂 marking this as the updated best answer for those who need an answer.