<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Help with tExtractXMLField for XHTML in Talend Studio</title>
    <link>https://community.qlik.com/t5/Talend-Studio/Help-with-tExtractXMLField-for-XHTML/m-p/2292782#M65821</link>
    <description>&lt;P&gt;&lt;A href="https://community.qlik.com/s/profile/0053p000007LMwrAAG"&gt;@Tshak&lt;/A&gt;,did you verified below link?&lt;/P&gt;
&lt;P&gt;&lt;A href="https://help.talend.com/reader/ixBASPZJ7IvqUQVupZwWbg/EFuE5Nul595D24TRwbFnbw" target="_blank" rel="nofollow noopener noreferrer"&gt;https://help.talend.com/reader/ixBASPZJ7IvqUQVupZwWbg/EFuE5Nul595D24TRwbFnbw&lt;/A&gt;&lt;/P&gt;</description>
    <pubDate>Tue, 24 Apr 2018 06:50:15 GMT</pubDate>
    <dc:creator>manodwhb</dc:creator>
    <dc:date>2018-04-24T06:50:15Z</dc:date>
    <item>
      <title>Help with tExtractXMLField for XHTML</title>
      <link>https://community.qlik.com/t5/Talend-Studio/Help-with-tExtractXMLField-for-XHTML/m-p/2292781#M65820</link>
      <description>&lt;P&gt;I am writing a job that extracts content from Word documents and .html files and loads it into Elasticsearch. I am using tTikaExtractor to extract the contents of the files. My job has the following components:&lt;/P&gt; 
&lt;P&gt;&amp;nbsp;&lt;/P&gt; 
&lt;P&gt;tFileList --&amp;gt; tTikaExtractor --&amp;gt; tRowGenerator --&amp;gt; tExtractXMLField --&amp;gt; tFileOutputDelimited&lt;/P&gt; 
&lt;P&gt;&amp;nbsp;&lt;/P&gt; 
&lt;P&gt;The process seems to work up to tRowGenerator. However, tExtractXMLField is not returning any data. I have the following settings in the tExtractXMLField component:&lt;/P&gt; 
&lt;P&gt;Loop XPath query = "/html/head/"&lt;/P&gt; 
&lt;P&gt;Mapping (column = XPath query):&lt;/P&gt; 
&lt;P&gt;"title" = "/title"&lt;/P&gt; 
&lt;P&gt;"body" = "/html/body"&lt;/P&gt; 
&lt;P&gt;I am also not sure how to extract the creator value from &lt;SPAN&gt;&amp;lt;meta name="dc:creator" content="Tshak"/&amp;gt; in the data.&lt;/SPAN&gt;&lt;/P&gt; 
&lt;P&gt;&amp;nbsp;&lt;/P&gt; 
&lt;P&gt;This is the output of tRowGenerator:&lt;/P&gt; 
&lt;P&gt;&amp;lt;?xml version="1.0" encoding="UTF-8"?&amp;gt;&amp;lt;html xmlns="http://www.w3.org/1999/xhtml"&amp;gt;&lt;BR /&gt;&amp;lt;head&amp;gt;&lt;BR /&gt;&amp;lt;meta name="date" content="2018-04-20T14:18:00Z"/&amp;gt;&lt;BR /&gt;&amp;lt;meta name="cp:revision" content="4"/&amp;gt;&lt;BR /&gt;&amp;lt;meta name="Total-Time" content="1"/&amp;gt;&lt;BR /&gt;&amp;lt;meta name="extended-properties:AppVersion" content="16.0000"/&amp;gt;&lt;BR /&gt;&amp;lt;meta name="meta:paragraph-count" content="1"/&amp;gt;&lt;BR /&gt;&amp;lt;meta name="meta:word-count" content="11"/&amp;gt;&lt;BR /&gt;&amp;lt;meta name="dc:creator" content="Tshak"/&amp;gt;&lt;BR /&gt;&amp;lt;meta name="extended-properties:Company" content="Tshak"/&amp;gt;&lt;BR /&gt;&amp;lt;meta name="Word-Count" content="11"/&amp;gt;&lt;BR /&gt;&amp;lt;meta name="publisher" content="Tshak"/&amp;gt;&lt;BR /&gt;&amp;lt;meta name="meta:page-count" content="1"/&amp;gt;&lt;BR /&gt;&amp;lt;meta name="dc:publisher" 
content="Tshak"/&amp;gt;&lt;BR /&gt;&amp;lt;title&amp;gt;Test Extraction&amp;lt;/title&amp;gt;&lt;BR /&gt;&amp;lt;/head&amp;gt;&lt;BR /&gt;&amp;lt;body&amp;gt;&amp;lt;p&amp;gt;&amp;lt;b&amp;gt;&amp;lt;u&amp;gt;Help Desk&amp;lt;/b&amp;gt;&amp;lt;/u&amp;gt;&amp;lt;/p&amp;gt;&lt;BR /&gt;&amp;lt;p&amp;gt;&amp;lt;a name="_GoBack"/&amp;gt;First paragraph content&amp;lt;/p&amp;gt;&lt;BR /&gt;&amp;lt;p/&amp;gt;&lt;BR /&gt;&amp;lt;p&amp;gt;&amp;lt;b&amp;gt;&amp;lt;u&amp;gt;Helpdesk Portal&amp;lt;/b&amp;gt;&amp;lt;/u&amp;gt;&amp;lt;/p&amp;gt;&lt;BR /&gt;&amp;lt;p&amp;gt;Second paragraph content&amp;lt;/p&amp;gt;&lt;BR /&gt;&amp;lt;p/&amp;gt;&lt;BR /&gt;&amp;lt;p/&amp;gt;&lt;BR /&gt;&amp;lt;/body&amp;gt;&amp;lt;/html&amp;gt;&lt;/P&gt; 
&lt;P&gt;&amp;nbsp;&lt;/P&gt; 
&lt;P&gt;Appreciate your help!&lt;/P&gt;</description>
      <pubDate>Sat, 16 Nov 2024 08:21:43 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/Help-with-tExtractXMLField-for-XHTML/m-p/2292781#M65820</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2024-11-16T08:21:43Z</dc:date>
    </item>
    <item>
      <title>Re: Help with tExtractXMLField for XHTML</title>
      <link>https://community.qlik.com/t5/Talend-Studio/Help-with-tExtractXMLField-for-XHTML/m-p/2292782#M65821</link>
      <description>&lt;P&gt;&lt;A href="https://community.qlik.com/s/profile/0053p000007LMwrAAG"&gt;@Tshak&lt;/A&gt;, did you check the link below?&lt;/P&gt;
&lt;P&gt;&lt;A href="https://help.talend.com/reader/ixBASPZJ7IvqUQVupZwWbg/EFuE5Nul595D24TRwbFnbw" target="_blank" rel="nofollow noopener noreferrer"&gt;https://help.talend.com/reader/ixBASPZJ7IvqUQVupZwWbg/EFuE5Nul595D24TRwbFnbw&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 24 Apr 2018 06:50:15 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/Help-with-tExtractXMLField-for-XHTML/m-p/2292782#M65821</guid>
      <dc:creator>manodwhb</dc:creator>
      <dc:date>2018-04-24T06:50:15Z</dc:date>
    </item>
    <item>
      <title>Re: Help with tExtractXMLField for XHTML</title>
      <link>https://community.qlik.com/t5/Talend-Studio/Help-with-tExtractXMLField-for-XHTML/m-p/2292783#M65822</link>
      <description>&lt;P&gt;Thanks for your response, Manohar. Your suggestion is working! I am able to extract the title and body content from the XML (XHTML).&lt;/P&gt;</description>
      <pubDate>Wed, 25 Apr 2018 05:28:40 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/Help-with-tExtractXMLField-for-XHTML/m-p/2292783#M65822</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2018-04-25T05:28:40Z</dc:date>
    </item>
  </channel>
</rss>

