<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: parsing XML/HTML in Talend Studio</title>
    <link>https://community.qlik.com/t5/Talend-Studio/parsing-XML-HTML/m-p/2210914#M9153</link>
    <description>&lt;P&gt;First, thank you for your answer. No, I cannot export the site as CSV or XLS (you can take a look at the site), but maybe I just do not know how.&lt;BR /&gt;Anyway, I created the job shown below, but I am having trouble writing the code.&lt;/P&gt; 
&lt;P&gt;&lt;SPAN class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="parsehttp.PNG" style="width: 579px;"&gt;&lt;span class="lia-inline-image-display-wrapper" image-alt="0683p000009M7D4.png"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/153911iE35ED88AEE50D72B/image-size/large?v=v2&amp;amp;px=999" role="button" title="0683p000009M7D4.png" alt="0683p000009M7D4.png" /&gt;&lt;/span&gt;&lt;/SPAN&gt;&lt;/P&gt; 
&lt;P&gt;I searched the community questions and found this thread: &lt;A href="https://community.qlik.com/s/feed/0D73p000004k5l5CAA#M95040" target="_blank" rel="noopener"&gt;https://community.talend.com/t5/Design-and-Development/Extract-Multiple-table-using-tHTTPTableInput-...&lt;/A&gt;&lt;/P&gt; 
&lt;P&gt;However, I do not know how to apply that approach to my project, because the site contains several divs, PDFs and links, and the data is not laid out in well-defined tables.&lt;/P&gt; 
&lt;P&gt;thanks&lt;/P&gt; 
&lt;P&gt;regards&lt;/P&gt; 
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
    <pubDate>Tue, 27 Aug 2019 11:18:08 GMT</pubDate>
    <dc:creator>Anonymous</dc:creator>
    <dc:date>2019-08-27T11:18:08Z</dc:date>
    <item>
      <title>parsing XML/HTML</title>
      <link>https://community.qlik.com/t5/Talend-Studio/parsing-XML-HTML/m-p/2210912#M9151</link>
      <description>&lt;P&gt;Hello everyone,&lt;BR /&gt;First of all, thank you for taking the time to help me.&lt;BR /&gt;I want to parse XML/HTML from the site&amp;nbsp;&lt;A href="https://www.cert.ssi.gouv.fr/" target="_blank" rel="noopener nofollow noopener noreferrer"&gt;https://www.cert.ssi.gouv.fr/&lt;/A&gt;&lt;BR /&gt;I expect to end up with a table like this:&lt;/P&gt; 
&lt;P&gt;&lt;SPAN class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="oldvul.PNG" style="width: 957px;"&gt;&lt;span class="lia-inline-image-display-wrapper" image-alt="0683p000009M791.png"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/142489i4E579AA69B83FC53/image-size/large?v=v2&amp;amp;px=999" role="button" title="0683p000009M791.png" alt="0683p000009M791.png" /&gt;&lt;/span&gt;&lt;/SPAN&gt;&lt;/P&gt; 
&lt;P&gt;That is, I want to parse the HTML and extract all the CERTEFs, each with its title and publication date, together with all the VECs listed in each CERTEF.&lt;BR /&gt;I do not know which component to use, or how to configure it, to produce exactly this table.&lt;/P&gt; 
&lt;P&gt;Thank you for your help.&lt;/P&gt;</description>
      <pubDate>Fri, 23 Aug 2019 13:38:46 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/parsing-XML-HTML/m-p/2210912#M9151</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2019-08-23T13:38:46Z</dc:date>
    </item>
    <item>
      <title>Re: parsing XML/HTML</title>
      <link>https://community.qlik.com/t5/Talend-Studio/parsing-XML-HTML/m-p/2210913#M9152</link>
      <description>Hi, 
&lt;BR /&gt;There is no dedicated component for that, but you can open HTML pages as XML and parse them using the XML components. 
&lt;BR /&gt;Be advised that many sites today fill in their content using JavaScript, so in those cases you cannot access the data directly. 
&lt;BR /&gt;Is there a way to export the data as XLS or CSV? If so, that is the best option. 
&lt;BR /&gt;Another possibility is to use RPA (Robotic Process Automation) to extract the data from the web. 
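<![CDATA[<BR />A minimal sketch of the "open the HTML page as XML" idea above, assuming the page is well-formed XML (many real pages are not and would need cleaning first). The sample markup, the class name HtmlAsXmlSketch and the method extractRows are hypothetical, chosen only for illustration; they are not taken from the site or from Talend.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class HtmlAsXmlSketch {

    // Hypothetical, well-formed sample markup; the real site's structure may differ.
    public static final String SAMPLE =
        "<html><body>"
      + "<div class=\"item\"><a href=\"/notice/EXAMPLE-001/\">First example notice</a>"
      + "<span class=\"date\">23 August 2019</span></div>"
      + "<div class=\"item\"><a href=\"/notice/EXAMPLE-002/\">Second example notice</a>"
      + "<span class=\"date\">27 August 2019</span></div>"
      + "</body></html>";

    // Parses the markup as XML and returns one {link, title, date} row per item div.
    public static List<String[]> extractRows(String xhtml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xhtml)));
            XPath xpath = XPathFactory.newInstance().newXPath();

            // Loop expression: one node per entry to extract.
            NodeList items = (NodeList) xpath.evaluate(
                "//div[@class='item']", doc, XPathConstants.NODESET);

            List<String[]> rows = new ArrayList<>();
            for (int i = 0; i < items.getLength(); i++) {
                Element item = (Element) items.item(i);
                // Field expressions are evaluated relative to the current item.
                String link = xpath.evaluate("a/@href", item);
                String title = xpath.evaluate("a", item);
                String date = xpath.evaluate("span[@class='date']", item);
                rows.add(new String[] { link, title, date });
            }
            return rows;
        } catch (Exception e) {
            // Markup that is not well-formed XML ends up here.
            throw new IllegalArgumentException("Input could not be parsed as XML", e);
        }
    }

    public static void main(String[] args) {
        for (String[] row : extractRows(SAMPLE)) {
            System.out.println(String.join(" | ", row));
        }
    }
}
```

Each row holds the link, title and date of one entry. The split into one loop expression plus relative field expressions is a design choice of this sketch, made so that each entry's fields stay together in one row.]]>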
&lt;BR /&gt;good luck 
&lt;BR /&gt;</description>
      <pubDate>Tue, 27 Aug 2019 09:29:08 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/parsing-XML-HTML/m-p/2210913#M9152</guid>
      <dc:creator>fdenis</dc:creator>
      <dc:date>2019-08-27T09:29:08Z</dc:date>
    </item>
    <item>
      <title>Re: parsing XML/HTML</title>
      <link>https://community.qlik.com/t5/Talend-Studio/parsing-XML-HTML/m-p/2210914#M9153</link>
      <description>&lt;P&gt;First, thank you for your answer. No, I cannot export the site as CSV or XLS (you can take a look at the site), but maybe I just do not know how.&lt;BR /&gt;Anyway, I created the job shown below, but I am having trouble writing the code.&lt;/P&gt; 
&lt;P&gt;&lt;SPAN class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="parsehttp.PNG" style="width: 579px;"&gt;&lt;span class="lia-inline-image-display-wrapper" image-alt="0683p000009M7D4.png"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/153911iE35ED88AEE50D72B/image-size/large?v=v2&amp;amp;px=999" role="button" title="0683p000009M7D4.png" alt="0683p000009M7D4.png" /&gt;&lt;/span&gt;&lt;/SPAN&gt;&lt;/P&gt; 
&lt;P&gt;I searched the community questions and found this thread: &lt;A href="https://community.qlik.com/s/feed/0D73p000004k5l5CAA#M95040" target="_blank" rel="noopener"&gt;https://community.talend.com/t5/Design-and-Development/Extract-Multiple-table-using-tHTTPTableInput-...&lt;/A&gt;&lt;/P&gt; 
&lt;P&gt;However, I do not know how to apply that approach to my project, because the site contains several divs, PDFs and links, and the data is not laid out in well-defined tables.&lt;/P&gt; 
&lt;P&gt;thanks&lt;/P&gt; 
&lt;P&gt;regards&lt;/P&gt; 
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 27 Aug 2019 11:18:08 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/parsing-XML-HTML/m-p/2210914#M9153</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2019-08-27T11:18:08Z</dc:date>
    </item>
    <item>
      <title>Re: parsing XML/HTML</title>
      <link>https://community.qlik.com/t5/Talend-Studio/parsing-XML-HTML/m-p/2210915#M9154</link>
      <description>tHttpRequest allows you to get an HTTP response, such as REST, HTML or SOAP.&lt;BR /&gt;tJavaFlex is a free-form Java code component. I think the data is extracted in that component.&lt;BR /&gt;Regards,&lt;BR /&gt;Good luck</description>
      <pubDate>Tue, 27 Aug 2019 12:39:22 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/parsing-XML-HTML/m-p/2210915#M9154</guid>
      <dc:creator>fdenis</dc:creator>
      <dc:date>2019-08-27T12:39:22Z</dc:date>
    </item>
  </channel>
</rss>

