<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: How to Extract CSV file from public URL/Website using Talend DI ??? in Talend Studio</title>
    <link>https://community.qlik.com/t5/Talend-Studio/How-to-Extract-CSV-file-from-public-URL-Website-using-Talend-DI/m-p/2292987#M66009</link>
    <description>&lt;P&gt;Hi @Manohar B,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;When I checked the CSV file you attached, I did not see the required data in it. It only contains the page source.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper" image-alt="0693p000009TMX8AAO.png"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/130551i7E0EA3DBDF78126F/image-size/large?v=v2&amp;amp;px=999" role="button" title="0693p000009TMX8AAO.png" alt="0693p000009TMX8AAO.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Regards&lt;/P&gt;&lt;P&gt;Sasidharan&lt;/P&gt;</description>
    <pubDate>Wed, 16 Sep 2020 15:06:54 GMT</pubDate>
    <dc:creator>Sasidharan_Udayakumar</dc:creator>
    <dc:date>2020-09-16T15:06:54Z</dc:date>
    <item>
      <title>How to Extract CSV file from public URL/Website using Talend DI ???</title>
      <link>https://community.qlik.com/t5/Talend-Studio/How-to-Extract-CSV-file-from-public-URL-Website-using-Talend-DI/m-p/2292981#M66003</link>
      <description>&lt;P&gt;Hi All,&lt;/P&gt;&lt;P&gt;We have a requirement that is a first in my Talend career: extracting CSV files from a government public portal using Talend DI. The website is &lt;B&gt;&lt;I&gt;&lt;U&gt;https://vaers.hhs.gov/data/datasets.html?&lt;/U&gt;&lt;/I&gt;&lt;/B&gt; (&lt;B&gt;&lt;I&gt;it is open to the public and doesn't require authentication&lt;/I&gt;&lt;/B&gt;; we can access it directly and download the year-wise CSV files).&lt;/P&gt;&lt;P&gt;The requirement is to download the following three CSV files from the website:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;B&gt;CSV File (VAERS Data)&lt;/B&gt;&lt;/LI&gt;&lt;LI&gt;&lt;B&gt;CSV File (VAERS Symptoms)&lt;/B&gt;&lt;/LI&gt;&lt;LI&gt;&lt;B&gt;CSV File (VAERS Vaccine)&lt;/B&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper" image-alt="0693p000009TKf0AAG.png"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/153253i47E1EC00D7FC76A8/image-size/large?v=v2&amp;amp;px=999" role="button" title="0693p000009TKf0AAG.png" alt="0693p000009TKf0AAG.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper" image-alt="0693p000009TKZCAA4.png"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/156841iF22DBA7579FB180B/image-size/large?v=v2&amp;amp;px=999" role="button" title="0693p000009TKZCAA4.png" alt="0693p000009TKZCAA4.png" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;P&gt;When we download a data file manually, the site asks for a captcha, and the file is downloaded to the local machine only after the captcha is entered. Given that, how do we handle the captcha when we automate fetching the yearly files with Talend? If the solution involves REST components, can we POST the "Year" parameter as the HTTP body to the website and download the year-wise CSV files?&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper" image-alt="0693p000009TKZWAA4.png"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/137541i2EC11385769737F3/image-size/large?v=v2&amp;amp;px=999" role="button" title="0693p000009TKZWAA4.png" alt="0693p000009TKZWAA4.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper" image-alt="0693p000009TKZbAAO.png"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/139420i3EAE01F8D0999C32/image-size/large?v=v2&amp;amp;px=999" role="button" title="0693p000009TKZbAAO.png" alt="0693p000009TKZbAAO.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;I tried tFileFetch and thttpparse, but neither helped much with reading the CSV file and parsing the data in Talend DI. tFileFetch only writes the HTML page source of the website into the generated CSV or Excel output.&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper" image-alt="0693p000009TKbSAAW.png"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/138281i7EA776AF163BDF56/image-size/large?v=v2&amp;amp;px=999" role="button" title="0693p000009TKbSAAW.png" alt="0693p000009TKbSAAW.png" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;P&gt;I appended &lt;B&gt;/eSubDownload/index.jsp?fn=2020VAERSData.csv&lt;/B&gt; after the URL &lt;B&gt;https://vaers.hhs.gov/data/datasets.html?&lt;/B&gt; in the &lt;B&gt;URI&lt;/B&gt; field because, when I explored the page source of the website (shown below), I found this link for downloading the 2020VAERSData CSV.&lt;/P&gt;&lt;P&gt;I may be wrong, but I am trying to explore the possibilities.&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper" image-alt="0693p000009TKe2AAG.png"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/157960i0A8D50EBA574CFE9/image-size/large?v=v2&amp;amp;px=999" role="button" title="0693p000009TKe2AAG.png" alt="0693p000009TKe2AAG.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;&lt;U&gt;CSV output when using tFileFetch:&lt;/U&gt;&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper" image-alt="0693p000009TKjHAAW.png"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/144013i0122A23D351FAFF4/image-size/large?v=v2&amp;amp;px=999" role="button" title="0693p000009TKjHAAW.png" alt="0693p000009TKjHAAW.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;Can someone explain how to do this with Talend DI and move the data to Azure Data Lake?&lt;/P&gt;&lt;P&gt;Please find attached a screenshot of the CSV I am referring to on the government public portal.&lt;/P&gt;&lt;P&gt;Please assist.&lt;/P&gt;&lt;P&gt;Regards&lt;/P&gt;&lt;P&gt;Sasidharan&lt;/P&gt;</description>
      <pubDate>Wed, 16 Sep 2020 13:00:23 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/How-to-Extract-CSV-file-from-public-URL-Website-using-Talend-DI/m-p/2292981#M66003</guid>
      <dc:creator>Sasidharan_Udayakumar</dc:creator>
      <dc:date>2020-09-16T13:00:23Z</dc:date>
    </item>
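    <!--
A note on the URI in the question above: appending `/eSubDownload/index.jsp?fn=2020VAERSData.csv` after `https://vaers.hhs.gov/data/datasets.html?` does not produce the download link. Everything after the first `?` is the query string, so the request still targets `datasets.html`, which is consistent with tFileFetch returning the page source. A link that starts with `/` in the page source is root-relative and has to be resolved against the host instead. The minimal sketch below uses only `java.net.URI` (it could run in a tJava component, for example); the class name `VaersUri` is hypothetical. Resolving the link correctly does not by itself get past the captcha discussed later in this thread.

```java
import java.net.URI;

// Hypothetical helper: shows how a root-relative link found in a page's source
// should be combined with the page URL, versus plain string concatenation.
final class VaersUri {
    static final String PAGE = "https://vaers.hhs.gov/data/datasets.html";
    static final String LINK = "/eSubDownload/index.jsp?fn=2020VAERSData.csv";

    // Concatenation after "?": the link ends up inside the query string of datasets.html.
    static URI concatenated() {
        return URI.create(PAGE + "?" + LINK);
    }

    // Reference resolution: the leading "/" replaces the whole path of the page URL.
    static URI resolved() {
        return URI.create(PAGE).resolve(LINK);
    }

    public static void main(String[] args) {
        URI wrong = concatenated();
        System.out.println(wrong.getPath());  // /data/datasets.html
        System.out.println(wrong.getQuery()); // /eSubDownload/index.jsp?fn=2020VAERSData.csv
        System.out.println(resolved());       // https://vaers.hhs.gov/eSubDownload/index.jsp?fn=2020VAERSData.csv
    }
}
```

`URI.resolve` applies RFC 2396 reference resolution, so the same call works for any relative link copied from a page's source.
    -->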
    <item>
      <title>Re: How to Extract CSV file from public URL/Website using Talend DI ???</title>
      <link>https://community.qlik.com/t5/Talend-Studio/How-to-Extract-CSV-file-from-public-URL-Website-using-Talend-DI/m-p/2292982#M66004</link>
      <description>&lt;P&gt;@Sasidharan Udayakumar, you need to download the file from the URL to a local folder as shown in the screenshots below.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks,&lt;/P&gt;&lt;P&gt;Manohar&lt;/P&gt;</description>
      <pubDate>Wed, 16 Sep 2020 14:29:28 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/How-to-Extract-CSV-file-from-public-URL-Website-using-Talend-DI/m-p/2292982#M66004</guid>
      <dc:creator>manodwhb</dc:creator>
      <dc:date>2020-09-16T14:29:28Z</dc:date>
    </item>
    <item>
      <title>Re: How to Extract CSV file from public URL/Website using Talend DI ???</title>
      <link>https://community.qlik.com/t5/Talend-Studio/How-to-Extract-CSV-file-from-public-URL-Website-using-Talend-DI/m-p/2292983#M66005</link>
      <description>&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper" image-alt="0693p000009TLqeAAG.png"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/132651i0E12156D33080753/image-size/large?v=v2&amp;amp;px=999" role="button" title="0693p000009TLqeAAG.png" alt="0693p000009TLqeAAG.png" /&gt;&lt;/span&gt;@Sasidharan Udayakumar, &lt;span class="lia-inline-image-display-wrapper" image-alt="0693p000009TM09AAG.png"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/129692i5DB26A1153F56BBE/image-size/large?v=v2&amp;amp;px=999" role="button" title="0693p000009TM09AAG.png" alt="0693p000009TM09AAG.png" /&gt;&lt;/span&gt;
&lt;span class="lia-inline-image-display-wrapper" image-alt="0693p000009TM1lAAG.png"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/128976i8950D8087A92D325/image-size/large?v=v2&amp;amp;px=999" role="button" title="0693p000009TM1lAAG.png" alt="0693p000009TM1lAAG.png" /&gt;&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 16 Sep 2020 14:35:32 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/How-to-Extract-CSV-file-from-public-URL-Website-using-Talend-DI/m-p/2292983#M66005</guid>
      <dc:creator>manodwhb</dc:creator>
      <dc:date>2020-09-16T14:35:32Z</dc:date>
    </item>
    <item>
      <title>Re: How to Extract CSV file from public URL/Website using Talend DI ???</title>
      <link>https://community.qlik.com/t5/Talend-Studio/How-to-Extract-CSV-file-from-public-URL-Website-using-Talend-DI/m-p/2292984#M66006</link>
      <description>&lt;P&gt;Hi @Manohar B,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks for your help and support with this issue.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Could you also share a screenshot of the output data file that tFileFetch generated in the destination directory? And why did you select the POST method in the tFileFetch properties? Since we are trying to GET the file, why POST here? Apologies if this is a layman's question.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks&lt;/P&gt;&lt;P&gt;Sasidharan&lt;/P&gt;</description>
      <pubDate>Wed, 16 Sep 2020 14:45:47 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/How-to-Extract-CSV-file-from-public-URL-Website-using-Talend-DI/m-p/2292984#M66006</guid>
      <dc:creator>Sasidharan_Udayakumar</dc:creator>
      <dc:date>2020-09-16T14:45:47Z</dc:date>
    </item>
    <item>
      <title>Re: How to Extract CSV file from public URL/Website using Talend DI ???</title>
      <link>https://community.qlik.com/t5/Talend-Studio/How-to-Extract-CSV-file-from-public-URL-Website-using-Talend-DI/m-p/2292985#M66007</link>
      <description>&lt;P&gt;@Sasidharan Udayakumar, please find the downloaded file attached.&lt;/P&gt;</description>
      <pubDate>Wed, 16 Sep 2020 14:50:50 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/How-to-Extract-CSV-file-from-public-URL-Website-using-Talend-DI/m-p/2292985#M66007</guid>
      <dc:creator>manodwhb</dc:creator>
      <dc:date>2020-09-16T14:50:50Z</dc:date>
    </item>
    <item>
      <title>Re: How to Extract CSV file from public URL/Website using Talend DI ???</title>
      <link>https://community.qlik.com/t5/Talend-Studio/How-to-Extract-CSV-file-from-public-URL-Website-using-Talend-DI/m-p/2292986#M66008</link>
      <description>&lt;P&gt;@Sasidharan Udayakumar, regarding POST: checking it is not necessary. You can uncheck it, and the job is still able to download the file.&lt;/P&gt;</description>
      <pubDate>Wed, 16 Sep 2020 14:54:05 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/How-to-Extract-CSV-file-from-public-URL-Website-using-Talend-DI/m-p/2292986#M66008</guid>
      <dc:creator>manodwhb</dc:creator>
      <dc:date>2020-09-16T14:54:05Z</dc:date>
    </item>
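    <!--
On the GET versus POST question above: fetching a file is a plain HTTP GET, which matches the advice that the POST option can be unchecked. The sketch below uses `java.net.http.HttpClient` (Java 11+) to send an explicit GET, follow redirects, and refuse to save the body when the server answers with an HTML content type, so a web page (such as a captcha form) is not silently written to a `.csv` file. The names `CsvDownload`, `download` and `isHtml` are hypothetical, and the sketch is not meant to replicate what tFileFetch does internally; it only illustrates the check.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Locale;

// Hypothetical helper: download with an explicit GET and reject HTML responses.
final class CsvDownload {

    // True when a Content-Type header value denotes an HTML page, e.g. "text/html; charset=UTF-8".
    static boolean isHtml(String contentType) {
        if (contentType == null) {
            return false;
        }
        String mime = contentType.split(";", 2)[0].trim().toLowerCase(Locale.ROOT);
        return mime.equals("text/html") || mime.equals("application/xhtml+xml");
    }

    static Path download(URI uri, Path target) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NORMAL)
                .build();
        HttpRequest request = HttpRequest.newBuilder(uri).GET().build();
        HttpResponse<byte[]> response = client.send(request, HttpResponse.BodyHandlers.ofByteArray());

        String contentType = response.headers().firstValue("Content-Type").orElse(null);
        if (response.statusCode() != 200 || isHtml(contentType)) {
            // An HTML answer here is a web page, not the data file, so stop instead of saving it.
            throw new IOException("Expected a data file but got HTTP " + response.statusCode()
                    + " with Content-Type " + contentType);
        }
        return Files.write(target, response.body());
    }
}
```

Failing with an exception rather than writing the body means a job built this way stops at the download step instead of loading HTML into Azure Data Lake as if it were data.
    -->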
    <item>
      <title>Re: How to Extract CSV file from public URL/Website using Talend DI ???</title>
      <link>https://community.qlik.com/t5/Talend-Studio/How-to-Extract-CSV-file-from-public-URL-Website-using-Talend-DI/m-p/2292987#M66009</link>
      <description>&lt;P&gt;Hi @Manohar B,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;When I checked the CSV file you attached, I did not see the required data in it. It only contains the page source.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper" image-alt="0693p000009TMX8AAO.png"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/130551i7E0EA3DBDF78126F/image-size/large?v=v2&amp;amp;px=999" role="button" title="0693p000009TMX8AAO.png" alt="0693p000009TMX8AAO.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Regards&lt;/P&gt;&lt;P&gt;Sasidharan&lt;/P&gt;</description>
      <pubDate>Wed, 16 Sep 2020 15:06:54 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/How-to-Extract-CSV-file-from-public-URL-Website-using-Talend-DI/m-p/2292987#M66009</guid>
      <dc:creator>Sasidharan_Udayakumar</dc:creator>
      <dc:date>2020-09-16T15:06:54Z</dc:date>
    </item>
    <item>
      <title>Re: How to Extract CSV file from public URL/Website using Talend DI ???</title>
      <link>https://community.qlik.com/t5/Talend-Studio/How-to-Extract-CSV-file-from-public-URL-Website-using-Talend-DI/m-p/2292988#M66010</link>
      <description>&lt;P&gt;Hi @Sasidharan Udayakumar, &lt;A href="https://community.talend.com/s/profile/0053p000007LKmJAAW" alt="https://community.talend.com/s/profile/0053p000007LKmJAAW" target="_blank"&gt;manodwhb&lt;/A&gt;,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;If you look at the HTML in the generated CSV files, it is the page where you enter the captcha for verification. The captcha is an image, as shown below:&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper" image-alt="0693p000009oSktAAE.png"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/154992i31EFCF493C0EB433/image-size/large?v=v2&amp;amp;px=999" role="button" title="0693p000009oSktAAE.png" alt="0693p000009oSktAAE.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;If we could somehow capture this captcha in a readable format, we could use it for further processing.&lt;/P&gt;</description>
      <pubDate>Thu, 17 Sep 2020 07:51:18 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/How-to-Extract-CSV-file-from-public-URL-Website-using-Talend-DI/m-p/2292988#M66010</guid>
      <dc:creator>vikramk</dc:creator>
      <dc:date>2020-09-17T07:51:18Z</dc:date>
    </item>
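    <!--
As the last two replies observe, the file written by the job contains the HTML of the captcha page rather than CSV rows. The sketch below does not try to read or solve the captcha; it only detects that situation so a job can stop before the file is moved on to Azure Data Lake. It is a minimal sketch, assuming the path of the downloaded file is known; the class name `HtmlSniffer` and the 512-byte window are hypothetical choices. It reads the start of the file, drops a UTF-8 byte-order mark and leading whitespace, and flags content that begins like an HTML document.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Locale;

// Hypothetical check: does a file that should be CSV actually contain an HTML page?
final class HtmlSniffer {
    private static final int WINDOW = 512; // number of bytes inspected at the start of the file

    static boolean looksLikeHtml(byte[] head) {
        String text = new String(head, StandardCharsets.UTF_8);
        if (text.startsWith("\uFEFF")) {
            text = text.substring(1); // drop a UTF-8 byte-order mark
        }
        text = text.stripLeading().toLowerCase(Locale.ROOT);
        // CSV data does not start with "<"; an HTML page starts with a doctype or an <html> tag.
        return text.startsWith("<")
                && (text.startsWith("<!doctype html") || text.contains("<html"));
    }

    static boolean looksLikeHtml(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            return looksLikeHtml(in.readNBytes(WINDOW));
        }
    }
}
```

Called on the downloaded file's path (for example from a tJava component placed after the download step), a `true` result can be turned into an exception so the job fails instead of loading page source as data.
    -->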
  </channel>
</rss>

