<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Extracting the latest files from Amazon S3 folder in Talend Studio</title>
    <link>https://community.qlik.com/t5/Talend-Studio/Extracting-the-latest-files-from-Amazon-S3-folder/m-p/2368876#M132092</link>
    <description>&lt;P&gt;Hello,&lt;/P&gt;&lt;P&gt;We have the following scenario in our project. We have an S3 bucket with a folder where we receive third-party files every hour. The number of files varies from 2 to 5 per hour, depending on the volume of data.&lt;/P&gt;&lt;P&gt;The requirement is to extract the latest .csv files every hour and load them through Talend into a Redshift database. Can someone suggest how we can extract ONLY the latest files from the S3 bucket, out of all the files kept there? I would appreciate any input.&lt;/P&gt;</description>
    <pubDate>Fri, 15 Nov 2024 23:41:29 GMT</pubDate>
    <dc:creator>sushantk19</dc:creator>
    <dc:date>2024-11-15T23:41:29Z</dc:date>
    <item>
      <title>Extracting the latest files from Amazon S3 folder</title>
      <link>https://community.qlik.com/t5/Talend-Studio/Extracting-the-latest-files-from-Amazon-S3-folder/m-p/2368876#M132092</link>
      <description>&lt;P&gt;Hello,&lt;/P&gt;&lt;P&gt;We have the following scenario in our project. We have an S3 bucket with a folder where we receive third-party files every hour. The number of files varies from 2 to 5 per hour, depending on the volume of data.&lt;/P&gt;&lt;P&gt;The requirement is to extract the latest .csv files every hour and load them through Talend into a Redshift database. Can someone suggest how we can extract ONLY the latest files from the S3 bucket, out of all the files kept there? I would appreciate any input.&lt;/P&gt;</description>
      <pubDate>Fri, 15 Nov 2024 23:41:29 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/Extracting-the-latest-files-from-Amazon-S3-folder/m-p/2368876#M132092</guid>
      <dc:creator>sushantk19</dc:creator>
      <dc:date>2024-11-15T23:41:29Z</dc:date>
    </item>
    <item>
      <title>Re: Extracting the latest files from Amazon S3 folder</title>
      <link>https://community.qlik.com/t5/Talend-Studio/Extracting-the-latest-files-from-Amazon-S3-folder/m-p/2368877#M132093</link>
      <description>&lt;P&gt;You can use tS3List to list all the files in a bucket, but I'm not sure how you'd decide which ones are the 'latest'. Is there any sort of date or time in the bucket name or file name?&lt;/P&gt;</description>
      <pubDate>Mon, 27 Sep 2021 15:17:54 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/Extracting-the-latest-files-from-Amazon-S3-folder/m-p/2368877#M132093</guid>
      <dc:creator>MattE</dc:creator>
      <dc:date>2021-09-27T15:17:54Z</dc:date>
    </item>
    <item>
      <title>Re: Extracting the latest files from Amazon S3 folder</title>
      <link>https://community.qlik.com/t5/Talend-Studio/Extracting-the-latest-files-from-Amazon-S3-folder/m-p/2368878#M132094</link>
      <description>&lt;P&gt;If there is no information to determine the latest files:&lt;/P&gt;&lt;P&gt;Keep a list of the files already available/processed, and use tS3List to determine which ones have arrived since the last run.&lt;/P&gt;&lt;P&gt;Caveat: this is very inefficient.&lt;/P&gt;</description>
      <pubDate>Mon, 27 Sep 2021 16:00:16 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/Extracting-the-latest-files-from-Amazon-S3-folder/m-p/2368878#M132094</guid>
      <dc:creator>XJ_1630</dc:creator>
      <dc:date>2021-09-27T16:00:16Z</dc:date>
    </item>
    <item>
      <title>Re: Extracting the latest files from Amazon S3 folder</title>
      <link>https://community.qlik.com/t5/Talend-Studio/Extracting-the-latest-files-from-Amazon-S3-folder/m-p/2368879#M132095</link>
      <description>&lt;P&gt;@Matt Evans: Thanks. Yes, the file name contains the date and time, which change every hour. How can tS3List decide which file is the latest? What logic can we use there?&lt;/P&gt;</description>
      <pubDate>Tue, 28 Sep 2021 07:27:53 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/Extracting-the-latest-files-from-Amazon-S3-folder/m-p/2368879#M132095</guid>
      <dc:creator>sushantk19</dc:creator>
      <dc:date>2021-09-28T07:27:53Z</dc:date>
    </item>
    <item>
      <title>Re: Extracting the latest files from Amazon S3 folder</title>
      <link>https://community.qlik.com/t5/Talend-Studio/Extracting-the-latest-files-from-Amazon-S3-folder/m-p/2368880#M132096</link>
      <description>&lt;P&gt;@Xuan Junior: Yes, the file name contains the date and time, which change every hour. How can tS3List decide which file is the latest? What logic can we use there?&lt;/P&gt;</description>
      <pubDate>Tue, 28 Sep 2021 07:28:17 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/Extracting-the-latest-files-from-Amazon-S3-folder/m-p/2368880#M132096</guid>
      <dc:creator>sushantk19</dc:creator>
      <dc:date>2021-09-28T07:28:17Z</dc:date>
    </item>
    <item>
      <title>Re: Extracting the latest files from Amazon S3 folder</title>
      <link>https://community.qlik.com/t5/Talend-Studio/Extracting-the-latest-files-from-Amazon-S3-folder/m-p/2368881#M132097</link>
      <description>&lt;P&gt;@Xuan Junior: Any update on this? The file names look like qppxg6dy3oqo_2021-05-25T210000_8fd9627ba6f33235446f8fcb88ca7891_be822a.csv and qppxg6dy3oqo_2021-05-25T220000_8fd9627ba6f33235446f8fcb88ca7891_2853a9.csv for the 21:00 and 22:00 files, respectively.&lt;/P&gt;</description>
      <pubDate>Wed, 29 Sep 2021 08:32:34 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/Extracting-the-latest-files-from-Amazon-S3-folder/m-p/2368881#M132097</guid>
      <dc:creator>sushantk19</dc:creator>
      <dc:date>2021-09-29T08:32:34Z</dc:date>
    </item>
    <item>
      <title>Re: Extracting the latest files from Amazon S3 folder</title>
      <link>https://community.qlik.com/t5/Talend-Studio/Extracting-the-latest-files-from-Amazon-S3-folder/m-p/2368882#M132098</link>
      <description>&lt;P&gt;@Matt Evans: Any update on this? The file names look like qppxg6dy3oqo_2021-05-25T210000_8fd9627ba6f33235446f8fcb88ca7891_be822a.csv and qppxg6dy3oqo_2021-05-25T220000_8fd9627ba6f33235446f8fcb88ca7891_2853a9.csv for the 21:00 and 22:00 files, respectively.&lt;/P&gt;</description>
      <pubDate>Wed, 29 Sep 2021 08:32:46 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/Extracting-the-latest-files-from-Amazon-S3-folder/m-p/2368882#M132098</guid>
      <dc:creator>sushantk19</dc:creator>
      <dc:date>2021-09-29T08:32:46Z</dc:date>
    </item>
    <item>
      <title>Re: Extracting the latest files from Amazon S3 folder</title>
      <link>https://community.qlik.com/t5/Talend-Studio/Extracting-the-latest-files-from-Amazon-S3-folder/m-p/2368883#M132099</link>
      <description>&lt;P&gt;With that setup, you could try to retrieve the date from the filenames.&lt;/P&gt;&lt;P&gt;Did you try using the dates in the filenames?&lt;/P&gt;</description>
      <pubDate>Thu, 30 Sep 2021 09:31:39 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/Extracting-the-latest-files-from-Amazon-S3-folder/m-p/2368883#M132099</guid>
      <dc:creator>XJ_1630</dc:creator>
      <dc:date>2021-09-30T09:31:39Z</dc:date>
    </item>
    <item>
      <title>Re: Extracting the latest files from Amazon S3 folder</title>
      <link>https://community.qlik.com/t5/Talend-Studio/Extracting-the-latest-files-from-Amazon-S3-folder/m-p/2368884#M132100</link>
      <description>&lt;P&gt;@Xuan Junior: I didn't completely understand you. If we hardcode the dates in the filenames, the process cannot be automated, and we need to automate it. Can you tell me how to select just the latest (hourly) files with the tS3List component?&lt;/P&gt;</description>
      <pubDate>Mon, 04 Oct 2021 06:41:55 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/Extracting-the-latest-files-from-Amazon-S3-folder/m-p/2368884#M132100</guid>
      <dc:creator>sushantk19</dc:creator>
      <dc:date>2021-10-04T06:41:55Z</dc:date>
    </item>
    <item>
      <title>Re: Extracting the latest files from Amazon S3 folder</title>
      <link>https://community.qlik.com/t5/Talend-Studio/Extracting-the-latest-files-from-Amazon-S3-folder/m-p/2368885#M132101</link>
      <description>&lt;P&gt;There's no easy or clean way that I can see to do this. tS3List can give you the name of each file in a bucket via the CURRENT_KEY After variable, but that's all. You could then extract the date and time from the filename in Java, perhaps using substring if you are certain the filename will always be in that format. Then build a list of the filenames and extracted times, sort them, and choose the most recent. Then use tS3Get to download only those files. But as I said, that method depends on the date and time always being in the same place in the filename.&lt;/P&gt;</description>
      <pubDate>Tue, 05 Oct 2021 15:04:10 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/Extracting-the-latest-files-from-Amazon-S3-folder/m-p/2368885#M132101</guid>
      <dc:creator>MattE</dc:creator>
      <dc:date>2021-10-05T15:04:10Z</dc:date>
    </item>
    <item>
      <title>Re: Extracting the latest files from Amazon S3 folder</title>
      <link>https://community.qlik.com/t5/Talend-Studio/Extracting-the-latest-files-from-Amazon-S3-folder/m-p/2368886#M132102</link>
      <description>&lt;P&gt;Yes, I agree. That is why I have asked the client team to place the latest files in a new folder rather than in the same folder that holds all the other files. Then the process can be easily automated.&lt;/P&gt;</description>
      <pubDate>Fri, 08 Oct 2021 12:00:20 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/Extracting-the-latest-files-from-Amazon-S3-folder/m-p/2368886#M132102</guid>
      <dc:creator>sushantk19</dc:creator>
      <dc:date>2021-10-08T12:00:20Z</dc:date>
    </item>
  </channel>
</rss>

