<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: [resolved] Using tFileUnarchive on nested folder structure in S3 in Talend Studio</title>
    <link>https://community.qlik.com/t5/Talend-Studio/resolved-Using-tFileUnarchive-on-nested-folder-structure-in-S3/m-p/2325233#M94855</link>
    <description>Hi, I have done that and now it's reading the files within the directories.&lt;BR /&gt;But how do I do the same on EMR?&lt;BR /&gt;How can I define the path there? I use the path /home/work/talend/&lt;BR /&gt;Any help on this would be appreciated.</description>
    <pubDate>Thu, 04 Jun 2015 07:24:51 GMT</pubDate>
    <dc:creator>Anonymous</dc:creator>
    <dc:date>2015-06-04T07:24:51Z</dc:date>
    <item>
      <title>[resolved] Using tFileUnarchive on nested folder structure in S3</title>
      <link>https://community.qlik.com/t5/Talend-Studio/resolved-Using-tFileUnarchive-on-nested-folder-structure-in-S3/m-p/2325231#M94853</link>
      <description>Here's a simple job which I have built. 
&lt;BR /&gt; 
&lt;BR /&gt; 
&lt;FONT color="#222222"&gt;&lt;FONT size="2"&gt;&lt;FONT face="Helvetica Neue, Helvetica, Arial, sans-serif"&gt;Example job layout showing how to conditionally unzip files from S3&lt;/FONT&gt;&lt;/FONT&gt;&lt;/FONT&gt; 
&lt;BR /&gt; 
&lt;FONT color="#222222"&gt;&lt;FONT size="2"&gt;&lt;FONT face="Helvetica Neue, Helvetica, Arial, sans-serif"&gt;As normal, I connect to S3, list all the relevant objects in the bucket using tS3List, and then pass each one to tS3Get.&lt;/FONT&gt;&lt;/FONT&gt;&lt;/FONT&gt; 
&lt;BR /&gt; 
&lt;BR /&gt; 
&lt;FONT color="#222222"&gt;&lt;FONT size="2"&gt;&lt;FONT face="Helvetica Neue, Helvetica, Arial, sans-serif"&gt;In the above job I set tS3Get up to fetch every object that is iterated on by the tS3List component by setting the key as:&lt;/FONT&gt;&lt;/FONT&gt;&lt;/FONT&gt; 
&lt;BR /&gt; 
&lt;BR /&gt; 
&lt;FONT color="#222222"&gt;&lt;FONT size="2"&gt;&lt;FONT face="Helvetica Neue, Helvetica, Arial, sans-serif"&gt;((String)globalMap.get("tS3List_1_CURRENT_KEY"))&lt;/FONT&gt;&lt;/FONT&gt;&lt;/FONT&gt; 
&lt;BR /&gt; 
&lt;FONT color="#222222"&gt;&lt;FONT size="2"&gt;&lt;FONT face="Helvetica Neue, Helvetica, Arial, sans-serif"&gt;and then downloading it to:&lt;/FONT&gt;&lt;/FONT&gt;&lt;/FONT&gt; 
&lt;BR /&gt; 
&lt;BR /&gt; 
&lt;FONT color="#222222"&gt;&lt;FONT size="2"&gt;&lt;FONT face="Helvetica Neue, Helvetica, Arial, sans-serif"&gt;"C:/Talend/5.6.1/studio/workspace/S3_downloads/" + ((String)globalMap.get("tS3List_1_CURRENT_KEY"))&lt;/FONT&gt;&lt;/FONT&gt;&lt;/FONT&gt; 
&lt;BR /&gt; 
&lt;FONT color="#222222"&gt;&lt;FONT size="2"&gt;&lt;FONT face="Helvetica Neue, Helvetica, Arial, sans-serif"&gt;The extra bit I've added starts with a Run If conditional link from tS3Get to tFileUnarchive, with the condition:&lt;/FONT&gt;&lt;/FONT&gt;&lt;/FONT&gt; 
&lt;BR /&gt; 
&lt;BR /&gt; 
&lt;FONT color="#222222"&gt;&lt;FONT size="2"&gt;&lt;FONT face="Helvetica Neue, Helvetica, Arial, sans-serif"&gt;((String)globalMap.get("tS3List_1_CURRENT_KEY")).endsWith(".zip")&lt;/FONT&gt;&lt;/FONT&gt;&lt;/FONT&gt; 
&lt;BR /&gt; 
&lt;FONT color="#222222"&gt;&lt;FONT size="2"&gt;&lt;FONT face="Helvetica Neue, Helvetica, Arial, sans-serif"&gt;This checks whether the file being downloaded from S3 is a .zip file.&lt;/FONT&gt;&lt;/FONT&gt;&lt;/FONT&gt; 
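&lt;BR /&gt; 
&lt;BR /&gt; 
As a rough sketch only (not part of the job above), the same check can be written as a small standalone Java helper. The class name ZipKeyCheck and the method isZipKey are made up for illustration. The sketch assumes you also want to tolerate a missing key and upper-case extensions such as ".ZIP", which the plain endsWith(".zip") condition does not handle.
&lt;PRE&gt;
```java
import java.util.Locale;

// Hypothetical helper class, for illustration only.
class ZipKeyCheck {

    // Returns true when the S3 object key names a .zip archive.
    // The parameter is an Object because a value read from a map such as
    // globalMap has to be cast, and a missing entry comes back as null.
    static boolean isZipKey(Object rawKey) {
        if (rawKey == null) {
            return false;
        }
        // Lower-case first so that "ARCHIVE.ZIP" is recognised as well.
        return rawKey.toString().toLowerCase(Locale.ROOT).endsWith(".zip");
    }

    public static void main(String[] args) {
        System.out.println(isZipKey("May2015/File.zip"));
        System.out.println(isZipKey("May2015/File.txt"));
        System.out.println(isZipKey(null));
    }
}
```
&lt;/PRE&gt;
Inside a Run If link the logic still has to be a single boolean expression, so treat this purely as an illustration of the check itself.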
&lt;BR /&gt; 
&lt;BR /&gt; 
&lt;FONT color="#222222"&gt;&lt;FONT size="2"&gt;&lt;FONT face="Helvetica Neue, Helvetica, Arial, sans-serif"&gt;The tFileUnarchive component then needs to be told what to unzip, which will be the file I just downloaded:&lt;/FONT&gt;&lt;/FONT&gt;&lt;/FONT&gt; 
&lt;BR /&gt; 
&lt;BR /&gt; 
&lt;FONT color="#222222"&gt;&lt;FONT size="2"&gt;&lt;FONT face="Helvetica Neue, Helvetica, Arial, sans-serif"&gt;"C:/Talend/5.6.1/studio/workspace/S3_downloads/" + ((String)globalMap.get("tS3List_1_CURRENT_KEY"))&lt;/FONT&gt;&lt;/FONT&gt;&lt;/FONT&gt; 
&lt;BR /&gt; 
&lt;FONT color="#222222"&gt;&lt;FONT size="2"&gt;&lt;FONT face="Helvetica Neue, Helvetica, Arial, sans-serif"&gt;and where to extract it to:&lt;/FONT&gt;&lt;/FONT&gt;&lt;/FONT&gt; 
&lt;BR /&gt; 
&lt;BR /&gt; 
&lt;FONT color="#222222"&gt;&lt;FONT size="2"&gt;&lt;FONT face="Helvetica Neue, Helvetica, Arial, sans-serif"&gt;"C:/Talend/5.6.1/studio/workspace/S3_downloads"&lt;/FONT&gt;&lt;/FONT&gt;&lt;/FONT&gt; 
&lt;BR /&gt; 
&lt;FONT color="#222222"&gt;&lt;FONT size="2"&gt;&lt;FONT face="Helvetica Neue, Helvetica, Arial, sans-serif"&gt;This then puts any extracted files in the same place as the ones that didn't need extracting.&lt;/FONT&gt;&lt;/FONT&gt;&lt;/FONT&gt; 
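&lt;BR /&gt; 
&lt;BR /&gt; 
One thing worth checking at this point: because the local file name is built as download folder + S3 key, a key that contains a folder prefix ends up in a subfolder of the download folder. A minimal Java sketch of that path arithmetic follows. The class DownloadPathSketch, the method localPathFor and the sample key "May2015/File.txt" are assumptions for illustration, not something taken from the job.
&lt;PRE&gt;
```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper class, for illustration only.
class DownloadPathSketch {

    // Joins the download folder and the S3 key the same way the
    // string concatenation in the job does.
    static Path localPathFor(String baseDir, String key) {
        return Paths.get(baseDir).resolve(key);
    }

    public static void main(String[] args) {
        String baseDir = "C:/Talend/5.6.1/studio/workspace/S3_downloads";
        Path local = localPathFor(baseDir, "May2015/File.txt");
        // The folder part of the key becomes a local subfolder.
        System.out.println("file:   " + local);
        System.out.println("folder: " + local.getParent());
    }
}
```
&lt;/PRE&gt;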
&lt;BR /&gt; 
&lt;BR /&gt; 
&lt;FONT color="#222222"&gt;&lt;FONT size="2"&gt;&lt;FONT face="Helvetica Neue, Helvetica, Arial, sans-serif"&gt;From here I can now iterate through the downloads folder with tFileList, looking for the file types I want, by setting the directory to "C:/Talend/5.6.1/studio/workspace/S3_downloads" and the glob expression (filemask) to "*.txt". In this case I wanted to read in only the txt files (including the zipped ones) I had in S3.&lt;/FONT&gt;&lt;/FONT&gt;&lt;/FONT&gt; 
&lt;BR /&gt; 
&lt;BR /&gt; 
&lt;FONT color="#222222"&gt;&lt;FONT size="2"&gt;&lt;FONT face="Helvetica Neue, Helvetica, Arial, sans-serif"&gt;Finally, I read the delimited files by setting the file to be read by the tFileInputDelimited component to:&lt;/FONT&gt;&lt;/FONT&gt;&lt;/FONT&gt; 
&lt;BR /&gt; 
&lt;BR /&gt; 
&lt;FONT color="#222222"&gt;&lt;FONT size="2"&gt;&lt;FONT face="Helvetica Neue, Helvetica, Arial, sans-serif"&gt;((String)globalMap.get("tFileList_1_CURRENT_FILEPATH"))&lt;/FONT&gt;&lt;/FONT&gt;&lt;/FONT&gt; 
&lt;BR /&gt; 
&lt;FONT color="#222222"&gt;&lt;FONT size="2"&gt;&lt;FONT face="Helvetica Neue, Helvetica, Arial, sans-serif"&gt;In my case I then simply printed this to the console (in my original job I have a tMap and the output table).&lt;/FONT&gt;&lt;/FONT&gt;&lt;/FONT&gt; 
&lt;BR /&gt; 
&lt;BR /&gt; 
&lt;FONT color="#222222"&gt;&lt;FONT size="2"&gt;&lt;FONT face="Helvetica Neue, Helvetica, Arial, sans-serif"&gt;Now the issues I have are:&lt;/FONT&gt;&lt;/FONT&gt;&lt;/FONT&gt; 
&lt;BR /&gt; 
&lt;BR /&gt; 
&lt;FONT color="#222222"&gt;&lt;FONT size="2"&gt;&lt;FONT face="Helvetica Neue, Helvetica, Arial, sans-serif"&gt;First, I have a bucket on S3, say 'Analysis', and inside it I have month-wise folders such as 'May2015', 'June2015' and so on. tFileUnarchive therefore extracts into a matching subfolder of the path I have specified, e.g. C:/Talend/5.6.1/studio/workspace/S3_downloads/May2015/File.txt. When I then try to iterate with tFileList I cannot find the file, because tFileList is searching only in the S3_downloads folder itself, i.e. C:/Talend/5.6.1/studio/workspace/S3_downloads. So how can I make tFileList go inside the subfolders?&lt;/FONT&gt;&lt;/FONT&gt;&lt;/FONT&gt; 
&lt;BR /&gt; 
&lt;BR /&gt; 
&lt;FONT color="#222222"&gt;&lt;FONT size="2"&gt;&lt;FONT face="Helvetica Neue, Helvetica, Arial, sans-serif"&gt;Second, I have shown here how to do this on my local system, but how do I achieve the same if I want to run it on an EMR cluster? That is, how do I change the path structure (or anything else) so that I can run the job on EMR?&lt;/FONT&gt;&lt;/FONT&gt;&lt;/FONT&gt; 
&lt;BR /&gt; 
&lt;BR /&gt; 
&lt;FONT color="#222222"&gt;&lt;FONT size="2"&gt;&lt;FONT face="Helvetica Neue, Helvetica, Arial, sans-serif"&gt;Any help on this is greatly appreciated.&lt;/FONT&gt;&lt;/FONT&gt;&lt;/FONT&gt; 
&lt;BR /&gt; 
&lt;span class="lia-inline-image-display-wrapper" image-alt="0683p000009MDcK.png"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/131043i64DE45DC97BF617C/image-size/large?v=v2&amp;amp;px=999" role="button" title="0683p000009MDcK.png" alt="0683p000009MDcK.png" /&gt;&lt;/span&gt;</description>
      <pubDate>Wed, 27 May 2015 19:33:33 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/resolved-Using-tFileUnarchive-on-nested-folder-structure-in-S3/m-p/2325231#M94853</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2015-05-27T19:33:33Z</dc:date>
    </item>
    <item>
      <title>Re: [resolved] Using tFileUnarchive on nested folder structure in S3</title>
      <link>https://community.qlik.com/t5/Talend-Studio/resolved-Using-tFileUnarchive-on-nested-folder-structure-in-S3/m-p/2325232#M94854</link>
      <description>&lt;PRE&gt;So the tFileUnarchive is extracting with the folder, i.e. in the path I have specified, e.g. C:/Talend/5.6.1/studio/workspace/S3_downloads/May2015/File.txt, and while trying to iterate it using tFileList I am not able to find the file, as my tFileList is searching in the S3_downloads folder only, i.e. C:/Talend/5.6.1/studio/workspace/S3_downloads. So how can I go inside the folder in my tFileList?&lt;/PRE&gt;
&lt;BR /&gt;Check the 'Includes subdirectories' box on tFileList to include the subdirectories in the search.
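&lt;BR /&gt;For anyone curious what that option amounts to, here is a rough Java sketch of a recursive search with a "*.txt" glob. It is not tFileList's actual implementation; the class RecursiveFileListSketch, the method listMatching and the default folder name are assumptions for illustration.
&lt;PRE&gt;
```java
import java.io.File;
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;

// Hypothetical class, for illustration only.
class RecursiveFileListSketch {

    // Prints and counts the files under dir whose name matches the glob,
    // descending into every subdirectory along the way.
    static int listMatching(File dir, PathMatcher matcher) {
        File[] children = dir.listFiles();
        if (children == null) {
            return 0; // dir is not a directory or cannot be read
        }
        int found = 0;
        for (File child : children) {
            if (child.isDirectory()) {
                // This recursion is the "include subdirectories" part.
                found += listMatching(child, matcher);
            } else if (matcher.matches(child.toPath().getFileName())) {
                System.out.println(child.getPath());
                found++;
            }
        }
        return found;
    }

    public static void main(String[] args) {
        // Assumed default folder; pass another one as the first argument.
        String root = args.length == 0 ? "S3_downloads" : args[0];
        PathMatcher txtFiles = FileSystems.getDefault().getPathMatcher("glob:*.txt");
        int found = listMatching(new File(root), txtFiles);
        System.out.println(found + " file(s) found");
    }
}
```
&lt;/PRE&gt;
Without the recursive call only the files directly inside the top-level folder would be seen, which matches the behaviour described in the question.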
&lt;BR /&gt;Best regards
&lt;BR /&gt;Shong</description>
      <pubDate>Sun, 31 May 2015 08:55:55 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/resolved-Using-tFileUnarchive-on-nested-folder-structure-in-S3/m-p/2325232#M94854</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2015-05-31T08:55:55Z</dc:date>
    </item>
    <item>
      <title>Re: [resolved] Using tFileUnarchive on nested folder structure in S3</title>
      <link>https://community.qlik.com/t5/Talend-Studio/resolved-Using-tFileUnarchive-on-nested-folder-structure-in-S3/m-p/2325233#M94855</link>
      <description>Hi, I have done that and now it's reading the files within the directories.&lt;BR /&gt;But how do I do the same on EMR?&lt;BR /&gt;How can I define the path there? I use the path /home/work/talend/&lt;BR /&gt;Any help on this would be appreciated.</description>
      <pubDate>Thu, 04 Jun 2015 07:24:51 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/resolved-Using-tFileUnarchive-on-nested-folder-structure-in-S3/m-p/2325233#M94855</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2015-06-04T07:24:51Z</dc:date>
    </item>
  </channel>
</rss>

