Hello,

I am using Amazon's cloud-based services EC2 (Elastic Compute Cloud), Redshift, and S3 (Simple Storage Service) to host a data warehouse. To load it, I am using Talend Load Jobs that transfer files from FTP to Redshift in a three-step process:

i) Fetch the data from FTP and download it to EC2
ii) Upload the data from EC2 to S3
iii) Load the data from S3 into the Redshift tables

While executing the Talend Load Job, I found that step i) runs normally and the files are successfully downloaded to EC2, but step ii) fails. The error log states that "a region must be specified --region or specifying the region in a configuration file or as an environment variable." The detailed error log is below:

A region must be specified --region or specifying the region in a configuration file or as an environment variable. Alternately, an endpoint can be specified with --endpoint-url
Exception in component tRedshiftRow_1
org.postgresql.util.PSQLException: ERROR: The specified S3 prefix 'Acxiom/adv_oli_20140910.dat' does not exist
Detail:
-----------------------------------------------
error: The specified S3 prefix 'Acxiom/adv_oli_20140910.dat' does not exist
code: 8001
context:
query: 226570
location: s3_utility.cpp:545
process: padbmaster
-----------------------------------------------
at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2096)
at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1829)
at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:257)
at org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:510)
at org.postgresql.jdbc2.AbstractJdbc2Statement.executeWithFlags(AbstractJdbc2Statement.java:372)
at org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:364)
at talend_jobs.dup_adv_oli_load_with_error_catching_0_1.Dup_adv_oli_Load_with_error_catching.tRedshiftRow_1Process(Dup_adv_oli_Load_with_error_catching.java:2313)
disconnected
at talend_jobs.dup_adv_oli_load_with_error_catching_0_1.Dup_adv_oli_Load_with_error_catching.tSystem_1Process(Dup_adv_oli_Load_with_error_catching.java:2199)
at talend_jobs.dup_adv_oli_load_with_error_catching_0_1.Dup_adv_oli_Load_with_error_catching.tSystem_5Process(Dup_adv_oli_Load_with_error_catching.java:2008)
at talend_jobs.dup_adv_oli_load_with_error_catching_0_1.Dup_adv_oli_Load_with_error_catching.runJobInTOS(Dup_adv_oli_Load_with_error_catching.java:3442)
at talend_jobs.dup_adv_oli_load_with_error_catching_0_1.Dup_adv_oli_Load_with_error_catching.main(Dup_adv_oli_Load_with_error_catching.java:3290)

adv_oli_20140910.dat is the name of the file to be transferred from EC2 to S3, and talend_jobs.dup_adv_oli_load_with_error_catching_0_1 is the name of the Talend Load Job.

Any help would be highly appreciated.

Thanks,
Shobhit
I think there is an issue here. S3 is file storage, while tRedshiftRow fires a query against the database, so how are you reading a file with a query? Can you show the query that reads the file from S3 and inserts the data into Redshift? Please look at the following scenario:
https://help.talend.com/pages/viewpage.action?pageId=19426484#tredshiftinputscenario
I think you need to download the file to a local drive and then insert it into Redshift.
Please let me know if I have misunderstood anything.
Thanks
Vaibhav
The raw files are picked up from the FTP server using the tSystem_5 component shown in the screenshot; this component downloads the files from FTP to EC2.
The second component copies the data from EC2 to S3 using the cp utility (a sketch of this call is shown after the query below).
The third component copies the data from S3 to Redshift using the following query for a fixed-width raw file:
"
create table phase2.adv_oli_increment_raw_temp as select top 1 * from phase2.adv_oli_increment_raw_2;
copy phase2.adv_oli_increment_raw_temp from 's3://WoltersKluwerHealthMR/Acxiom/adv_oli_"+TalendDate.getDate("CCYYMMDD")+".dat' CREDENTIALS ''
COMPUPDATE ON ACCEPTINVCHARS MAXERROR AS 50 REMOVEQUOTES TRUNCATECOLUMNS;
"
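For reference, the EC2-to-S3 copy in the second component boils down to an AWS CLI call roughly like the one below (the local path here is illustrative; the file name is built the same way as in the query above). This appears to be the call that produces the region error:

aws s3 cp /path/on/ec2/adv_oli_20140910.dat s3://WoltersKluwerHealthMR/Acxiom/adv_oli_20140910.dat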
OK, great. As per the error, the following path prefix does not exist:
The specified S3 prefix 'Acxiom/adv_oli_20140910.dat' does not exist
Do the bucket and the prefix actually exist in S3? Can you check manually? If they exist, then use a print statement to check the query, and test the query using a DB client.
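Also, the earlier message about a missing region suggests the EC2-to-S3 copy in step ii) never completed, so the prefix would not exist yet when the COPY runs. Something along these lines, run on the EC2 instance, should confirm both points (us-east-1 is just a placeholder; use your bucket's actual region):

# Check whether the bucket/prefix actually exists in S3
aws s3 ls s3://WoltersKluwerHealthMR/Acxiom/

# The AWS CLI needs a region; any one of these works:
aws configure set default.region us-east-1      # persists in ~/.aws/config
export AWS_DEFAULT_REGION=us-east-1             # environment variable for the job's shell
aws s3 cp /path/on/ec2/adv_oli_20140910.dat s3://WoltersKluwerHealthMR/Acxiom/ --region us-east-1   # per-command flag

Once the file is confirmed to be in S3 under that prefix, re-run the COPY from tRedshiftRow and it should find it.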