Hello All,
Is there a way to load data into Redshift in a Talend Spark job without using S3?
In a Spark job (Big Data Batch job), tRedshiftConfiguration looks for a tS3Configuration by default.
Thanks
Vijay
Hi,
The Talend Big Data job component for Redshift requires the S3 components as part of the data load. The same applies if you try to use the bulk component for Redshift in a Standard job.
The only component that loads directly is tRedshiftOutput in a Standard job, but it is not advised for large data volumes.
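For context, the bulk path that these components wrap stages the data as files in S3 and then issues a Redshift COPY, which is why the S3 step cannot be skipped. A minimal sketch of that mechanism (the endpoint, credentials, table, bucket, and role ARN are placeholders, not Talend-generated code, and it assumes the Redshift JDBC driver is on the classpath):

```java
// Minimal sketch of the bulk-load mechanism the Redshift components wrap:
// data is first staged as files in S3, then Redshift ingests those files
// in parallel with a COPY command.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class RedshiftCopySketch {
    public static void main(String[] args) throws Exception {
        String url = "jdbc:redshift://my-cluster.example.us-east-1"
                   + ".redshift.amazonaws.com:5439/dev";
        try (Connection conn = DriverManager.getConnection(url, "dbuser", "dbpassword");
             Statement stmt = conn.createStatement()) {
            // COPY is why S3 cannot be skipped: Redshift pulls the staged
            // files from the bucket itself, using the given IAM role.
            stmt.execute(
                "COPY my_schema.my_table "
              + "FROM 's3://my-staging-bucket/stage/part-' "
              + "IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole' "
              + "FORMAT AS CSV");
        }
    }
}
```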
Warm Regards,
Nikhil Thampi
Please appreciate our Talend community members by giving Kudos for sharing their time for your query. If your query is answered, please mark the topic as resolved.
Thank you, Nikhil, for your response.
tS3Configuration in a Big Data Batch job uses the s3a file system to let the job inherit credentials from an AWS role.
Is there any option to use the s3n file system with the inherit-credentials option, i.e. without providing an access key and secret key?
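(For context, my understanding is that s3a resolves credentials through a pluggable provider chain, so it can pick up the EC2 instance profile roughly as in the sketch below, while s3n only reads literal fs.s3n.awsAccessKeyId/fs.s3n.awsSecretAccessKey values. The property and class names come from hadoop-aws and the AWS SDK v1, not from Talend-generated code.)

```java
// Sketch: pointing the s3a connector at the EC2 instance profile instead
// of literal keys. Assumes hadoop-aws and the AWS SDK v1 are on the classpath.
import org.apache.spark.sql.SparkSession;

public class S3aInstanceProfileSketch {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
            .appName("s3a-instance-profile")
            .getOrCreate();
        // Resolve s3a credentials from the EMR instance role rather than
        // from configured access/secret keys.
        spark.sparkContext().hadoopConfiguration().set(
            "fs.s3a.aws.credentials.provider",
            "com.amazonaws.auth.InstanceProfileCredentialsProvider");
        spark.stop();
    }
}
```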
The problem is: when we check the s3a file system option in tS3Configuration, we get access-denied errors while running the Spark job on EMR.
Error:
java.nio.file.AccessDeniedException: s3a://<<Location>>/_temporary/0: innerMkdirs on s3a://<<location>>/_temporary/0: com.amazonaws.services.s3.model.AmazonS3Exception: Access Denied (Service: Amazon S3; Status Code: 403; Error Code: AccessDenied;
Also, in my organization they are not willing to share the access key and secret key with all users; instead we have to use roles to execute jobs.
I see there is no assume-role option like the one available for bulk load in Standard jobs.
If the s3a file system is the only option that works without an access key and secret key, what are those temp folders, and what permissions does the instance role or Redshift role need to execute the job?
Please provide some information on S3 usage when loading data into Redshift in a Talend Big Data job. Thanks again.
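For reference, my understanding is that the _temporary/0 folder in the error is the staging directory Hadoop's output committer creates under the target prefix, so I assume the instance role needs list/read/write/delete on the staging bucket. Something like the sketch below is what I had in mind; the role name, policy name, and bucket are hypothetical placeholders, and the calls are from the AWS SDK for Java v1:

```java
// Hypothetical sketch of the grants I assume the EMR instance role needs
// on the staging prefix, attached as an inline policy via the AWS SDK v1.
// Role name, policy name, and bucket are illustrative placeholders.
import com.amazonaws.services.identitymanagement.AmazonIdentityManagement;
import com.amazonaws.services.identitymanagement.AmazonIdentityManagementClientBuilder;
import com.amazonaws.services.identitymanagement.model.PutRolePolicyRequest;

public class StagingBucketPolicySketch {
    public static void main(String[] args) {
        // _temporary/0 is created by Hadoop's output committer under the
        // target prefix, so the role needs write/delete there, not just read.
        String policy = "{"
            + "\"Version\":\"2012-10-17\","
            + "\"Statement\":[{"
            +   "\"Effect\":\"Allow\","
            +   "\"Action\":[\"s3:GetObject\",\"s3:PutObject\",\"s3:DeleteObject\"],"
            +   "\"Resource\":\"arn:aws:s3:::my-staging-bucket/*\""
            + "},{"
            +   "\"Effect\":\"Allow\","
            +   "\"Action\":[\"s3:ListBucket\",\"s3:GetBucketLocation\"],"
            +   "\"Resource\":\"arn:aws:s3:::my-staging-bucket\""
            + "}]}";
        AmazonIdentityManagement iam =
            AmazonIdentityManagementClientBuilder.defaultClient();
        iam.putRolePolicy(new PutRolePolicyRequest()
            .withRoleName("MyEmrInstanceRole")
            .withPolicyName("talend-staging-access")
            .withPolicyDocument(policy));
    }
}
```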
Hi,
Unfortunately, my view is that this is not possible in a Big Data Spark job at the moment, but I would recommend raising a support case or creating a JIRA ticket.
As a workaround, I would recommend a hybrid approach: a Talend Standard job combined with a Big Data job.
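To illustrate why tRedshiftOutput is not advised for large volumes: a direct load is essentially batched INSERTs over JDBC, roughly as sketched below (connection details and the table are placeholder assumptions, and it assumes the Redshift JDBC driver is on the classpath), whereas the bulk path hands Redshift a set of S3 files to ingest in parallel.

```java
// Rough sketch of what a direct (non-bulk) load boils down to: batched
// INSERTs over JDBC. Every row goes through the leader node, which is why
// this path is discouraged for large volumes. Placeholders throughout.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class DirectJdbcLoadSketch {
    public static void main(String[] args) throws Exception {
        String url = "jdbc:redshift://my-cluster.example.us-east-1"
                   + ".redshift.amazonaws.com:5439/dev";
        try (Connection conn = DriverManager.getConnection(url, "dbuser", "dbpassword")) {
            conn.setAutoCommit(false);
            try (PreparedStatement ps = conn.prepareStatement(
                     "INSERT INTO my_schema.my_table (id, name) VALUES (?, ?)")) {
                for (int i = 0; i < 1000; i++) {
                    ps.setInt(1, i);
                    ps.setString(2, "row-" + i);
                    ps.addBatch();
                }
                ps.executeBatch();
            }
            conn.commit();
        }
    }
}
```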
Warm Regards,
Nikhil Thampi
Please appreciate our Talend community members by giving Kudos for sharing their time for your query. If your query is answered, please mark the topic as resolved.