Spark configuration to use a cluster created by tAmazonEMRManage
Hi Experts,

I am new to Talend and am using Big Data Platform 6.1.1. I can create and launch a cluster using tAmazonEMRManage, and I want to use this as a pre-task to a Big Data Batch job running on Amazon EMR Spark, where I read from S3 and PostgreSQL (RDS) and write back to S3 and PostgreSQL. I am facing the following challenges:

1> Unable to pass the resource manager of the new cluster to the Spark configuration of the second job dynamically.
2> Unable to use two tS3Configuration components in the same Big Data Batch job to read from multiple S3 buckets (see the sketch below for what I am trying to achieve).
3> Unable to find a PostgreSQL connector in the Big Data Batch job.

Could you please advise?

Thanks,
ajmani
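To illustrate challenge 2, this is roughly the plain-Spark equivalent of what I am trying to build: one job reading from two S3 buckets. This is a minimal sketch, not Talend-generated code; the bucket names, paths and file formats are placeholders, and it assumes a Spark 2.x API, the s3a connector on the classpath, and a single set of AWS credentials taken from the environment.

```scala
import org.apache.spark.sql.SparkSession

object TwoBucketRead {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("two-bucket-read")
      .getOrCreate()

    // One set of AWS credentials for the whole job; with a single credential
    // pair, one S3 configuration can read from several buckets.
    val hc = spark.sparkContext.hadoopConfiguration
    hc.set("fs.s3a.access.key", sys.env("AWS_ACCESS_KEY_ID"))
    hc.set("fs.s3a.secret.key", sys.env("AWS_SECRET_ACCESS_KEY"))

    // Read datasets from two different buckets (placeholder names/paths)
    val ordersDf    = spark.read.option("header", "true").csv("s3a://bucket-one/orders/")
    val customersDf = spark.read.option("header", "true").csv("s3a://bucket-two/customers/")

    // Join and write the result back to S3
    ordersDf.join(customersDf, "customer_id")
      .write.mode("overwrite")
      .parquet("s3a://bucket-one/output/enriched/")

    spark.stop()
  }
}
```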
Hi,

Have you tried using the tS3XXX components in a Standard job and calling the Spark job as a subjob through tRunJob?

For RDS, you can use the Spark database components to achieve it:
- tMysql components for RDS (Aurora/MySQL)
- tOracle components for RDS (Oracle)
- tJDBC components for RDS (MariaDB/PostgreSQL/SQL Server)

Let us know if this works for your case.

Best regards
Sabrina
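For reference, a plain Spark JDBC read/write against a PostgreSQL RDS instance looks roughly like the sketch below. This is not Talend-generated code, just an illustration of the JDBC approach; the JDBC URL, table names and credentials are placeholders, and the PostgreSQL JDBC driver has to be available on the job's classpath.

```scala
import java.util.Properties
import org.apache.spark.sql.SparkSession

object PostgresJdbcExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("postgres-jdbc-example")
      .getOrCreate()

    // Placeholder RDS endpoint and credentials
    val jdbcUrl = "jdbc:postgresql://my-rds-endpoint:5432/mydb"
    val props = new Properties()
    props.setProperty("user", sys.env("PG_USER"))
    props.setProperty("password", sys.env("PG_PASSWORD"))
    props.setProperty("driver", "org.postgresql.Driver")

    // Read a table from RDS PostgreSQL into a DataFrame
    val inputDf = spark.read.jdbc(jdbcUrl, "public.input_table", props)

    // ... transformations would go here ...

    // Write the result back to another table
    inputDf.write.mode("append").jdbc(jdbcUrl, "public.output_table", props)

    spark.stop()
  }
}
```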
Hi Sabrina,

Thanks for looking into this. You suggested using the tJDBC Spark components to connect to PostgreSQL RDS, but the tJDBC components are not available in a Big Data Batch job.

Thanks,
ajmani