Deployment options for big data streaming job

Anonymous · 2018-05-22

Hi,

I have managed to build my first big data streaming job, which consumes a Kinesis stream. I have installed a JobServer on an AWS EMR cluster, and I am able to successfully deploy and run the job on that JobServer.

My only concern is that we would need an EMR cluster running 24/7 just for this one job. Are there any other ways of deploying / "productionizing" a big data streaming job without running a whole cluster just for that?

Anonymous · 2018-05-29

Hello,

Running continuously is the whole purpose of stream processing on top of a big data cluster.

If you do not need that much computation power, please consider these four options:

- Set the Spark configuration to run locally. This only requires an EC2 instance where the JobServer is deployed.
- Use Talend ESB / Camel
- Use the Cloudera Altus distribution support introduced in 7.0 (Altus acts as Hadoop as a service).
- Use the new serverless distribution we shared on Talend Marketplace, based on the Qubole SaaS offering (also Hadoop as a service).
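
For the first option, here is a minimal sketch of what "run locally" amounts to on the Spark side. It is only an illustration, not what the Studio generates: in Talend the mode is chosen in the job's Spark configuration. The helper names, the jar name and the class name are hypothetical. The one point worth noting is that in local mode each streaming receiver (a Kinesis input counts as one) occupies a thread, so the `local[n]` master needs n larger than the number of receivers, otherwise nothing is left to process the data.

```python
import os
from typing import List, Optional


def local_master(num_receivers: int, cores: Optional[int] = None) -> str:
    """Build a Spark master URL for single-machine (local) mode.

    Each receiver pins one thread, so at least num_receivers + 1
    threads are requested to leave room for processing.
    """
    if num_receivers < 0:
        raise ValueError("num_receivers must be non-negative")
    # Fall back to the machine's CPU count when no core count is given.
    available = cores if cores is not None else (os.cpu_count() or 1)
    threads = max(available, num_receivers + 1)
    return f"local[{threads}]"


def spark_submit_args(jar: str, main_class: str,
                      num_receivers: int,
                      cores: Optional[int] = None) -> List[str]:
    """Assemble a spark-submit command line that runs the job locally."""
    return [
        "spark-submit",
        "--master", local_master(num_receivers, cores),
        "--class", main_class,
        jar,
    ]


if __name__ == "__main__":
    # Hypothetical jar and class names, for illustration only.
    print(" ".join(spark_submit_args("my_streaming_job.jar",
                                     "com.example.MyStreamingJob",
                                     num_receivers=1)))
```

With a single Kinesis input on a small EC2 instance, this requests at least two threads, which is the minimum for a local streaming job to make progress.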

Let us know if this is what you are looking for.

Best regards

Sabrina

Talend Big Data

v7.x