The Lambda architecture is used to update the data lake reliably and to train machine learning models efficiently so that they can predict upcoming events accurately. It comprises three layers: a Batch Layer, a Speed Layer (also known as the Stream Layer), and a Serving Layer. The batch layer operates on the complete data set, which allows the system to produce the most accurate results, but at the cost of high latency because of the long computation time. The speed layer produces results with low latency, in near real time, by computing real-time views that complement the batch views. The serving layer answers queries against the results delivered by the batch and speed layers.
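The interplay of the three layers can be illustrated with a minimal Python sketch. It is not taken from any particular product: the event shape (a dict with a "key" field), the counting workload, and the names batch_view, SpeedLayer, and serve are all assumptions chosen for illustration. The batch layer recomputes a view from the complete event log, the speed layer updates a small real-time view as each new event arrives, and the serving layer answers a query by combining the two.

```python
from collections import Counter


def batch_view(master_events):
    """Batch layer: recompute counts from the complete event log.

    Accurate but slow, because every run scans all of the data.
    """
    return Counter(event["key"] for event in master_events)


class SpeedLayer:
    """Speed layer: incrementally maintain a view of events the
    last batch run has not yet seen (hypothetical helper class)."""

    def __init__(self):
        self.realtime_view = Counter()

    def on_event(self, event):
        # Constant-time update per event keeps latency low.
        self.realtime_view[event["key"]] += 1

    def reset(self):
        # Called once a new batch view covers these events.
        self.realtime_view.clear()


def serve(key, batch, realtime):
    """Serving layer: merge the batch view and the real-time view."""
    return batch.get(key, 0) + realtime.get(key, 0)


# Historical events already stored in the data lake (illustrative data).
master = [{"key": "login"}, {"key": "login"}, {"key": "purchase"}]
batch = batch_view(master)

# Events that arrived after the batch run started.
speed = SpeedLayer()
for event in [{"key": "login"}, {"key": "refund"}]:
    speed.on_event(event)

login_total = serve("login", batch, speed.realtime_view)
refund_total = serve("refund", batch, speed.realtime_view)
```

In this sketch, a query for "login" combines two batch events with one real-time event, while "refund" is visible only through the speed layer until the next batch run absorbs it and the real-time view is reset.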
The architecture comprises the following components:
Data is ingested in real time using change data capture (CDC), which replicates changes as they occur without impairing the performance of the production system.
Batch or incremental loads bring in historical data and write it to fault-tolerant, distributed storage, which keeps the risk of errors low even if the system crashes.
Analytics is used to discover, interpret, and communicate meaningful patterns in data that support effective decision-making.
Monitoring tracks data ingestion tasks through a single-pane-of-glass view.
Orchestration runs data ingestion tasks based on pre-set conditions or calculations.
APIs are provided to automate ingestion and to integrate with other monitoring and orchestration applications.
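The change data capture component listed above can be sketched as a small replay loop. This is a simplified illustration under stated assumptions, not the format of any specific CDC tool: the change record fields (seq, op, key, row) and the functions apply_change and replicate are hypothetical. Each change is applied to a replica in sequence order, and a checkpoint records the last applied sequence number so that a restart after a crash can replay the stream without applying a change twice.

```python
def apply_change(replica, change):
    """Apply one captured change to the replica (a dict keyed by row id)."""
    op = change["op"]
    key = change["key"]
    if op in ("insert", "update"):
        # Merge the changed columns into the existing row, if any.
        replica[key] = {**replica.get(key, {}), **change["row"]}
    elif op == "delete":
        replica.pop(key, None)
    else:
        raise ValueError(f"unknown operation: {op!r}")


def replicate(changes, replica=None, checkpoint=0):
    """Replay changes in sequence order, skipping those already applied.

    Returns the replica and the new checkpoint (last applied seq).
    """
    replica = {} if replica is None else replica
    for change in sorted(changes, key=lambda c: c["seq"]):
        if change["seq"] <= checkpoint:
            continue  # already applied before the restart
        apply_change(replica, change)
        checkpoint = change["seq"]
    return replica, checkpoint


# Illustrative change stream captured from a source table.
changes = [
    {"seq": 1, "op": "insert", "key": "c1", "row": {"name": "Ada", "city": "London"}},
    {"seq": 2, "op": "update", "key": "c1", "row": {"city": "Paris"}},
    {"seq": 3, "op": "insert", "key": "c2", "row": {"name": "Lin"}},
    {"seq": 4, "op": "delete", "key": "c2"},
]

replica, checkpoint = replicate(changes)

# Replaying the same stream from the saved checkpoint changes nothing.
replica_again, checkpoint_again = replicate(changes, dict(replica), checkpoint)
```

Because the source system only emits its changes and the replay happens downstream, the production database is not queried again during replication, which is the property the ingestion component relies on.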