This course helps data engineers focus on essential design and architecture decisions while building a data lake and the processing platform around it. Participants will learn key aspects of data engineering by working with resilient distributed datasets (RDDs). They will also practice core data engineering skills: identifying data sources and appraising them against their business value, designing the right storage, and implementing appropriate access models.
In big data engineering, it is critical to understand how to design and maintain a data lake that is both scalable and manageable. Participants will learn how to architect a data lake that follows data engineering best practices, so that it can absorb incoming data from varied sources while maintaining high performance for big data processing.
The course also covers fault-tolerant computing within the Hadoop ecosystem. A system's ability to keep operating correctly when some of its components fail is a cornerstone of big data engineering. This part of the course explores the fault-tolerance mechanisms that the various components of the Hadoop ecosystem provide.
Understanding Resilient Distributed Datasets (RDDs) and their role in big data processing is crucial. The course covers how RDDs in the Apache Spark framework provide a fault-tolerant way to work with large datasets spread across the nodes of a cluster. Participants will gain hands-on experience with Spark RDDs, implementing complex data transformations and actions in a distributed environment.
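To make the idea of RDD fault tolerance concrete, here is a minimal single-process sketch in plain Python. It is a hypothetical toy model, not Spark's API or implementation. The class `ToyRDD` and its methods `lose_partition` and `compute` are illustrative names introduced here. The sketch shows the core mechanism: data is split into partitions, transformations (`map`, `filter`) are lazy and only record lineage, and actions (`collect`, `count`) trigger computation. When a cached partition is lost, it can be recomputed from its lineage instead of being restored from a replica. The names `parallelize`, `map`, `filter`, `persist`, `collect` and `count` loosely mirror their Spark counterparts.

```python
from typing import Callable, Dict, List, Optional


class ToyRDD:
    """Hypothetical single-process model of an RDD: an immutable, partitioned
    dataset that remembers how it was derived (its lineage)."""

    def __init__(self, parent: Optional["ToyRDD"], fn: Callable[[List], List],
                 num_partitions: int, source: Optional[List[List]] = None):
        self.parent = parent            # the RDD this one was derived from
        self.fn = fn                    # per-partition transformation of the parent's data
        self.num_partitions = num_partitions
        self.source = source            # only the base RDD holds raw input partitions
        self.cache: Dict[int, List] = {}  # partition index -> materialized data

    @classmethod
    def parallelize(cls, data: List, num_partitions: int) -> "ToyRDD":
        # Split the input round-robin, standing in for distribution across nodes.
        parts = [data[i::num_partitions] for i in range(num_partitions)]
        return cls(None, lambda p: p, num_partitions, source=parts)

    # Transformations are lazy: they only add a step to the lineage.
    def map(self, f: Callable) -> "ToyRDD":
        return ToyRDD(self, lambda p: [f(x) for x in p], self.num_partitions)

    def filter(self, f: Callable) -> "ToyRDD":
        return ToyRDD(self, lambda p: [x for x in p if f(x)], self.num_partitions)

    def compute(self, i: int) -> List:
        # Use the cached partition if present; otherwise rebuild it from lineage.
        if i in self.cache:
            return self.cache[i]
        if self.parent is None:
            return list(self.source[i])
        return self.fn(self.parent.compute(i))

    def persist(self) -> "ToyRDD":
        # Materialize and cache every partition.
        for i in range(self.num_partitions):
            self.cache[i] = self.compute(i)
        return self

    def lose_partition(self, i: int) -> None:
        # Simulate a worker failure that drops one cached partition.
        self.cache.pop(i, None)

    # Actions trigger computation and return results to the caller.
    def collect(self) -> List:
        return [x for i in range(self.num_partitions) for x in self.compute(i)]

    def count(self) -> int:
        return len(self.collect())


# Usage: square the even numbers, cache the result, then "lose" a partition.
nums = ToyRDD.parallelize(list(range(10)), num_partitions=3)
evens_squared = nums.filter(lambda x: x % 2 == 0).map(lambda x: x * x).persist()
before = evens_squared.collect()
evens_squared.lose_partition(1)
after = evens_squared.collect()  # partition 1 is recomputed from lineage
```

Only the lost partition is rebuilt, by replaying the recorded `filter` and `map` steps on its slice of the source data. This is the design choice that lets RDDs recover from failures without replicating every intermediate result. Real Spark adds scheduling, shuffles, and distributed storage on top of this idea.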
City: Kuala Lumpur (Malaysia)
Code: 3289_138031
Course Dates: 10-14 Mar 2025
Fee: 4,200 EUR