Big Data Processing Course

About the course

Guillaume Eynard-Bontemps, Emmanuelle Sarrazin for ISAE-Supaero

Course content

Day 1: Big Data, Distributed Computing and Spark

Course Introduction

Introduction to Big Data and its Ecosystem

Big Data Platforms, Hadoop and beyond

Spark Introduction and exercise

Play with MapReduce through Spark

Day 2: Cloud Computing and Kubernetes

Introduction to Cloud Computing

Includes first interaction with Google Cloud.

Containers and Docker

Includes Docker exercises.

Container Orchestration, Kubernetes

Includes Kubernetes exercises.

Object Storage and Cloud Optimized datasets

Day 3 (morning): Deploy your own processing platform on Kubernetes

Deploy a data processing platform on the Cloud

Day 3 (afternoon): Python ecosystem for data processing

The rise of the Python ecosystem for Data Processing

Includes Pandas and Xarray library tutorials.

Distributed processing

Includes parallel processing tutorials.

Day 4: Python for distributed processing

Manage large datasets

Includes large dataset tutorials.

Dask processing framework

Includes Dask tutorial.

If necessary, finish the data processing platform deployment.

Day 5: Evaluation

Final Evaluation

SDD DE Data Distribution course

SDD DE Course Introduction

CNES Big Data Processing & Distribution course

CNES Course Introduction