Guillaume Eynard-Bontemps, Emmanuelle Sarrazin,
Hugues Larat, CNES (Centre National d’Etudes Spatiales - French Space
Agency)
2024-02-24
Welcome
Course Overview
Big Data Processing
Harnessing the complexity of large amounts of data is a challenge in
itself.
But Big Data processing is more than that: originally characterized
by the 3 Vs of Volume, Velocity and Variety, the concepts popularized by
Hadoop and Google require dedicated computing solutions (both software
and infrastructure), which will be explored in this module.
Objectives
By the end of this module, participants will be able to:
Understand the differences between the main distributed computing
architectures (HPC, Big Data, Cloud, CPU vs GPGPU) and when to use each
Implement the distribution of simple operations via the Map/Reduce
principle in PySpark
Connect to a cloud computing engine (e.g. Google Cloud Platform) and
use it
Understand the principle of containers (through Docker) and
Kubernetes
Deploy a Big Data Processing Platform on the Cloud
Implement the distribution of data wrangling/cleaning and the training
of machine learning algorithms using the PyData stack, Jupyter notebooks
and Dask
Typical daily schedule
Time slot      Content
9:00-10:30     Slides, tutorial or exercises
10:30-10:45    Coffee Break
10:45-12:15    Slides, tutorial or exercises
12:15-13:30    Lunch (I know it’s a bit short)
13:30-15:15    Slides, tutorial or exercises (not nap)
15:15-15:30    Coffee Break (we may also make two breaks)
15:30-17:15    Slides, tutorial or exercises (last session, at last)
I’ll try to offer some quizzes to make sure you’re following!
About myself
Guillaume
CNES (Centre National d’Etudes Spatiales - French Space Agency)
Since 2016:
6 years in CNES Computing Center team
1 year of holidays
1 year developing image processing tools
6 years of using Dask/Python on HPC and in the Cloud
A bit of Kubernetes and Google Cloud
Before that: 5 years on Hadoop and Spark
Originally: Software developer (a lot of Java)
About others
Hugues
CNES (Centre National d’Etudes Spatiales - French Space Agency)
Since 2020:
Cloud specialist
Ground Segment Engineer
Before that:
System Architect
8 years as Software Engineer and Tech Lead (a lot of Java)
5 years as System & Network Technician
Emmanuelle
CNES (Centre National d’Etudes Spatiales - French Space Agency)
Since 2013:
6 years as HPC Expert
5 years in image processing, 3D
About yourselves
Which courses have you already taken in this master’s program?
Which of these subjects are you familiar with: big data, cloud, Python,
machine learning?