Guillaume Eynard-Bontemps, Hugues Larat, CNES (Centre National d’Etudes Spatiales - French Space Agency)
2024-08-01
Object storage is a computer data storage architecture that manages data as objects, as opposed to other storage architectures such as file systems, which manage data as a file hierarchy, and block storage, which manages data as blocks within sectors and tracks. Each object typically includes the data itself, a variable amount of metadata, and a globally unique identifier. (Wikipedia)
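To make the definition concrete, here is a minimal sketch of an object store as a flat mapping from identifier to object. `ToyObjectStore` and `StoredObject` are hypothetical names invented for this illustration, not part of any real storage API, and a random UUID merely stands in for the globally unique identifier.

```python
import hashlib
import uuid
from dataclasses import dataclass, field


@dataclass
class StoredObject:
    """One object: an identifier, the raw bytes, and free-form metadata."""
    object_id: str
    data: bytes
    metadata: dict = field(default_factory=dict)


class ToyObjectStore:
    """Hypothetical in-memory store with a flat namespace (no directories)."""

    def __init__(self):
        self._objects = {}  # flat mapping: identifier -> StoredObject

    def put(self, data, **metadata):
        # Assumption: a random UUID plays the role of the unique identifier.
        object_id = str(uuid.uuid4())
        # The checksum is only an example of system-generated metadata.
        metadata = {**metadata, "sha256": hashlib.sha256(data).hexdigest()}
        self._objects[object_id] = StoredObject(object_id, data, metadata)
        return object_id

    def get(self, object_id):
        # Objects are fetched whole, by identifier, not by walking a hierarchy.
        return self._objects[object_id]


store = ToyObjectStore()
oid = store.put(b"temperature,12.5\n", content_type="text/csv", sensor="A1")
obj = store.get(oid)
print(obj.metadata["content_type"], len(obj.data))  # text/csv 17
```

The point of the sketch is the shape of the interface: there is no directory tree and no block addressing, only put and get on whole objects addressed by an identifier.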
But Why?
What makes object storage efficient? (multiple choices)
# Requires AWS credentials: either a credentials file under your $HOME directory (~/.aws/credentials) or a key obtained dynamically
aws s3 cp /tmp/foo/ s3://bucket/ --recursive --exclude "*" --include "*.jpg"
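The `--exclude "*" --include "*.jpg"` pair in the command above works because filters are applied in order and later filters take precedence over earlier ones. The following is a simplified Python model of that rule, assuming shell-style patterns as implemented by `fnmatch`; `is_selected` and the file names are hypothetical and this is not the AWS CLI's own code.

```python
from fnmatch import fnmatchcase


def is_selected(relative_path, filters):
    """Decide whether a file is copied.

    filters is an ordered list of ("include" | "exclude", pattern) pairs,
    in the same order as they appear on the command line.
    """
    selected = True  # with no filters, every file is included
    for action, pattern in filters:
        if fnmatchcase(relative_path, pattern):
            # A later matching filter overrides any earlier decision.
            selected = (action == "include")
    return selected


# Same filters as: --exclude "*" --include "*.jpg"
filters = [("exclude", "*"), ("include", "*.jpg")]
files = ["cat.jpg", "notes.txt", "holiday/dog.jpg", "raw/scan.png"]
to_upload = [f for f in files if is_selected(f, filters)]
print(to_upload)  # ['cat.jpg', 'holiday/dog.jpg']
```

Swapping the two filters would select nothing, since the final `--exclude "*"` would override the earlier include.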
Interfaces and libraries for major programming languages:
All major data processing frameworks are compatible with S3-like interfaces.
Just replace the leading / or file:// in a path with s3://.
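import dask.dataframe as dd
# Read every matching CSV object from an S3 bucket as one Dask DataFrame (needs the s3fs package)
df = dd.read_csv('s3://bucket/path/to/data-*.csv')
# Same idea for Parquet files on Google Cloud Storage (needs the gcsfs package)
df = dd.read_parquet('gcs://bucket/path/to/data-*.parq')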
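Why swapping the prefix is enough can be sketched with the standard library alone: the URL scheme names the storage backend, the first component names the bucket, and the remainder is a key in a flat namespace. The helper names and the key listing below are hypothetical, and the glob handling is a simplification (here `*` also matches `/`, which real file system libraries may treat differently).

```python
from fnmatch import fnmatchcase
from urllib.parse import urlsplit


def split_object_url(url):
    """Return (scheme, bucket, key) for a storage URL or a local path."""
    parts = urlsplit(url)
    scheme = parts.scheme or "file"  # a bare path means the local file system
    if scheme == "file":
        return scheme, "", parts.path
    return scheme, parts.netloc, parts.path.lstrip("/")


def expand_glob(key_pattern, all_keys):
    """Resolve a glob by listing keys under the literal prefix, then filtering."""
    prefix = key_pattern.split("*", 1)[0]
    candidates = [k for k in all_keys if k.startswith(prefix)]
    return sorted(k for k in candidates if fnmatchcase(k, key_pattern))


# Hypothetical bucket listing: keys are plain strings, not nested directories.
keys = [
    "path/to/data-2023.csv",
    "path/to/data-2024.csv",
    "path/to/readme.txt",
    "path/other/data-1.csv",
]

scheme, bucket, pattern = split_object_url("s3://bucket/path/to/data-*.csv")
print(scheme, bucket, pattern)  # s3 bucket path/to/data-*.csv
print(expand_glob(pattern, keys))
```

A framework only needs one such resolver per scheme; the rest of the processing code stays unchanged whether the data is local or in a bucket.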
More on the next part of the course.
Analysis Ready Cloud Optimized Data.
Thanks to Ryan Abernathey.
What is Analysis Ready?
What is Cloud Optimized?
True or false: object storage is always performant for scientific data processing.