Guillaume Eynard-Bontemps and Emmanuelle Sarrazin, CNES (Centre National d’Etudes Spatiales - French Space Agency)
2024-01
Dask: a Python library for parallel and distributed computing
from dask.distributed import LocalCluster
client = LocalCluster().get_client()
# Submit work to happen in parallel
results = []
for filename in filenames:
    data = client.submit(load, filename)
    result = client.submit(process, data)
    results.append(result)
# Gather results back to local computer
results = client.gather(results)
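The snippet above relies on `filenames`, `load` and `process` being defined elsewhere. A minimal self-contained sketch of the same futures pattern could look like the following. The `load` and `process` bodies and the file names are hypothetical placeholders, and `Client(processes=False)` is used here only to keep the workers as threads inside the current process.

```python
from dask.distributed import Client


def load(filename):
    # Hypothetical loader: stands in for reading a file from disk.
    return list(range(len(filename)))


def process(data):
    # Hypothetical processing step: reduce the loaded data to one number.
    return sum(data)


def run(filenames):
    # processes=False starts a local scheduler with thread-based workers.
    with Client(processes=False) as client:
        futures = []
        for filename in filenames:
            data = client.submit(load, filename)    # returns a Future immediately
            result = client.submit(process, data)   # a Future can be passed as an argument
            futures.append(result)
        # Block until all tasks are done and bring the values back.
        return client.gather(futures)


if __name__ == "__main__":
    print(run(["a.csv", "bb.csv"]))
```

Each `submit` call returns at once, so the loop only builds up the work. The actual waiting happens in `gather`.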
High-level collections are used to generate task graphs
Create an array of ones
Create a 2D array of ones and sum it
Add array to its transpose
Matrix multiplication
Use compute() to execute the graph and get the result
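The four operations listed above can be sketched with `dask.array` as follows. The shapes and chunk sizes are illustrative assumptions, not values taken from the slides. Every expression stays lazy (it only describes a task graph) until `compute()` is called.

```python
import dask.array as da


def build_examples(n=4, chunk=2):
    """Build lazy Dask expressions; nothing is computed here."""
    ones_1d = da.ones((n,), chunks=(chunk,))         # array of ones
    x = da.ones((n, n), chunks=(chunk, chunk))       # 2D array of ones
    return {
        "ones": ones_1d,
        "sum": x.sum(),                               # sum of the 2D array
        "plus_transpose": x + x.T,                    # add array to its transpose
        "matmul": x @ x,                              # matrix multiplication
    }


examples = build_examples()

# compute() executes each task graph and returns a concrete NumPy result.
results = {name: expr.compute() for name, expr in examples.items()}

if __name__ == "__main__":
    for name, value in results.items():
        print(name, value, sep="\n")
```

The chunk size controls how many tasks the graph contains: smaller chunks mean more parallelism but also more scheduling overhead.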
What does Dask do better than Spark (multiple answers possible)?
Try to follow, in order of importance:
or
Finish yesterday's deployment (needed for tomorrow).