Guillaume Eynard-Bontemps and Emmanuelle Sarrazin, CNES (Centre National d’Etudes Spatiales - French Space Agency)
2025-02
Three kinds of bottlenecks can limit an algorithm: it can be CPU-bound, memory-bound, or I/O-bound.
In addition to paying attention to the resources consumed in terms of memory and computing time, it is also necessary to take into account the granularity of tasks and the scalability of the system.
The granularity of a task measures the amount of work (or computation) performed by that task.
|  | Fine-grained | Coarse-grained |
|---|---|---|
| Pros | Makes it possible to use many resources | Low communication and synchronization overhead |
| Cons | Increases the communication and synchronization overhead | Risk of load imbalance |
A system is scalable if adding resources reduces the computation time: ideally, using N nodes reduces the runtime by a factor of N.
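With hypothetical timings, the speedup and parallel efficiency can be computed as:

```python
t1 = 100.0  # runtime on 1 node, in seconds (hypothetical)
tn = 25.0   # runtime on n nodes (hypothetical)
n = 4

speedup = t1 / tn         # 4.0: the runtime was divided by 4
efficiency = speedup / n  # 1.0: ideal, linear scaling
print(speedup, efficiency)
```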
|  | Processes | Threads |
|---|---|---|
| Pros | Little coordination or synchronization needed | Lighter and cheaper to create |
| Cons | High creation cost | Race conditions and data corruption due to shared memory access |
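A minimal sketch of the shared-memory hazard with threads: without the lock, the read-modify-write on `counter` could interleave between threads and lose updates (a race condition).

```python
import threading

counter = 0
lock = threading.Lock()

def increment(n):
    global counter
    for _ in range(n):
        # The lock makes the read-modify-write atomic;
        # removing it would expose the race condition.
        with lock:
            counter += 1

threads = [threading.Thread(target=increment, args=(100_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 400000 with the lock; possibly less without it
```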
Distributed computing
Exchanging objects between processes: Data serialization
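For example, Python's `pickle` module serializes an object to bytes that can be sent to another process (a minimal sketch with a hypothetical payload):

```python
import pickle

data = {"values": [1, 2, 3], "label": "example"}  # hypothetical payload
payload = pickle.dumps(data)      # serialize the object to bytes
restored = pickle.loads(payload)  # deserialize it back into an object
print(restored == data)  # True
```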
Sharing state between processes with shared memory:
```python
from multiprocessing import Process
from multiprocessing.managers import SharedMemoryManager

def do_work(sl, start, stop):
    # Example worker (hypothetical workload): square each element in its slice
    for i in range(start, stop):
        sl[i] = sl[i] ** 2

if __name__ == "__main__":
    with SharedMemoryManager() as smm:
        sl = smm.ShareableList(range(2000))
        # Divide the work among two processes, storing partial results in sl
        p1 = Process(target=do_work, args=(sl, 0, 1000))
        p2 = Process(target=do_work, args=(sl, 1000, 2000))
        p1.start()
        p2.start()  # A multiprocessing.Pool might be more efficient
        p1.join()
        p2.join()   # Wait for all work to complete in both processes
        total_result = sum(sl)  # Consolidate the partial results now in sl
```
Synchronization between processes using Lock
```python
import time
from concurrent.futures import ProcessPoolExecutor as PoolExecutor
from functools import partial

def do_work(sleep_secs: float, i: int) -> str:
    time.sleep(sleep_secs)
    return f"foo-{i}"

if __name__ == "__main__":
    start_time = time.time()
    with PoolExecutor() as executor:
        # partial binds sleep_secs; map submits one task per value of i
        results_gen = executor.map(partial(do_work, 3.0), range(1, 10))
        print(list(results_gen))
    print(f"Elapsed = {time.time() - start_time}s")
```
Numba makes Python code fast
```python
import time
import numpy as np
from numba import jit

x = np.arange(100).reshape(10, 10)

@jit(nopython=True)
def go_fast(a):  # Function is compiled and runs in machine code
    trace = 0.0
    for i in range(a.shape[0]):
        trace += np.tanh(a[i, i])
    return a + trace

start = time.time()
go_fast(x)  # First call triggers the compilation
print(f"Elapsed (with compilation) = {time.time() - start}s")

start = time.time()
go_fast(x)  # Subsequent calls reuse the cached machine code
print(f"Elapsed (after compilation) = {time.time() - start}s")
```
> Elapsed (with compilation) = 0.33030009269714355s
> Elapsed (after compilation) = 6.67572021484375e-06s
- `parallel=True`: enable the automatic parallelization of the function.
- `fastmath=True`: enable fast-math behaviour for the function.

Population-based training of neural networks (DeepMind)