pyinterp.dask.tdigest


pyinterp.dask.tdigest(values, weights=None, axis=None, compression=100, *, dtype=None)

Compute quantile estimates on a dask array using T-Digest.

This function uses the T-Digest algorithm to compute approximate quantiles on a dask array by processing each block independently and then merging the results.
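The "digest each block, then merge" pattern can be illustrated with a small pure-Python sketch. This is not pyinterp's implementation: the helper names (compress, digest_block, merge, quantile) are hypothetical, the digest is a simplified list of (mean, weight) centroids, and the size limit 4 * N * q * (1 - q) / compression is one common form of the T-Digest bound, assumed here for illustration only.

```python
import random


def compress(centroids, compression=100):
    """Greedily merge sorted (mean, weight) centroids under a size bound."""
    if not centroids:
        return []
    centroids = sorted(centroids)
    total = sum(w for _, w in centroids)
    out = []
    cum = 0.0  # total weight of the centroids already emitted to `out`
    cur_mean, cur_w = centroids[0]
    for mean, w in centroids[1:]:
        new_w = cur_w + w
        # Quantile at the middle of the candidate merged centroid.
        q = (cum + new_w / 2.0) / total
        # Small centroids near the tails, larger ones near the median.
        limit = 4.0 * total * q * (1.0 - q) / compression
        if new_w <= limit:
            # Fold the next centroid into the current one (weighted mean).
            cur_mean += (mean - cur_mean) * w / new_w
            cur_w = new_w
        else:
            out.append((cur_mean, cur_w))
            cum += cur_w
            cur_mean, cur_w = mean, w
    out.append((cur_mean, cur_w))
    return out


def digest_block(block, compression=100):
    """Summarize one block; every value starts as a unit-weight centroid."""
    return compress([(float(x), 1.0) for x in block], compression)


def merge(a, b, compression=100):
    """Merge two digests by concatenating and re-compressing."""
    return compress(a + b, compression)


def quantile(centroids, q):
    """Estimate a quantile by interpolating between centroid midpoints."""
    total = sum(w for _, w in centroids)
    target = q * total
    cum = 0.0
    prev_mean, prev_pos = centroids[0][0], 0.0
    for mean, w in centroids:
        pos = cum + w / 2.0  # cumulative weight at this centroid's center
        if target <= pos:
            if pos == prev_pos:
                return mean
            frac = (target - prev_pos) / (pos - prev_pos)
            return prev_mean + frac * (mean - prev_mean)
        prev_mean, prev_pos = mean, pos
        cum += w
    return centroids[-1][0]


# Split the data into blocks, digest each block independently, then merge.
rng = random.Random(0)
data = [rng.random() for _ in range(10000)]
blocks = [data[i:i + 1000] for i in range(0, len(data), 1000)]
digests = [digest_block(b) for b in blocks]
merged = digests[0]
for d in digests[1:]:
    merged = merge(merged, d)

median = quantile(merged, 0.5)
print(f"Approximate median: {median:.4f}")
```

Because each block is reduced to a compact summary before merging, no step needs all values in memory at once, which is what makes this kind of reduction a natural fit for chunked arrays.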

Parameters:
  • values (dask.array.Array) – Input dask array of values.

  • weights (dask.array.Array | None) – Optional dask array of weights with the same shape as values.

  • axis (list[int] | None) – Axis or axes along which to compute quantiles. If None, quantiles are computed over all axes.

  • compression (int) – T-Digest compression parameter. Higher values give more accurate results but use more memory. Default is 100.

  • dtype (str | type | np.dtype | None) – Data type for computation. Can be “float32”, “float64”, np.float32, np.float64, or None (defaults to float64).

Returns:

A T-Digest holder that can be used to compute quantiles, for example through its quantile() method.

Return type:

core.TDigestHolder

Example

>>> import dask.array as da
>>> import pyinterp.dask as dask_stats
>>> values = da.random.random((10000,), chunks=1000)
>>> digest = dask_stats.tdigest(values)
>>> print(f"Median: {digest.quantile(0.5):.4f}")
>>> print(f"Q25: {digest.quantile(0.25):.4f}")
>>> print(f"Q75: {digest.quantile(0.75):.4f}")