pyinterp.dask.tdigest
- pyinterp.dask.tdigest(values, weights=None, axis=None, compression=100, *, dtype=None)
Compute quantile estimates on a dask array using T-Digest.
This function uses the T-Digest algorithm to compute approximate quantiles on a dask array by processing each block independently and then merging the results.
- Parameters:
values (dask.array.Array) – Input dask array of values.
weights (dask.array.Array | None) – Optional dask array of weights with the same shape as values.
axis (list[int] | None) – Axis or axes along which to compute quantiles. If None, quantiles are computed over all axes.
compression (int) – T-Digest compression parameter. Higher values give more accurate results but use more memory. Default is 100.
dtype (str | type | np.dtype | None) – Data type for computation. Can be "float32", "float64", np.float32, np.float64, or None (defaults to float64).
- Returns:
A TDigest instance that can be used to compute quantiles.
- Raises:
ImportError – If dask is not installed.
TypeError – If inputs are not dask arrays.
ValueError – If values and weights have different shapes.
- Return type:
core.TDigestHolder
Example
>>> import dask.array as da
>>> import pyinterp.dask as dask_stats
>>> values = da.random.random((10000,), chunks=1000)
>>> digest = dask_stats.tdigest(values)
>>> print(f"Median: {digest.quantile(0.5):.4f}")
>>> print(f"Q25: {digest.quantile(0.25):.4f}")
>>> print(f"Q75: {digest.quantile(0.75):.4f}")
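The description above says that each block is processed independently and the results are then merged. The sketch below illustrates that summarise-then-merge pattern in plain Python, with no dask or pyinterp dependency. It is not the T-Digest algorithm and not pyinterp's implementation: a fixed-bin histogram is swapped in as a simpler mergeable summary, and the names `summarize_block`, `merge`, `quantile`, and `N_BINS` are hypothetical, chosen only for illustration.

```python
import random
from collections import Counter

N_BINS = 100  # hypothetical resolution of the stand-in summary


def summarize_block(block, n_bins=N_BINS):
    """Summarise one block of values in [0, 1] as per-bin counts."""
    counts = Counter()
    for v in block:
        # Clamp so that v == 1.0 falls in the last bin.
        counts[min(int(v * n_bins), n_bins - 1)] += 1
    return counts


def merge(summaries):
    """Merge per-block summaries by adding their bin counts."""
    total = Counter()
    for s in summaries:
        total.update(s)  # Counter.update adds counts
    return total


def quantile(summary, q, n_bins=N_BINS):
    """Approximate quantile: midpoint of the bin where the
    cumulative count first reaches q * total."""
    target = q * sum(summary.values())
    cumulative = 0
    for b in range(n_bins):
        cumulative += summary[b]  # missing bins count as 0
        if cumulative >= target:
            return (b + 0.5) / n_bins
    return 1.0


# Split the data into blocks, summarise each one, then merge.
rng = random.Random(0)
values = [rng.random() for _ in range(10_000)]
blocks = [values[i:i + 1_000] for i in range(0, len(values), 1_000)]
merged = merge(summarize_block(b) for b in blocks)
median_estimate = quantile(merged, 0.5)
```

The merged summary is identical to the one obtained by summarising all values in a single pass, which is the property that lets blocks be processed independently. A T-Digest plays the same role but, unlike the fixed bins used here, adapts its resolution to the data, with the level of detail governed by the compression parameter.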