pyinterp.TDigest
- pyinterp.TDigest(values: object, weights: object | None = None, axis: collections.abc.Sequence[int] | None = None, compression: int = 100, dtype: object | None = None) -> object
T-Digest for incremental quantile estimation.
Computes quantiles using the t-digest algorithm, which provides accurate estimates, especially at the tails of the distribution. Supports parallel and online computation with arbitrary weights.
Reference: Computing Extremely Accurate Quantiles Using t-Digests (tdunning/t-digest)
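The tail accuracy described above comes from how a t-digest bounds the size of its clusters. The sketch below illustrates this with the k1 scale function from the referenced t-digest paper. It illustrates the published algorithm and is not pyinterp's internal code: the helper names (`k1`, `k1_inverse`, `max_cluster_fraction`) are hypothetical, and whether pyinterp uses this particular scale function is an assumption.

```python
import math


def k1(q: float, compression: float) -> float:
    """Scale function k1 from the t-digest paper (assumed here, not
    confirmed as pyinterp's choice). Maps a quantile q in [0, 1] to a
    cluster index in [-compression / 4, compression / 4]."""
    return compression / (2.0 * math.pi) * math.asin(2.0 * q - 1.0)


def k1_inverse(k: float, compression: float) -> float:
    """Inverse of k1: maps a cluster index back to a quantile."""
    return (math.sin(2.0 * math.pi * k / compression) + 1.0) / 2.0


def max_cluster_fraction(q: float, compression: float) -> float:
    """Largest fraction of the total weight that a cluster starting at
    quantile q may hold, given that k1 may grow by at most 1 across it."""
    k_next = min(k1(q, compression) + 1.0, compression / 4.0)
    return k1_inverse(k_next, compression) - q


# Clusters are allowed to be much smaller near the tails than near the median.
for q in (0.001, 0.01, 0.5):
    print(q, max_cluster_fraction(q, compression=100))
```

Under this bound, clusters near q = 0 or q = 1 hold far less weight than clusters near the median, and raising the compression shrinks every cluster. That is the accuracy versus memory tradeoff the compression parameter controls.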
- Parameters:
values – Input array of values.
weights – Optional array of weights (same shape as values).
axis – Optional axis or axes along which to compute quantiles.
compression – Compression parameter controlling accuracy vs memory tradeoff. Higher values provide better accuracy but use more memory. Typical values: 100-1000. Default is 100.
dtype – Data type for internal storage, either 'float32' or 'float64'. Determines precision and memory usage. Defaults to 'float64'.
Examples
>>> import numpy as np
>>> import pyinterp

# Compute t-digest for a 1D array with float64 (default)
>>> data = np.random.randn(10000)
>>> tdigest = pyinterp.TDigest(data)
>>> median = tdigest.quantile(0.5)
>>> print(f"Median: {median}")

# Compute t-digest with float32 for reduced memory usage
>>> data = data.astype('float32')
>>> tdigest = pyinterp.TDigest(data, dtype='float32')

# Compute along a specific axis
>>> data_2d = np.random.randn(100, 50)
>>> tdigest_axis = pyinterp.TDigest(data_2d, axis=[0])
>>> medians = tdigest_axis.quantile(0.5)
>>> print(f"Medians shape: {medians.shape}")

# Compute with weights and higher compression for better accuracy
>>> weights = np.random.rand(100, 50)
>>> tdigest_weighted = pyinterp.TDigest(
...     data_2d,
...     weights=weights,
...     compression=500,
... )
>>> weighted_median = tdigest_weighted.quantile(0.5)
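To sanity-check estimates like the ones above, it can help to compute an exact reference with NumPy alone. The sketch below is one way to do that. `exact_weighted_quantile` is a hypothetical helper, not part of pyinterp, and it uses the inverse of the weighted empirical CDF without interpolation, so a t-digest estimate should land close to it rather than match it exactly. The expectation that a digest built with axis=[0] on a (100, 50) array yields 50 values is an assumption based on the usual NumPy reduction convention.

```python
import numpy as np


def exact_weighted_quantile(values, weights, q):
    """Exact weighted quantile over all elements (hypothetical helper).

    Sorts the values, accumulates the normalized weights, and returns the
    first value whose cumulative weight reaches q.
    """
    values = np.asarray(values, dtype=float).ravel()
    weights = np.asarray(weights, dtype=float).ravel()
    order = np.argsort(values)
    values, weights = values[order], weights[order]
    cdf = np.cumsum(weights) / weights.sum()
    idx = np.searchsorted(cdf, q, side="left")
    return values[min(idx, values.size - 1)]


rng = np.random.default_rng(0)
data_2d = rng.standard_normal((100, 50))
weights = rng.random((100, 50))

# Exact weighted median over the whole array, for comparison with a
# weighted digest estimate.
reference = exact_weighted_quantile(data_2d, weights, 0.5)

# Exact unweighted medians along axis 0, one value per column.
column_medians = np.quantile(data_2d, 0.5, axis=0)
print(reference, column_medians.shape)
```

The sort makes this reference O(n log n) in time and O(n) in memory, which is the cost a digest avoids on large or streaming inputs.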