pyinterp.TDigest


pyinterp.TDigest(values: object, weights: object | None = None, axis: collections.abc.Sequence[int] | None = None, compression: int = 100, dtype: object | None = None) → object

T-Digest for incremental quantile estimation.

Computes quantiles using the t-digest algorithm, which provides accurate estimates, especially at the tails of the distribution. Supports parallel and online computation with arbitrary weights.

Reference: Computing Extremely Accurate Quantiles Using t-Digests (tdunning/t-digest)
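To give a feel for why a digest can stay small while remaining accurate at the tails, here is a simplified, batch-only sketch in plain NumPy. It is not pyinterp's implementation and does not use the pyinterp API: the helper names `toy_digest` and `toy_quantile` are hypothetical, and the cluster size bound `4 * n * q * (1 - q) / compression` is an assumption loosely modelled on the bound described in the t-digest reference.

```python
import numpy as np


def toy_digest(values, compression=100):
    """Group sorted values into (mean, count) centroids.

    Hypothetical helper: a cluster starting near quantile q may hold at most
    about 4 * n * q * (1 - q) / compression points, so clusters near the
    tails stay tiny while clusters near the median grow large.
    """
    x = np.sort(np.asarray(values, dtype=np.float64).ravel())
    n = x.size
    means, counts = [], []
    start = 0
    while start < n:
        q = (start + 0.5) / n
        limit = max(1, int(4 * n * q * (1 - q) / compression))
        stop = min(n, start + limit)
        means.append(x[start:stop].mean())
        counts.append(stop - start)
        start = stop
    return np.array(means), np.array(counts, dtype=np.float64)


def toy_quantile(means, counts, q):
    """Estimate a quantile by interpolating between centroid means."""
    # Place each centroid at the mid-point of its cumulative weight.
    positions = (np.cumsum(counts) - 0.5 * counts) / counts.sum()
    return float(np.interp(q, positions, means))


rng = np.random.default_rng(0)
data = rng.standard_normal(10_000)
means, counts = toy_digest(data, compression=100)

print(f"Centroids kept: {means.size} (from {data.size} samples)")
print(f"Toy median:   {toy_quantile(means, counts, 0.5):.4f}")
print(f"Exact median: {np.quantile(data, 0.5):.4f}")
```

Raising `compression` in this sketch shrinks the per-cluster limit, which keeps more centroids and tightens the estimate, mirroring the accuracy versus memory trade-off described for the `compression` parameter.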

Parameters:
  • values – Input array of values.

  • weights – Optional array of weights (same shape as values).

  • axis – Optional axis or axes along which to compute quantiles.

  • compression – Compression parameter controlling accuracy vs memory tradeoff. Higher values provide better accuracy but use more memory. Typical values: 100-1000. Default is 100.

  • dtype – Data type for internal storage, either ‘float32’ or ‘float64’. Determines precision and memory usage. Defaults to ‘float64’.
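To make the role of `weights` concrete, the sketch below computes an exact weighted quantile with plain NumPy. It is a reference computation only, not what `pyinterp.TDigest` does internally: the function name `weighted_quantile` is hypothetical, and it assumes that weights act as relative frequencies, with each sample placed at the mid-point of its cumulative weight. A result like this could serve as a baseline when judging a digest estimate on small inputs.

```python
import numpy as np


def weighted_quantile(values, weights, q):
    """Exact weighted quantile (hypothetical reference helper)."""
    values = np.asarray(values, dtype=np.float64).ravel()
    weights = np.asarray(weights, dtype=np.float64).ravel()
    if values.shape != weights.shape:
        raise ValueError("values and weights must have the same shape")

    # Sort the samples and carry their weights along.
    order = np.argsort(values)
    values, weights = values[order], weights[order]

    # Mid-point cumulative weight of each sample, normalised to [0, 1].
    positions = (np.cumsum(weights) - 0.5 * weights) / weights.sum()
    return float(np.interp(q, positions, values))


# Equal weights: the median of three samples is the middle one.
print(weighted_quantile([1.0, 2.0, 3.0], [1.0, 1.0, 1.0], 0.5))

# A heavy weight on the largest sample pulls the median towards it.
print(weighted_quantile([1.0, 2.0, 3.0], [1.0, 1.0, 10.0], 0.5))
```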

Examples

>>> import numpy as np
>>> import pyinterp

# Compute t-digest for a 1D array with float64 (default)
>>> data = np.random.randn(10000)
>>> tdigest = pyinterp.TDigest(data)
>>> median = tdigest.quantile(0.5)
>>> print(f"Median: {median}")

# Compute t-digest with float32 for reduced memory usage
>>> data = data.astype('float32')
>>> tdigest = pyinterp.TDigest(data, dtype='float32')

# Compute along a specific axis
>>> data_2d = np.random.randn(100, 50)
>>> tdigest_axis = pyinterp.TDigest(data_2d, axis=[0])
>>> medians = tdigest_axis.quantile(0.5)
>>> print(f"Medians shape: {medians.shape}")

# Compute with weights and higher compression for better accuracy
>>> weights = np.random.rand(100, 50)
>>> tdigest_weighted = pyinterp.TDigest(
...     data_2d,
...     weights=weights,
...     compression=500,
... )
>>> weighted_median = tdigest_weighted.quantile(0.5)