pyinterp.Histogram2DFloat64
- class pyinterp.Histogram2DFloat64(self, x: pyinterp.core.Axis, y: pyinterp.core.Axis, compression: int | None = None)
Bases: object
Create a 2D histogram for binning continuous values into a grid using TDigest.
Groups continuous values into bins located on a 2D grid. Each bin maintains statistical distributions using the TDigest algorithm for efficient quantile estimation and statistical analysis.
The TDigest algorithm provides accurate quantile estimates (especially at the tails of the distribution) while using bounded memory. It’s particularly useful for:
Computing percentiles and quantiles
Identifying outliers
Analyzing large datasets that don’t fit in memory
Merging statistics from multiple datasets
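The binning idea described above can be sketched in pure Python without calling pyinterp. This is a conceptual illustration only: the helper names `nearest_bin` and `bin_samples` are hypothetical, exact sample lists stand in for the per-bin TDigest, and out-of-range values are clamped to the edge bins for simplicity (this page does not specify how the library treats them).

```python
from bisect import bisect_left
from collections import defaultdict
from statistics import median


def nearest_bin(centers, value):
    """Return the index of the bin center closest to value.

    Hypothetical helper: centers must be sorted in ascending order.
    """
    pos = bisect_left(centers, value)
    if pos == 0:
        return 0
    if pos == len(centers):
        return len(centers) - 1
    # Pick whichever neighbor is closer; ties go to the lower index.
    if value - centers[pos - 1] <= centers[pos] - value:
        return pos - 1
    return pos


def bin_samples(x_centers, y_centers, xs, ys, zs):
    """Group z values by the (ix, iy) cell nearest to each (x, y) position."""
    cells = defaultdict(list)
    for x, y, z in zip(xs, ys, zs):
        key = (nearest_bin(x_centers, x), nearest_bin(y_centers, y))
        cells[key].append(z)
    return cells


x_centers = [0.0, 1.0, 2.0]
y_centers = [0.0, 10.0]
cells = bin_samples(
    x_centers,
    y_centers,
    xs=[0.1, 0.2, 1.9, 0.4],
    ys=[1.0, 2.0, 9.0, 0.0],
    zs=[5.0, 7.0, 3.0, 9.0],
)

# Exact per-cell medians; a TDigest would return an estimate instead,
# without keeping every sample in memory.
medians = {cell: median(values) for cell, values in cells.items()}
```

Storing every sample, as this sketch does, makes memory grow with the number of points; replacing the lists with a bounded summary is the role TDigest plays in the class documented here.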
- Parameters:
x – Definition of the bin centers for the X axis of the grid.
y – Definition of the bin centers for the Y axis of the grid.
compression – TDigest compression parameter (default: 100). Higher values provide better accuracy at the cost of memory usage. Typical values range from 100 to 1000.
dtype – Data type for internal storage, either ‘float32’ or ‘float64’. Determines precision and memory usage. Defaults to ‘float64’.
Examples
>>> import pyinterp
>>> import numpy as np
>>> x = pyinterp.Axis(np.arange(0, 360, 1), period=360)
>>> y = pyinterp.Axis(np.arange(-90, 90, 1))
>>> hist = pyinterp.Histogram2D(x, y)
Add some sample data
>>> x_data = np.random.uniform(0, 360, 10000)
>>> y_data = np.random.uniform(-90, 90, 10000)
>>> z_data = np.random.normal(0, 1, 10000)
>>> hist.push(x_data, y_data, z_data)
Compute statistics
>>> mean_grid = hist.mean()
>>> median_grid = hist.quantile(0.5)
>>> p95_grid = hist.quantile(0.95)
Create with custom compression for better accuracy
>>> hist_accurate = pyinterp.Histogram2D(x, y, compression=500)
Create with float32 for reduced memory usage
>>> hist_compact = pyinterp.Histogram2D(x, y, dtype='float32')
Notes
Unlike traditional histograms that bin values into discrete counts, Histogram2D uses TDigest to maintain a compressed representation of the distribution in each bin. This allows for:
Accurate quantile queries without storing all data points
Efficient merging of histograms from different datasets
Bounded memory usage regardless of the number of samples
TDigest provides better accuracy at the tails of the distribution (near 0th and 100th percentiles) where traditional histograms struggle.
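Why merging is cheap can be shown with a small, library-independent sketch. The `BinStats` class below is hypothetical and is not part of pyinterp: it keeps only a fixed-size summary per bin (count, sum, minimum, maximum) and merges two summaries with `+=`. It deliberately omits quantiles, which need the TDigest centroids.

```python
import math
from dataclasses import dataclass


@dataclass
class BinStats:
    """Fixed-size summary of the samples seen in one bin (hypothetical)."""

    count: int = 0
    total: float = 0.0
    minimum: float = math.inf
    maximum: float = -math.inf

    def push(self, value):
        """Fold one sample into the summary."""
        self.count += 1
        self.total += value
        self.minimum = min(self.minimum, value)
        self.maximum = max(self.maximum, value)

    def __iadd__(self, other):
        """Merge another summary into this one, in place."""
        self.count += other.count
        self.total += other.total
        self.minimum = min(self.minimum, other.minimum)
        self.maximum = max(self.maximum, other.maximum)
        return self

    @property
    def mean(self):
        """Mean of all samples seen so far, or NaN for an empty bin."""
        return self.total / self.count if self.count else math.nan


# Summaries built independently, for example from two data files.
first = BinStats()
for value in (1.0, 2.0, 3.0):
    first.push(value)

second = BinStats()
for value in (10.0, 20.0):
    second.push(value)

# After merging, `first` describes all five samples.
first += second
```

The merged summary has the same size as each input, which is the property that keeps memory bounded regardless of how many samples or datasets are combined.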
Attributes
Public Methods
clear(self) – Reset all statistics and clear all bins.
count(self) – Compute the count of points within each bin.
max(self) – Compute the maximum value in each bin.
mean(self) – Compute the mean value in each bin.
min(self) – Compute the minimum value in each bin.
push(self, x[, shape, writable, shape, ...]) – Push new samples into the histogram bins.
quantile(self, q) – Compute the specified quantile for each bin.
sum_of_weights(self) – Compute the sum of weights in each bin.
Special Methods
__copy__(self) – Implement the shallow copy operation.
__getstate__(self) – Get the state of the instance for pickling.
__iadd__(self, other) – Merge another histogram into this one.
__new__(*args, **kwargs)
__setstate__(self, state) – Set the state of the instance from pickling.