Note
Go to the end to download the full example code or to run this example in your browser via Binder.
Descriptive Statistics#
While NumPy provides a wide range of statistical functions, calculating
multiple statistical variables from the same array often requires multiple
passes over the data. The pyinterp.DescriptiveStatistics class
offers a more efficient solution by computing several statistical variables in
a single pass. This approach is not only faster but also more numerically
stable, thanks to its incremental calculation algorithm.
Note
This implementation is based on the following paper:
Pébay, P., Terriberry, T.B., Kolla, H. et al. Numerically stable, scalable formulas for parallel and online computation of higher-order multivariate central moments with arbitrary weights. Comput Stat 31, 1305-1325, 2016, https://doi.org/10.1007/s00180-015-0637-z
Basic Usage#
Let’s start by creating a random array and using it to initialize the
pyinterp.DescriptiveStatistics class.
import numpy
import pyinterp
generator = numpy.random.Generator(numpy.random.PCG64(0))
values = generator.random((2, 4, 6, 8))
ds = pyinterp.DescriptiveStatistics(values)
Once the object is created, you can access various statistical variables, such as the count, mean, variance, standard deviation, skewness, kurtosis, minimum, maximum, and sum.
print(f'Count: {ds.count()}')
print(f'Mean: {ds.mean()}')
print(f'Variance: {ds.variance()}')
Count: 384
Mean: 0.5270305088161521
Variance: 0.08762152302323273
Computing Statistics Along an Axis#
Similar to NumPy, you can compute statistics along a specific axis by
providing the axis parameter.
ds_axis = pyinterp.DescriptiveStatistics(values, axis=(1, 2))
print('Mean along axis (1, 2):')
print(ds_axis.mean())
Mean along axis (1, 2):
[[0.46094551 0.58985662 0.56335527 0.59960438 0.53755935 0.46486889
0.59151122 0.50117507]
[0.5223734 0.50646854 0.56639677 0.43645944 0.53861813 0.48949667
0.52384772 0.53995116]]
Weighted Statistics#
You can also calculate weighted statistics by providing a weights array.
weights = generator.random((2, 4, 6, 8))
ds_weighted = pyinterp.DescriptiveStatistics(values,
weights=weights,
axis=(1, 2))
print('Weighted mean:')
print(ds_weighted.mean())
Weighted mean:
[[0.44437477 0.59020081 0.52588079 0.62338788 0.52285862 0.44453644
0.63767415 0.4967181 ]
[0.51336212 0.46654736 0.6201174 0.45925923 0.53653586 0.4910487
0.5093799 0.55740906]]
Total running time of the script: (0 minutes 0.002 seconds)