.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/stats/ex_descriptive_statistics.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_auto_examples_stats_ex_descriptive_statistics.py>`
        to download the full example code or to run this example in your
        browser via Binder.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_stats_ex_descriptive_statistics.py:

.. _example_descriptive_statistics:

Descriptive Statistics
======================

While NumPy provides a wide range of statistical functions, calculating
multiple statistical variables from the same array often requires multiple
passes over the data. The :py:class:`pyinterp.DescriptiveStatistics` class
offers a more efficient solution by computing several statistical variables in
a single pass. This approach is not only faster but also more numerically
stable, thanks to its incremental calculation algorithm.

.. note::

    This implementation is based on the following paper:

    Pébay, P., Terriberry, T.B., Kolla, H. et al.
    Numerically stable, scalable formulas for parallel and online
    computation of higher-order multivariate central moments
    with arbitrary weights.
    Comput Stat 31, 1305-1325, 2016,
    https://doi.org/10.1007/s00180-015-0637-z

.. GENERATED FROM PYTHON SOURCE LINES 27-32

Basic Usage
-----------

Let's start by creating a random array and using it to initialize the
:py:class:`pyinterp.DescriptiveStatistics` class.

.. GENERATED FROM PYTHON SOURCE LINES 32-42

.. code-block:: Python

    import dask.array
    import numpy

    import pyinterp

    generator = numpy.random.Generator(numpy.random.PCG64(0))
    values = generator.random((2, 4, 6, 8))
    ds = pyinterp.DescriptiveStatistics(values)

.. GENERATED FROM PYTHON SOURCE LINES 43-46

Once the object is created, you can access various statistical variables, such
as the count, mean, variance, standard deviation, skewness, kurtosis, minimum,
maximum, and sum.

.. GENERATED FROM PYTHON SOURCE LINES 46-50

.. code-block:: Python

    print(f'Count: {ds.count()}')
    print(f'Mean: {ds.mean()}')
    print(f'Variance: {ds.var()}')

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Count: [384]
    Mean: [0.52703051]
    Variance: [0.08762152]

.. GENERATED FROM PYTHON SOURCE LINES 51-53

You can also get all the calculated statistical variables as a structured
NumPy array.

.. GENERATED FROM PYTHON SOURCE LINES 53-57

.. code-block:: Python

    stats_array = ds.array()
    print('Structured array of statistics:')
    print(stats_array)

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Structured array of statistics:
    [(384, -1.20728744, 0.99720994, 0.52703051, 0.00030069, -0.12457213, 384., 202.37971539, 0.08762152)]

.. GENERATED FROM PYTHON SOURCE LINES 58-63

Computing Statistics Along an Axis
----------------------------------

Similar to NumPy, you can compute statistics along a specific axis by providing
the ``axis`` parameter.

.. GENERATED FROM PYTHON SOURCE LINES 63-67

.. code-block:: Python

    ds_axis = pyinterp.DescriptiveStatistics(values, axis=(1, 2))
    print('Mean along axis (1, 2):')
    print(ds_axis.mean())

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Mean along axis (1, 2):
    [[0.46094551 0.58985662 0.56335527 0.59960438 0.53755935 0.46486889
      0.59151122 0.50117507]
     [0.5223734  0.50646854 0.56639677 0.43645944 0.53861813 0.48949667
      0.52384772 0.53995116]]

.. GENERATED FROM PYTHON SOURCE LINES 68-73

Working with Dask Arrays
------------------------

The :py:class:`pyinterp.DescriptiveStatistics` class also supports Dask arrays,
allowing you to work with datasets that are larger than memory.

.. GENERATED FROM PYTHON SOURCE LINES 73-78

.. code-block:: Python

    dask_values = dask.array.from_array(values, chunks=(2, 2, 2, 2))
    ds_dask = pyinterp.DescriptiveStatistics(dask_values, axis=(1, 2))
    print('Mean with Dask array:')
    print(ds_dask.mean())

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Mean with Dask array:
    [[0.46094551 0.58985662 0.56335527 0.59960438 0.53755935 0.46486889
      0.59151122 0.50117507]
     [0.5223734  0.50646854 0.56639677 0.43645944 0.53861813 0.48949667
      0.52384772 0.53995116]]

.. GENERATED FROM PYTHON SOURCE LINES 79-83

Weighted Statistics
-------------------

You can also calculate weighted statistics by providing a ``weights`` array.

.. GENERATED FROM PYTHON SOURCE LINES 83-89

.. code-block:: Python

    weights = generator.random((2, 4, 6, 8))
    ds_weighted = pyinterp.DescriptiveStatistics(values,
                                                 weights=weights,
                                                 axis=(1, 2))
    print('Weighted mean:')
    print(ds_weighted.mean())

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Weighted mean:
    [[0.44437477 0.59020081 0.52588079 0.62338788 0.52285862 0.44453644
      0.63767415 0.4967181 ]
     [0.51336212 0.46654736 0.6201174  0.45925923 0.53653586 0.4910487
      0.5093799  0.55740906]]

.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 0.476 seconds)

.. _sphx_glr_download_auto_examples_stats_ex_descriptive_statistics.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: binder-badge

      .. image:: images/binder_badge_logo.svg
        :target: https://mybinder.org/v2/gh/CNES/pangeo-pyinterp/master?urlpath=lab/tree/notebooks/auto_examples/stats/ex_descriptive_statistics.ipynb
        :alt: Launch binder
        :width: 150 px

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: ex_descriptive_statistics.ipynb <ex_descriptive_statistics.ipynb>`

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: ex_descriptive_statistics.py <ex_descriptive_statistics.py>`

    .. container:: sphx-glr-download sphx-glr-download-zip

      :download:`Download zipped: ex_descriptive_statistics.zip <ex_descriptive_statistics.zip>`

.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_
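
Sketch: Merging Partial Statistics
----------------------------------

The introduction attributes the numerical stability of
:py:class:`pyinterp.DescriptiveStatistics` to an incremental algorithm. As a
rough illustration of that family of techniques, the sketch below reduces each
chunk of data to a small summary (count, mean, and sum of squared deviations)
and merges the summaries pairwise, so no chunk has to be revisited. It is a
plain NumPy sketch, not pyinterp's actual implementation: the helper names
``summarize`` and ``merge`` are hypothetical, weights are ignored, and only
the mean and variance are covered.

.. code-block:: Python

    import numpy


    def summarize(chunk):
        """Return (count, mean, M2) for one chunk.

        M2 is the sum of squared deviations from the chunk mean.
        Hypothetical helper, for illustration only.
        """
        chunk = numpy.asarray(chunk, dtype=float).ravel()
        mean = chunk.mean()
        return chunk.size, mean, ((chunk - mean)**2).sum()


    def merge(a, b):
        """Combine two (count, mean, M2) summaries without the raw data."""
        n_a, mean_a, m2_a = a
        n_b, mean_b, m2_b = b
        n = n_a + n_b
        delta = mean_b - mean_a
        mean = mean_a + delta * n_b / n
        m2 = m2_a + m2_b + delta**2 * n_a * n_b / n
        return n, mean, m2


    generator = numpy.random.Generator(numpy.random.PCG64(0))
    values = generator.random(384)

    # Process the data in four equal chunks, merging the summaries as we go.
    chunks = numpy.split(values, 4)
    state = summarize(chunks[0])
    for chunk in chunks[1:]:
        state = merge(state, summarize(chunk))

    count, mean, m2 = state
    variance = m2 / count  # population variance, the default of numpy.var

    print(f'Count: {count}')
    print(f'Means agree: {numpy.isclose(mean, values.mean())}')
    print(f'Variances agree: {numpy.isclose(variance, values.var())}')

Because ``merge`` only needs the two summaries, the same update can be applied
chunk by chunk, which is the property that makes this kind of formula suitable
for data processed in blocks.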