
.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/ex_walkthrough.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_auto_examples_ex_walkthrough.py>`
        to download the full example code.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_ex_walkthrough.py:


End-to-end Walkthrough
======================

This example builds a float32 dataset, writes it to a partitioned collection
on a ``LocalStore``, reopens the collection from disk, asserts that the data
round-trips unchanged, and then queries a single partition with a filter.

Run with::

    python examples/ex_walkthrough.py

.. GENERATED FROM PYTHON SOURCE LINES 12-22

.. code-block:: Python


    from pathlib import Path
    import shutil
    import tempfile

    import numpy

    import zcollection as zc









.. GENERATED FROM PYTHON SOURCE LINES 23-27

Initialization
--------------

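Pick a target directory under the system temporary directory and remove any
copy left over from a previous run, so the collection is created from scratch.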

.. GENERATED FROM PYTHON SOURCE LINES 27-31

.. code-block:: Python

    target = Path(tempfile.gettempdir()) / "zc-walkthrough"
    if target.exists():
        shutil.rmtree(target)
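
The fixed path above is reused on every run. Where several runs may execute
at the same time, a uniquely named scratch directory avoids collisions. The
sketch below uses only the standard library; the ``collection`` child path is
a hypothetical name, and whether it suits a given store is not verified here.

.. code-block:: Python

    from pathlib import Path
    import shutil
    import tempfile

    # mkdtemp creates a new, uniquely named directory on every call.
    scratch = Path(tempfile.mkdtemp(prefix="zc-walkthrough-"))
    target = scratch / "collection"  # hypothetical child path, not created yet
    try:
        print(f"scratch exists: {scratch.is_dir()}, target exists: {target.exists()}")
    finally:
        # Remove the scratch directory and anything written beneath it.
        shutil.rmtree(scratch)

The ``try``/``finally`` pair ensures the scratch directory is removed even if
the body raises.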








.. GENERATED FROM PYTHON SOURCE LINES 32-36

Build a schema
--------------

Declare the dimensions with their chunk sizes, then the variables with their
dtypes and dimensions. ``ssh`` uses NaN as its fill value.

.. GENERATED FROM PYTHON SOURCE LINES 36-51

.. code-block:: Python

    schema = (
        zc.Schema()
        .with_dimension("time", chunks=4096)
        .with_dimension("x_ac", size=240, chunks=240)
        .with_variable("time", dtype="int64", dimensions=("time",))
        .with_variable("partition", dtype="int64", dimensions=("time",))
        .with_variable(
            "ssh",
            dtype="float32",
            dimensions=("time", "x_ac"),
            fill_value=numpy.float32("nan"),
        )
        .build()
    )
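
As a rough check on the chunk sizes declared above, the sketch below computes
the size of one ``ssh`` chunk and the number of chunks along ``time`` for a
25,000-row partition like the ones written later in this example. It assumes
that the per-dimension ``chunks`` values combine into a ``(4096, 240)`` chunk
shape and that sizes are uncompressed. ``chunk_stats`` is a hypothetical
helper, not part of zcollection.

.. code-block:: Python

    import math


    def chunk_stats(rows, time_chunk=4096, x_ac=240, itemsize=4):
        """Return (bytes per full chunk, number of chunks along time).

        Assumes a chunk shape of (time_chunk, x_ac) and no compression.
        """
        chunk_bytes = time_chunk * x_ac * itemsize
        n_chunks = math.ceil(rows / time_chunk)
        return chunk_bytes, n_chunks


    chunk_bytes, n_chunks = chunk_stats(rows=25_000)
    print(f"{chunk_bytes / 1e6:.2f} MB per chunk, {n_chunks} chunks along time")

Under these assumptions the script prints
``3.93 MB per chunk, 7 chunks along time``; the last chunk is only partly
filled because 25,000 is not a multiple of 4096.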








.. GENERATED FROM PYTHON SOURCE LINES 52-57

Build a sample dataset
----------------------

Create a :py:class:`~zcollection.Dataset` holding 100,000 rows of synthetic
data. The ``partition`` variable assigns 25,000 consecutive rows to each of
the 4 partitions.

.. GENERATED FROM PYTHON SOURCE LINES 57-77

.. code-block:: Python

    N_PARTITIONS = 4
    ROWS_PER_PARTITION = 25_000
    rng = numpy.random.default_rng(42)
    n = N_PARTITIONS * ROWS_PER_PARTITION
    time = numpy.arange(n, dtype="int64")
    partition = numpy.repeat(
        numpy.arange(N_PARTITIONS, dtype="int64"), ROWS_PER_PARTITION
    )
    ssh = rng.standard_normal(size=(n, 240), dtype="float32")

    ds = zc.Dataset(
        schema=schema,
        variables={
            "time": zc.Variable(schema.variables["time"], time),
            "partition": zc.Variable(schema.variables["partition"], partition),
            "ssh": zc.Variable(schema.variables["ssh"], ssh),
        },
    )
    print(f"dataset: {ds}  ({ds['ssh'].to_numpy().nbytes / 1e6:.1f} MB ssh)")





.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    dataset: <zcollection.data.dataset.Dataset '/'> Size: 93.08 MB
      Dimensions: (time: 100000, x_ac: 240)
    Data variables:
        time      (time)                   int64       781.25 kB  numpy.ndarray<size=781.25 kB>
        partition (time)                   int64       781.25 kB  numpy.ndarray<size=781.25 kB>
        ssh       (time, x_ac)             float32      91.55 MB  numpy.ndarray<size=91.55 MB>  (96.0 MB ssh)
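
The repr and the ``print`` call report different figures for ``ssh``
(91.55 MB and 96.0 MB). The sketch below reproduces both from the array
shape. It assumes the repr divides by powers of 1024, which is inferred from
the numbers shown rather than from zcollection's documentation.

.. code-block:: Python

    ROWS = 100_000
    X_AC = 240

    ssh_bytes = ROWS * X_AC * 4  # float32: 4 bytes per value
    time_bytes = ROWS * 8  # int64: 8 bytes per value
    total_bytes = ssh_bytes + 2 * time_bytes  # ssh + time + partition

    print(f"ssh, decimal MB:   {ssh_bytes / 1e6:.1f}")
    print(f"ssh, binary MiB:   {ssh_bytes / 2**20:.2f}")
    print(f"time, binary KiB:  {time_bytes / 2**10:.2f}")
    print(f"total, binary MiB: {total_bytes / 2**20:.2f}")

The four values come out as 96.0, 91.55, 781.25 and 93.08, matching the
figures in the output above.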




.. GENERATED FROM PYTHON SOURCE LINES 78-85

Create the collection
---------------------

:py:func:`~zcollection.create_collection` writes the schema to disk and
returns a writable :py:class:`~zcollection.Collection`. The
:py:class:`~zcollection.partitioning.Sequence` partitioner splits rows along
the ``time`` dimension by the value of the ``partition`` variable.

.. GENERATED FROM PYTHON SOURCE LINES 85-92

.. code-block:: Python

    collection = zc.create_collection(
        f"file://{target}",
        schema=schema,
        axis="time",
        partitioning=zc.partitioning.Sequence(("partition",), dimension="time"),
    )








.. GENERATED FROM PYTHON SOURCE LINES 93-97

Insert data
-----------

``insert`` routes each row to its partition on disk and returns the names of
the partitions it wrote.

.. GENERATED FROM PYTHON SOURCE LINES 97-100

.. code-block:: Python

    written = collection.insert(ds)
    print(f"wrote {len(written)} partitions: {written}")





.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    wrote 4 partitions: ['partition=0', 'partition=1', 'partition=2', 'partition=3']
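
To make the routing concrete, the sketch below groups row indices by
partition key using only the standard library. It illustrates the idea and is
not zcollection's implementation: ``route_rows`` is a hypothetical helper,
and the ``partition=<value>`` naming simply mirrors the output above.

.. code-block:: Python

    from itertools import groupby


    def route_rows(partition_values):
        """Map each partition name to the half-open row range it covers.

        Assumes rows with the same key are contiguous, as in this example.
        """
        ranges = {}
        start = 0
        for key, run in groupby(partition_values):
            length = sum(1 for _ in run)
            ranges[f"partition={key}"] = (start, start + length)
            start += length
        return ranges


    # Tiny stand-in for the ``partition`` variable: 3 rows per partition.
    values = [0, 0, 0, 1, 1, 1, 2, 2, 2]
    print(route_rows(values))

For the stand-in data this yields the ranges ``(0, 3)``, ``(3, 6)`` and
``(6, 9)``, one per partition name.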




.. GENERATED FROM PYTHON SOURCE LINES 101-105

Reopen and query
----------------

Reopen the collection read-only, load the full dataset with an unfiltered
``query()``, and assert that ``time`` and ``ssh`` match the original arrays
exactly.

.. GENERATED FROM PYTHON SOURCE LINES 105-113

.. code-block:: Python

    reopened = zc.open_collection(f"file://{target}", mode="r")
    print(f"reopened: axis={reopened.axis} parts={list(reopened.partitions())}")

    full = reopened.query()
    assert numpy.array_equal(full["time"].to_numpy(), ds["time"].to_numpy())
    assert numpy.array_equal(full["ssh"].to_numpy(), ds["ssh"].to_numpy())
    print("bit-exact round-trip: OK")





.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    reopened: axis=time parts=['partition=0', 'partition=1', 'partition=2', 'partition=3']
    bit-exact round-trip: OK
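
``numpy.array_equal`` compares values, which is sufficient here because
``standard_normal`` produces no NaN. For data that may contain the NaN fill
value, a value comparison reports a mismatch even when the bytes are
identical. The sketch below shows two alternatives. ``bit_equal`` is a
hypothetical helper, not part of zcollection.

.. code-block:: Python

    import numpy


    def bit_equal(a, b):
        """Return True if both arrays share dtype, shape, and raw bytes."""
        return (
            a.dtype == b.dtype
            and a.shape == b.shape
            and a.tobytes() == b.tobytes()
        )


    original = numpy.array([1.5, numpy.nan, -0.0], dtype="float32")
    restored = original.copy()

    # Plain value comparison: NaN never equals NaN.
    print(numpy.array_equal(original, restored))
    # Value comparison that treats NaN as equal to NaN.
    print(numpy.array_equal(original, restored, equal_nan=True))
    # Byte-for-byte comparison.
    print(bit_equal(original, restored))

    # Value equality also hides a sign-of-zero difference that bytes reveal.
    zeros = numpy.array([1.5, numpy.nan, 0.0], dtype="float32")
    print(numpy.array_equal(original, zeros, equal_nan=True), bit_equal(original, zeros))

The first three lines print ``False``, ``True`` and ``True``; the last prints
``True False``, because ``-0.0 == 0.0`` as values while their bit patterns
differ.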




.. GENERATED FROM PYTHON SOURCE LINES 114-119

Filter pushdown
---------------

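Filters are evaluated against partition keys, so only matching partitions are
read from disk. Here the filter selects ``partition=2``, and the result holds
exactly ``ROWS_PER_PARTITION`` rows.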

.. GENERATED FROM PYTHON SOURCE LINES 119-122

.. code-block:: Python

    sub = reopened.query(filters="partition == 2")
    assert sub["partition"].to_numpy().tolist() == [2] * ROWS_PER_PARTITION
    print("filter pushdown: OK")




.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    filter pushdown: OK
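
Partition pruning can be sketched in a few lines: parse the key out of each
partition name, evaluate the predicate on the key alone, and keep only the
partitions that match. This is an illustration under stated assumptions, not
zcollection's implementation: ``parse_key`` and ``prune`` are hypothetical
helpers, and the predicate is a Python callable rather than the string
expression used above.

.. code-block:: Python

    def parse_key(name):
        """Turn a name such as 'partition=2' into {'partition': 2}."""
        field, _, value = name.partition("=")
        return {field: int(value)}


    def prune(partition_names, predicate):
        """Keep the partitions whose key satisfies the predicate.

        Only names are inspected; no partition data is read.
        """
        return [name for name in partition_names if predicate(parse_key(name))]


    names = ["partition=0", "partition=1", "partition=2", "partition=3"]
    selected = prune(names, lambda key: key["partition"] == 2)
    print(selected)

With the four partition names from this example, only ``partition=2``
survives, so a reader built this way would open one partition out of four.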





.. _sphx_glr_download_auto_examples_ex_walkthrough.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: ex_walkthrough.ipynb <ex_walkthrough.ipynb>`

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: ex_walkthrough.py <ex_walkthrough.py>`

    .. container:: sphx-glr-download sphx-glr-download-zip

      :download:`Download zipped: ex_walkthrough.zip <ex_walkthrough.zip>`


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_
