TreeStore

A hierarchical, tree‑like container to organize compressed arrays with Blosc2.

Overview

TreeStore builds on top of DictStore by enforcing a strict hierarchical key structure and by providing helpers to navigate the hierarchy. Keys are POSIX‑like paths that must start with a leading slash (e.g. "/child0/child/leaf"). Data is stored only at leaf nodes; intermediate path segments are considered structural nodes and are created implicitly as you assign arrays to leaves.
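The implicit creation of structural nodes can be illustrated with a small pure-Python sketch. It does not call Blosc2; the helper name `implied_structural_nodes` is hypothetical and only models the rule described above (every ancestor of a leaf key, other than the root, is a structural node).

```python
from posixpath import dirname


def implied_structural_nodes(leaf_keys):
    """Return the intermediate paths implied by a collection of leaf keys."""
    nodes = set()
    for key in leaf_keys:
        parent = dirname(key)
        # Climb towards the root, collecting every ancestor on the way.
        # The "" guard stops the loop for keys that lack a leading slash.
        while parent not in ("/", ""):
            nodes.add(parent)
            parent = dirname(parent)
    return sorted(nodes)


leaves = ["/child0/leaf1", "/child0/child1/leaf2", "/child0/child2"]
print(implied_structural_nodes(leaves))  # ['/child0', '/child0/child1']
```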

Like DictStore, TreeStore supports two on‑disk representations:

  • .b2d: a directory layout (B2DIR) where external arrays are regular .b2nd files and a small embedded store (embed.b2e) holds small/in‑memory arrays.

  • .b2z: a single zip file (B2ZIP) that mirrors the above directory structure. You can create it directly or convert from a .b2d layout.

Small arrays (below a size threshold) and in‑memory objects go to the embedded store, while larger arrays or explicitly external arrays are stored as separate .b2nd files. You can traverse your dataset hierarchically with walk(), query children/descendants, or focus on a subtree view with get_subtree().

Quick example

import numpy as np
import blosc2

# Create a hierarchical store backed by a zip file
with blosc2.TreeStore("my_tree.b2z", mode="w") as tstore:
    # Data is stored at leaves; structural nodes are created implicitly
    tstore["/child0/leaf1"] = np.array([1, 2, 3])
    tstore["/child0/child1/leaf2"] = np.array([4, 5, 6])
    tstore["/child0/child2"] = np.array([7, 8, 9])

    # Inspect hierarchy
    for path, children, nodes in tstore.walk("/child0"):
        print(path, sorted(children), sorted(nodes))

    # Work with a subtree view rooted at /child0
    subtree = tstore.get_subtree("/child0")
    print(sorted(subtree.keys()))  # ['/child1/leaf2', '/child2', '/leaf1']
    print(subtree["/child1/leaf2"][:])  # [4 5 6]

class blosc2.TreeStore(*args, _from_parent_store=None, **kwargs)

A hierarchical tree-based storage container for compressed data using Blosc2.

Extends DictStore with strict hierarchical key validation and tree traversal capabilities. Keys must follow a hierarchical structure using ‘/’ as separator and always start with ‘/’.

Parameters:
  • localpath (str) – Local path for the directory (.b2d) or file (.b2z); other extensions are not supported. If a directory is specified, it will be treated as a Blosc2 directory format (B2DIR). If a file is specified, it will be treated as a Blosc2 zip format (B2ZIP).

  • mode (str, optional) – File mode (‘r’, ‘w’, ‘a’). Default is ‘a’.

  • tmpdir (str or None, optional) – Temporary directory to use when working with .b2z files. If None, a system temporary directory will be managed. Default is None.

  • cparams (dict or None, optional) – Compression parameters for the internal embed store. If None, the default Blosc2 parameters are used.

  • dparams (dict or None, optional) – Decompression parameters for the internal embed store. If None, the default Blosc2 parameters are used.

  • storage (blosc2.Storage or None, optional) – Storage properties for the internal embed store. If None, the default Blosc2 storage properties are used.

  • threshold (int or None, optional) – Size threshold, in bytes, for keeping an array in the embed store. An array whose compressed size is below this threshold is stored in the embed store instead of as a separate file. If None, in-memory arrays are stored in the embed store and on-disk arrays are stored as separate files. C2Array objects are always stored in the embed store, regardless of their size.
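The placement rule in the threshold description can be read as a small decision function. The sketch below is a plain-Python model of that description, not the library's code; `choose_store` and its arguments are hypothetical names. How an explicit threshold interacts with arrays that are already on disk is not spelled out above, so the sketch applies only the size rule in that case.

```python
def choose_store(compressed_nbytes, threshold=None, on_disk=False, is_c2array=False):
    """Return "embed" or "external" following the documented threshold rule."""
    # C2Array objects always go to the embed store, whatever their size.
    if is_c2array:
        return "embed"
    # Without a threshold, only the in-memory / on-disk distinction matters.
    if threshold is None:
        return "external" if on_disk else "embed"
    # With a threshold, compare it against the compressed size in bytes.
    return "embed" if compressed_nbytes < threshold else "external"


print(choose_store(100, threshold=1000))    # embed
print(choose_store(5000, threshold=1000))   # external
```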

Examples

>>> tstore = TreeStore(localpath="my_tstore.b2z", mode="w")
>>> # Create a hierarchy. Data is stored in leaf nodes.
>>> # Structural nodes like /child0 and /child0/child1 are created automatically.
>>> tstore["/child0/leaf1"] = np.array([1, 2, 3])
>>> tstore["/child0/child1/leaf2"] = np.array([4, 5, 6])
>>> tstore["/child0/child2"] = np.array([7, 8, 9])
>>>
>>> # Walk the tree structure
>>> for path, children, nodes in tstore.walk("/child0"):
...     print(f"Path: {path}, Children: {sorted(children)}, Nodes: {sorted(nodes)}")
Path: /child0, Children: ['/child0/child1'], Nodes: ['/child0/child2', '/child0/leaf1']
Path: /child0/child1, Children: [], Nodes: ['/child0/child1/leaf2']
>>>
>>> # Get a subtree view
>>> subtree = tstore.get_subtree("/child0")
>>> sorted(list(subtree.keys()))
['/child1/leaf2', '/child2', '/leaf1']

Notes

The TreeStore is still experimental and subject to change. Please report any issues you may find.

Attributes:
estore

Access the underlying EmbedStore.

vlmeta

Access variable-length metadata for the TreeStore.

Methods

close()

Persist changes and cleanup.

get(key[, default])

Retrieve a node, or default if not found.

get_children(path)

Get direct children of a given path.

get_descendants(path)

Get all descendants of a given path.

get_subtree(path)

Create a subtree view with the specified path as root.

items()

Return key-value pairs in the current subtree view.

keys()

Return all keys in the current subtree view.

to_b2z([overwrite, filename])

Serialize zip store contents to the b2z file.

values()

Iterate over all values.

walk([path, topdown])

Walk the tree structure.

Special Methods:

__init__(*args[, _from_parent_store])

Initialize TreeStore with subtree support.

__getitem__(key)

Retrieve a node or subtree view.

__setitem__(key, value)

Add a node with hierarchical key validation.

__delitem__(key)

Remove a node or subtree.

__contains__(key)

Check if a key exists.

__len__()

Return number of nodes.

__iter__()

Iterate over keys.

Constructors

__init__(*args, _from_parent_store=None, **kwargs)

Initialize TreeStore with subtree support.

Dictionary Interface

__getitem__(key: str) → NDArray | C2Array | SChunk | TreeStore

Retrieve a node or subtree view.

If the key points to a subtree (intermediate path with children), returns a TreeStore view of that subtree. If the key points to a final node (leaf), returns the stored array or schunk.

Parameters:

key (str) – Hierarchical node key.

Returns:

out – The stored array/chunk if key is a leaf node, or a TreeStore subtree view if key is an intermediate path with children.

Return type:

blosc2.NDArray or blosc2.C2Array or blosc2.SChunk or TreeStore

Raises:
  • KeyError – If key is not found.

  • ValueError – If key doesn’t follow hierarchical structure rules.
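The leaf-versus-subtree dispatch described above can be modeled over a flat collection of leaf keys. This is an illustrative sketch only: `lookup_kind` is a hypothetical helper that classifies a key, whereas the real method returns the stored object or a TreeStore view.

```python
def lookup_kind(leaf_keys, key):
    """Classify `key` as "leaf" or "subtree", or raise KeyError if unknown."""
    if key in leaf_keys:
        return "leaf"
    # A structural path is one that prefixes at least one stored leaf key.
    prefix = key.rstrip("/") + "/"
    if any(k.startswith(prefix) for k in leaf_keys):
        return "subtree"
    raise KeyError(key)


leaves = {"/child0/leaf1", "/child0/child1/leaf2", "/child0/child2"}
print(lookup_kind(leaves, "/child0/leaf1"))   # leaf
print(lookup_kind(leaves, "/child0/child1"))  # subtree
```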

__setitem__(key: str, value: ndarray | NDArray | C2Array | SChunk) → None

Add a node with hierarchical key validation.

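Parameters:
  • key (str) – Hierarchical node key.

  • value (numpy.ndarray or blosc2.NDArray or blosc2.C2Array or blosc2.SChunk) – The array or chunk to store at the node.

Raises: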

ValueError – If key doesn’t follow hierarchical structure rules, if trying to assign to a structural path that already has children, or if trying to add a child to a path that already contains data.
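The two structural conflicts named above (assigning data to a path that already has children, and adding a child under a path that already holds data) can be sketched as a check over the existing leaf keys. `check_assignment` is a hypothetical helper written for illustration, not part of the TreeStore API.

```python
def check_assignment(leaf_keys, new_key):
    """Raise ValueError if storing data at `new_key` would break the hierarchy."""
    # Case 1: new_key is a structural path that already has children.
    prefix = new_key + "/"
    if any(k.startswith(prefix) for k in leaf_keys):
        raise ValueError(f"{new_key!r} is a structural path with children")
    # Case 2: some ancestor of new_key already contains data.
    parts = new_key.strip("/").split("/")
    for i in range(1, len(parts)):
        ancestor = "/" + "/".join(parts[:i])
        if ancestor in leaf_keys:
            raise ValueError(f"{ancestor!r} already contains data")


existing = {"/child0/leaf1"}
check_assignment(existing, "/child0/leaf2")  # accepted: a sibling leaf
for bad_key in ("/child0", "/child0/leaf1/extra"):
    try:
        check_assignment(existing, bad_key)
    except ValueError as exc:
        print("rejected:", exc)
```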

__delitem__(key: str) → None

Remove a node or subtree.

If the key points to a subtree (intermediate path with children), removes all nodes in that subtree recursively. If the key points to a final node (leaf), removes only that node.

Parameters:

key (str) – Hierarchical node key.

Raises:
  • KeyError – If key is not found.

  • ValueError – If key doesn’t follow hierarchical structure rules.
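The deletion semantics (a leaf key removes one node, a structural key removes the whole subtree) can be modeled on an ordinary dict of leaf keys. This is a sketch under that simplification; `delete_key` is a hypothetical name, and the list it returns exists only to make the effect visible.

```python
def delete_key(store, key):
    """Delete a leaf, or every leaf under a structural path; return removed keys."""
    if key in store:
        del store[key]
        return [key]
    # Otherwise treat the key as a structural path and remove its subtree.
    prefix = key.rstrip("/") + "/"
    doomed = sorted(k for k in store if k.startswith(prefix))
    if not doomed:
        raise KeyError(key)
    for k in doomed:
        del store[k]
    return doomed


store = {"/child0/leaf1": 1, "/child0/child1/leaf2": 2, "/other": 3}
print(delete_key(store, "/child0"))  # ['/child0/child1/leaf2', '/child0/leaf1']
print(sorted(store))                 # ['/other']
```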

__contains__(key: str) → bool

Check if a key exists.

Parameters:

key (str) – Hierarchical node key.

Returns:

exists – True if key exists, False otherwise.

Return type:

bool

__len__() → int

Return number of nodes.

__iter__() → Iterator[str]

Iterate over keys.

keys()

Return all keys in the current subtree view.

values() → Iterator[NDArray | SChunk | C2Array]

Iterate over all values.

items() → Iterator[tuple[str, NDArray | C2Array | SChunk | TreeStore]]

Return key-value pairs in the current subtree view.

Tree Navigation

get_children(path: str) → list[str]

Get direct children of a given path.

Parameters:

path (str) – The parent path to get children for.

Returns:

children – List of direct child paths.

Return type:

list[str]

get_descendants(path: str) → list[str]

Get all descendants of a given path.

Parameters:

path (str) – The parent path to get descendants for.

Returns:

descendants – List of all descendant paths.

Return type:

list[str]
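The difference between direct children and all descendants can be shown on a flat list of leaf keys. The sketch below is a pure-Python model with hypothetical helper names. It assumes that structural nodes count as children and descendants alongside leaves, which the descriptions above do not state explicitly.

```python
def descendants(leaf_keys, path):
    """All paths below `path`: leaves plus the structural nodes leading to them."""
    prefix = path.rstrip("/") + "/"
    found = set()
    for key in leaf_keys:
        if key.startswith(prefix):
            parts = key[len(prefix):].split("/")
            # Add every intermediate path between `path` and the leaf itself.
            for i in range(1, len(parts) + 1):
                found.add(prefix + "/".join(parts[:i]))
    return sorted(found)


def children(leaf_keys, path):
    """Descendants that sit exactly one level below `path`."""
    prefix = path.rstrip("/") + "/"
    return [d for d in descendants(leaf_keys, path) if "/" not in d[len(prefix):]]


leaves = ["/child0/leaf1", "/child0/child1/leaf2", "/child0/child2"]
print(children(leaves, "/child0"))
# ['/child0/child1', '/child0/child2', '/child0/leaf1']
print(descendants(leaves, "/child0"))
# ['/child0/child1', '/child0/child1/leaf2', '/child0/child2', '/child0/leaf1']
```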

walk(path: str = '/', topdown: bool = True) → Iterator[tuple[str, list[str], list[str]]]

Walk the tree structure.

Similar to os.walk(), this visits all structural nodes in the hierarchy, yielding information about each level. The yielded children and nodes are relative names, not full paths.

Parameters:
  • path (str, optional) – The root path to start walking from. Default is “/”.

  • topdown (bool, optional) – If True (default), traverse top-down (yield parent before children). If False, traverse bottom-up (yield children before parent), mimicking os.walk(topdown=False).

Yields:
  • path (str) – Current path being walked.

  • children (list[str]) – List of child directory names (structural nodes that have descendants). These are just the names, not full paths.

  • nodes (list[str]) – List of leaf node names (nodes that contain data). These are just the names, not full paths.

Examples

>>> for path, children, nodes in tstore.walk("/child0", topdown=True):
...     print(f"Path: {path}, Children: {children}, Nodes: {nodes}")
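The traversal order and the shape of the yielded tuples can be sketched with a recursive generator over a flat list of leaf keys. This is a pure-Python model following the description above (names rather than full paths, parent first when topdown is True); it is not the library's implementation, and this stand-alone `walk` takes the keys as a hypothetical extra argument.

```python
def walk(leaf_keys, path="/", topdown=True):
    """Yield (path, children, nodes) for `path` and every structural node below it."""
    prefix = path.rstrip("/") + "/"
    children, nodes = set(), set()
    for key in leaf_keys:
        if key.startswith(prefix):
            name, _, rest = key[len(prefix):].partition("/")
            # A remaining tail means `name` is structural; otherwise it is a leaf.
            (children if rest else nodes).add(name)
    level = (path, sorted(children), sorted(nodes))
    if topdown:
        yield level
    for name in sorted(children):
        yield from walk(leaf_keys, prefix + name, topdown)
    if not topdown:
        yield level


leaves = ["/child0/leaf1", "/child0/child1/leaf2", "/child0/child2"]
for level in walk(leaves, "/child0"):
    print(level)
# ('/child0', ['child1'], ['child2', 'leaf1'])
# ('/child0/child1', [], ['leaf2'])
```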

get_subtree(path: str) → TreeStore

Create a subtree view with the specified path as root.

Parameters:

path (str) – The path that will become the root of the subtree view (relative to current subtree).

Returns:

subtree – A new TreeStore instance that presents the subtree as if path were the root.

Return type:

TreeStore

Examples

>>> tstore["/child0/child1/data"] = np.array([1, 2, 3])
>>> tstore["/child0/child1/grandchild"] = np.array([4, 5, 6])
>>> subtree = tstore.get_subtree("/child0/child1")
>>> list(subtree.keys())
['/data', '/grandchild']
>>> subtree["/grandchild"][:]
array([4, 5, 6])

Notes

This is equivalent to tstore[path] when path is a structural path.
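The key translation behind a subtree view amounts to stripping the root prefix on the way out and prepending it on the way in. The sketch below models only that mapping on plain strings; `subtree_keys` and `to_parent_key` are hypothetical helpers, not TreeStore methods.

```python
def subtree_keys(leaf_keys, root):
    """Keys as seen from a view rooted at `root` (prefix stripped)."""
    prefix = root.rstrip("/") + "/"
    return sorted("/" + k[len(prefix):] for k in leaf_keys if k.startswith(prefix))


def to_parent_key(root, view_key):
    """Translate a key from the view back into the parent store's namespace."""
    return root.rstrip("/") + view_key


leaves = ["/child0/leaf1", "/child0/child1/leaf2", "/child0/child2"]
print(subtree_keys(leaves, "/child0"))            # ['/child1/leaf2', '/child2', '/leaf1']
print(to_parent_key("/child0", "/child1/leaf2"))  # /child0/child1/leaf2
```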

Properties

property vlmeta: MutableMapping | None

Access variable-length metadata for the TreeStore.

Returns a proxy to the vlmeta attribute of an internal SChunk stored at ‘/__vlmeta__’. The SChunk is created on-demand if it doesn’t exist.

Notes

The metadata is stored as vlmeta of an internal SChunk, ensuring robust serialization and persistence. This mirrors SChunk.vlmeta behavior, with additional guarantees:

  • Bulk get via [:] always returns a dict with string keys and decoded values.

  • Read-only protection is enforced at the TreeStore level.

Public Members

close() → None

Persist changes and cleanup.

get(key: str, default: Any = None) → NDArray | SChunk | C2Array | Any

Retrieve a node, or default if not found.

to_b2z(overwrite=False, filename=None) → PathLike[Any] | str

Serialize zip store contents to the b2z file.

Parameters:
  • overwrite (bool, optional) – If True, overwrite the existing b2z file if it exists. Default is False.

  • filename (str, optional) – If provided, use this filename instead of the default b2z file path.

Returns:

filename – The absolute path to the created b2z file.

Return type:

str

property estore: EmbedStore

Access the underlying EmbedStore.

Notes

  • Keys must start with /. The root is /. Empty path segments (//) are not allowed.

  • Leaf nodes hold the actual data (NumPy arrays, NDArray, C2Array, or SChunk). Structural nodes exist implicitly to organize leaves and are never assigned data directly.

  • For storage/embedding thresholds and external arrays behavior, see also DictStore which TreeStore extends.
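The key rules in the first note can be condensed into a small validator. This is an illustrative sketch, not the check TreeStore performs: `is_valid_key` is a hypothetical name, and treating a trailing slash as an empty segment is an assumption of the sketch.

```python
def is_valid_key(key: str) -> bool:
    """Check the documented rules: leading slash, no empty path segments."""
    if not key.startswith("/"):
        return False
    if key == "/":
        return True  # the root itself
    # Every segment after the leading slash must be non-empty.
    return all(key[1:].split("/"))


for candidate in ("/child0/leaf1", "/", "child0/leaf1", "/child0//leaf1"):
    print(f"{candidate!r}: {is_valid_key(candidate)}")
```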