TreeStore

A hierarchical, tree‑like container to organize compressed arrays with Blosc2.

Overview

TreeStore builds on top of DictStore by enforcing a strict hierarchical key structure and by providing helpers to navigate the hierarchy. Keys are POSIX-like paths rooted at a leading slash (e.g. "/child0/child/leaf"); if a key is given without one, the slash is added automatically. Data is stored only at leaf nodes. Intermediate path segments are structural nodes, which are created implicitly as you assign arrays to leaves.
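
To make the leaf/structural distinction concrete, here is a minimal pure-Python sketch of how a set of leaf keys implies structural nodes. The helpers normalize_key and structural_paths are hypothetical and are not part of blosc2; treating a trailing slash as an empty segment is an assumption of this sketch.

```python
def normalize_key(key: str) -> str:
    """Hypothetical helper: add a missing leading slash, reject empty segments."""
    if not key.startswith("/"):
        key = "/" + key
    # Assumption: "//" and a trailing "/" both count as empty path segments.
    if key != "/" and any(part == "" for part in key[1:].split("/")):
        raise ValueError(f"invalid hierarchical key: {key!r}")
    return key


def structural_paths(leaf_keys):
    """Hypothetical helper: every intermediate path implied by the leaf keys."""
    structural = set()
    for key in map(normalize_key, leaf_keys):
        parts = key.strip("/").split("/")
        # Each proper prefix of a leaf path is a structural node.
        for i in range(1, len(parts)):
            structural.add("/" + "/".join(parts[:i]))
    return structural


leaves = ["/child0/leaf1", "/child0/child1/leaf2", "child0/child2"]
print(sorted(structural_paths(leaves)))  # ['/child0', '/child0/child1']
```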

Like DictStore, TreeStore supports two on‑disk representations:

  • .b2d: a directory layout (B2DIR) where external arrays are regular .b2nd files and a small embedded store (embed.b2e) holds small/in‑memory arrays.

  • .b2z: a single zip file (B2ZIP) that mirrors the above directory structure. You can create it directly or convert from a .b2d layout.
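
The idea of a zip file that "mirrors" a directory layout can be sketched with the standard library alone. This is only an illustration under stated assumptions: the files below are placeholders rather than real Blosc2 containers, storing external arrays at paths that follow their keys is assumed, and using ZIP_STORED (no extra zip compression, since the members are already compressed) is an assumption rather than a description of what blosc2 does.

```python
import os
import tempfile
import zipfile

with tempfile.TemporaryDirectory() as tmp:
    # Placeholder .b2d-style layout (assumed file names, not real containers).
    root = os.path.join(tmp, "my_tree.b2d")
    os.makedirs(os.path.join(root, "child0", "child1"))
    for rel in ["embed.b2e", "child0/child1/leaf2.b2nd"]:
        with open(os.path.join(root, rel), "wb") as f:
            f.write(b"placeholder")

    # Pack the tree into one archive, keeping relative paths as member names.
    zip_path = os.path.join(tmp, "my_tree.b2z")
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_STORED) as zf:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                full = os.path.join(dirpath, name)
                arcname = os.path.relpath(full, root).replace(os.sep, "/")
                zf.write(full, arcname)

    with zipfile.ZipFile(zip_path) as zf:
        members = sorted(zf.namelist())

print(members)  # ['child0/child1/leaf2.b2nd', 'embed.b2e']
```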

Small arrays (below a size threshold) and in‑memory objects go to the embedded store, while larger arrays or explicitly external arrays are stored as separate .b2nd files. You can traverse your dataset hierarchically with walk(), query children/descendants, or focus on a subtree view with get_subtree().

TreeStore also supports read-only memory-mapped access via mmap_mode="r", passed either to the constructor or to blosc2.open(), for both .b2d and .b2z formats.

Quick example

import numpy as np
import blosc2

# Create a hierarchical store backed by a zip file
with blosc2.TreeStore("my_tree.b2z", mode="w") as tstore:
    # Data is stored at leaves; structural nodes are created implicitly
    tstore["/child0/leaf1"] = np.array([1, 2, 3])
    tstore["/child0/child1/leaf2"] = np.array([4, 5, 6])
    tstore["/child0/child2"] = np.array([7, 8, 9])

    # Inspect hierarchy
    for path, children, nodes in tstore.walk("/child0"):
        print(path, sorted(children), sorted(nodes))

    # Work with a subtree view rooted at /child0
    subtree = tstore.get_subtree("/child0")
    print(sorted(subtree.keys()))  # ['/child1', '/child1/leaf2', '/child2', '/leaf1']
    print(subtree["/child1/leaf2"][:])  # [4 5 6]

# Reopen using blosc2.open
with blosc2.open("my_tree.b2z", mode="r") as tstore:
    print(sorted(tstore.keys()))

# Reopen in read-only mmap mode
with blosc2.open("my_tree.b2z", mode="r", mmap_mode="r") as tstore_mmap:
    print(tstore_mmap["/child0/leaf1"][0:2])

Note

For store containers, only mmap_mode="r" is currently supported, and it requires mode="r".

class blosc2.TreeStore(*args, _from_parent_store=None, **kwargs)

A hierarchical tree-based storage container for Blosc2 data.

Extends blosc2.DictStore with strict hierarchical key validation and tree traversal capabilities. Keys must follow a hierarchical structure using ‘/’ as separator and always start with ‘/’. If the user passes a key that doesn’t start with ‘/’, the leading ‘/’ is added automatically.

It supports the same arguments as blosc2.DictStore.

Parameters:
  • localpath (str) – Local path for the directory-backed store or compact zip-backed file. A .b2z suffix selects the zip-backed format. Existing directories, and new paths not ending in .b2z, use Blosc2 directory format (B2DIR); a .b2d suffix is recommended for these directory-backed stores. Existing files are treated as Blosc2 zip format (B2ZIP).

  • mode (str, optional) – File mode (‘r’, ‘w’, ‘a’). Default is ‘a’.

  • tmpdir (str or None, optional) – Temporary directory to use when working with .b2z files. If None, a temporary directory is created in the same directory as the .b2z file, so that unpacked data stays on the same filesystem. Default is None.

  • cparams (dict or None, optional) – Compression parameters for the internal embed store. If None, the default Blosc2 parameters are used.

  • dparams (dict or None, optional) – Decompression parameters for the internal embed store. If None, the default Blosc2 parameters are used.

  • storage (blosc2.Storage or None, optional) – Storage properties for the internal embed store. If None, the default Blosc2 storage properties are used.

  • threshold (int, optional) – Threshold for the array size (bytes) to be kept in the embed store. If the compressed array size is below this threshold, it will be stored in the embed store instead of as a separate file. If None, in-memory arrays are stored in the embed store and on-disk arrays are stored as separate files. C2Array objects will always be stored in the embed store, regardless of their size.
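
The threshold rule above can be summarized as a small decision function. This is a conceptual sketch of the documented behavior only; goes_to_embed_store and its arguments are hypothetical names, not blosc2 API.

```python
def goes_to_embed_store(cbytes: int, threshold, *, on_disk: bool,
                        is_c2array: bool = False) -> bool:
    """Hypothetical model: True -> embed.b2e, False -> separate .b2nd file."""
    if is_c2array:
        return True              # C2Array objects are always embedded
    if threshold is None:
        return not on_disk       # in-memory -> embed store, on-disk -> external file
    return cbytes < threshold    # compressed size below threshold -> embed store
```

For example, with threshold=1024 a 100-byte compressed array would be embedded, while a 4096-byte one would get its own .b2nd file.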

Examples

Store plain arrays in a hierarchy:

>>> tstore = TreeStore(localpath="my_tstore.b2z", mode="w")
>>> # Data lives in leaf nodes; structural nodes are created automatically.
>>> tstore["/child0/leaf1"] = np.array([1, 2, 3])
>>> tstore["/child0/child1/leaf2"] = np.array([4, 5, 6])
>>> tstore["/child0/child2"] = np.array([7, 8, 9])
>>>
>>> # Walk the tree structure
>>> for path, children, nodes in tstore.walk("/child0"):
...     print(f"Path: {path}, Children: {sorted(children)}, Nodes: {sorted(nodes)}")
Path: /child0, Children: ['child1'], Nodes: ['child2', 'leaf1']
Path: /child0/child1, Children: [], Nodes: ['leaf2']
>>>
>>> # Get a subtree view
>>> subtree = tstore.get_subtree("/child0")
>>> sorted(list(subtree.keys()))
['/child1', '/child1/leaf2', '/child2', '/leaf1']

Mix NDArrays and CTables in the same bundle:

>>> import dataclasses
>>> @dataclasses.dataclass
... class Row:
...     x: int = 0
...     y: float = 0.0
>>> table = blosc2.CTable(Row)
>>> _ = table.append(Row(x=1, y=1.5))
>>> _ = table.append(Row(x=2, y=3.0))
>>> with blosc2.TreeStore("bundle.b2z", mode="w") as ts:
...     ts["/data/array"] = blosc2.arange(5)
...     ts["/data/table"] = table
>>> with blosc2.open("bundle.b2z", mode="r") as ts:
...     print(sorted(ts.keys()))
...     arr = ts["/data/array"]
...     tbl = ts["/data/table"]
...     print(type(tbl).__name__, len(tbl))
['/data', '/data/array', '/data/table']
CTable 2

Attributes:
estore

Access the underlying EmbedStore.

vlmeta

Access variable-length metadata for the TreeStore or current subtree.

Methods

close()

Flush inline object handles then delegate to DictStore.close().

discard()

Discard without repacking; also discard inline handle storage.

get(key[, default])

Retrieve a node, or default if not found.

get_children(path)

Get direct children of a given path.

get_descendants(path)

Get all descendants of a given path.

get_subtree(path)

Create a subtree view with the specified path as root.

items()

Return key-value pairs in the current subtree view.

keys()

Return all keys in the current subtree view.

to_b2d([dirname, overwrite])

Serialize store contents to a directory-backed store.

to_b2z([overwrite, filename])

Serialize store contents to a compact .b2z file.

values()

Return values in the current subtree view, with object roots collapsed.

walk([path, topdown])

Walk the tree structure.

Special Methods:

__init__(*args[, _from_parent_store])

Initialize TreeStore with subtree support.

__getitem__(key)

Retrieve a node, object, or subtree view.

__setitem__(key, value)

Add a node with hierarchical key validation.

__delitem__(key)

Remove a node, object root, or subtree.

__contains__(key)

Check if a key exists (includes object roots, excludes object internals).

__len__()

Return number of nodes.

__iter__()

Iterate over keys, excluding vlmeta keys.

Constructors

__init__(*args, _from_parent_store=None, **kwargs)

Initialize TreeStore with subtree support.

It supports the same arguments as blosc2.DictStore.

Dictionary Interface

__getitem__(key: str) → NDArray | C2Array | SChunk | blosc2.ObjectArray | blosc2.BatchArray | blosc2.CTable | TreeStore

Retrieve a node, object, or subtree view.

If the key is a registered object root (e.g. a CTable), that object is returned. If the key is a structural (intermediate) path, a subtree view is returned. If the key is a leaf, the stored array or SChunk is returned.
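
The lookup order can be modeled on plain sets of keys. In this hypothetical sketch (classify_key is not a blosc2 function), object internals are assumed to live under their root, as with the "/table/_meta" key used in the __contains__ example.

```python
def classify_key(key, leaf_keys, object_roots):
    """Hypothetical model: return 'object', 'leaf' or 'subtree', else raise KeyError."""
    if key in object_roots:
        return "object"          # registered root such as a CTable
    if key in leaf_keys:
        return "leaf"            # stored array or SChunk
    prefix = key.rstrip("/") + "/"
    if any(k.startswith(prefix) for k in leaf_keys):
        return "subtree"         # structural path with descendants
    raise KeyError(key)


leaf_keys = {"/arr", "/group/val", "/table/_meta"}
object_roots = {"/table"}
print(classify_key("/group", leaf_keys, object_roots))  # subtree
```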

Examples

>>> import dataclasses
>>> @dataclasses.dataclass
... class Row:
...     x: int = 0
>>> t = blosc2.CTable(Row)
>>> _ = t.append(Row(x=42))
>>> with blosc2.TreeStore("store.b2z", mode="w") as ts:
...     ts["/arr"] = blosc2.zeros(3, dtype="i4")
...     ts["/group/val"] = blosc2.ones(2, dtype="f4")
...     ts["/table"] = t
>>> with blosc2.open("store.b2z", mode="r") as ts:
...     arr = ts["/arr"]            # NDArray leaf
...     sub = ts["/group"]           # TreeStore subtree view
...     tbl = ts["/table"]           # CTable object
...     print(type(arr).__name__, type(sub).__name__, type(tbl).__name__)
NDArray TreeStore CTable

__setitem__(key: str, value: blosc2.Array | SChunk | blosc2.ObjectArray | blosc2.BatchArray | blosc2.CTable) → None

Add a node with hierarchical key validation.

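Parameters:
  • key (str) – Hierarchical path of the node to store (e.g. "/group/leaf").

  • value (blosc2.Array, SChunk, blosc2.ObjectArray, blosc2.BatchArray or blosc2.CTable) – The data or object to store at key.

Raises: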

ValueError – If key doesn’t follow hierarchical structure rules, if trying to assign to a structural path that already has children, or if trying to add a child to a path that already contains data.
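
The two structural rules (no data on a node that already has children, no children under a node that already holds data) can be sketched as follows. check_new_leaf and is_allowed are hypothetical helpers working on a plain set of leaf keys, not blosc2 code.

```python
def check_new_leaf(key: str, existing_leaves: set) -> None:
    """Hypothetical model: raise ValueError if storing data at key breaks the tree."""
    prefix = key + "/"
    if any(k.startswith(prefix) for k in existing_leaves):
        # key is already a structural node with children
        raise ValueError(f"{key!r} already has children")
    parts = key.strip("/").split("/")
    for i in range(1, len(parts)):
        ancestor = "/" + "/".join(parts[:i])
        if ancestor in existing_leaves:
            # an ancestor holds data, so it cannot gain children
            raise ValueError(f"{ancestor!r} already contains data")


def is_allowed(key: str, existing_leaves: set) -> bool:
    try:
        check_new_leaf(key, existing_leaves)
    except ValueError:
        return False
    return True


leaves = {"/child0/leaf1", "/child0/child1/leaf2"}
print(is_allowed("/child0/child1", leaves))  # False
```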

Examples

Store an NDArray and a CTable together:

>>> import dataclasses
>>> @dataclasses.dataclass
... class Row:
...     x: int = 0
>>> t = blosc2.CTable(Row)
>>> _ = t.append(Row(x=10))
>>> with blosc2.TreeStore("store.b2z", mode="w") as ts:
...     ts["/arr"] = blosc2.zeros(5, dtype="i4")
...     ts["/table"] = t   # CTable stored inline

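Replacing an existing object root requires an explicit delete first:

with blosc2.TreeStore("store.b2z", mode="a") as ts:
    del ts["/table"]   # remove the existing object root first
    ts["/table"] = t   # then store the replacement CTable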

__delitem__(key: str) → None

Remove a node, object root, or subtree.

If key is a registered object root, all its physical leaves and the registry entry are removed. If key has children, all descendants are removed recursively. Object internals cannot be deleted directly.
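
Recursive removal boils down to deleting the key plus every key under "key/". The sketch below works on a plain dict and is hypothetical (it does not model the object registry). Matching on the trailing slash keeps a sibling such as "/tablex" intact.

```python
def delete_node(leaves: dict, key: str) -> list:
    """Hypothetical model: remove key and all its descendants, return removed keys."""
    prefix = key.rstrip("/") + "/"
    doomed = [k for k in leaves if k == key or k.startswith(prefix)]
    if not doomed:
        raise KeyError(key)
    for k in doomed:
        del leaves[k]
    return sorted(doomed)


store = {"/arr": 1, "/table/_meta": 2, "/table/x": 3, "/tablex": 4}
removed = delete_node(store, "/table")
print(removed)  # ['/table/_meta', '/table/x']
```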

Examples

>>> import dataclasses
>>> @dataclasses.dataclass
... class Row:
...     x: int = 0
>>> t = blosc2.CTable(Row)
>>> _ = t.append(Row(x=1))
>>> with blosc2.TreeStore("store.b2z", mode="w") as ts:
...     ts["/arr"] = blosc2.zeros(3, dtype="i4")
...     ts["/table"] = t
...     del ts["/table"]          # removes all CTable leaves + registry entry
...     print("/table" in ts)
False

__contains__(key: str) → bool

Check if a key exists (includes object roots, excludes object internals).

Examples

>>> import dataclasses
>>> @dataclasses.dataclass
... class Row:
...     x: int = 0
>>> t = blosc2.CTable(Row)
>>> _ = t.append(Row(x=7))
>>> with blosc2.TreeStore("store.b2z", mode="w") as ts:
...     ts["/arr"] = blosc2.zeros(2, dtype="i4")
...     ts["/table"] = t
...     print("/table" in ts)       # object root: True
...     print("/table/_meta" in ts) # internal key: False
...     print("/arr" in ts)         # normal leaf: True
True
False
True

__len__() → int

Return number of nodes.

__iter__() → Iterator[str]

Iterate over keys, excluding vlmeta keys.
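
Since vlmeta is backed by internal entries named "__vlmeta__" (see the vlmeta property), filtering them out during iteration can be sketched like this. iter_user_keys is a hypothetical helper.

```python
VLMETA_NAME = "__vlmeta__"


def iter_user_keys(raw_keys):
    """Hypothetical model: yield keys whose last segment is not the vlmeta entry."""
    for key in raw_keys:
        if key.rsplit("/", 1)[-1] != VLMETA_NAME:
            yield key


raw = ["/__vlmeta__", "/a", "/g/__vlmeta__", "/g/b"]
print(list(iter_user_keys(raw)))  # ['/a', '/g/b']
```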

keys()

Return all keys in the current subtree view.

Object root keys (e.g. CTable) are included as single entries. Object-internal keys are hidden from normal traversal.
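
A hypothetical model of this listing, assuming CTable internals are stored under the table's root path (the internal key names below are made up for illustration). It reproduces the output of the example that follows, and membership in the result behaves like __contains__.

```python
def visible_keys(leaf_keys, object_roots):
    """Hypothetical model of keys(): collapse object internals, list structural paths."""
    keys = set()
    for key in leaf_keys:
        # Collapse object internals (e.g. "/table/_meta") onto their root ("/table").
        root = next((r for r in object_roots if key.startswith(r + "/")), None)
        if root is not None:
            key = root
        keys.add(key)
        # Add every intermediate (structural) path above the key.
        parts = key.strip("/").split("/")
        for i in range(1, len(parts)):
            keys.add("/" + "/".join(parts[:i]))
    return sorted(keys)


leaves = {"/arr", "/group/val", "/table/_meta", "/table/cols/x"}
listing = visible_keys(leaves, {"/table"})
print(listing)  # ['/arr', '/group', '/group/val', '/table']
print("/table" in listing, "/table/_meta" in listing)  # True False
```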

Examples

>>> import dataclasses
>>> @dataclasses.dataclass
... class Row:
...     x: int = 0
>>> t = blosc2.CTable(Row)
>>> _ = t.append(Row(x=1))
>>> with blosc2.TreeStore("store.b2z", mode="w") as ts:
...     ts["/arr"] = blosc2.zeros(3, dtype="i4")
...     ts["/group/val"] = blosc2.ones(2, dtype="f4")
...     ts["/table"] = t
...     print(sorted(ts.keys()))
['/arr', '/group', '/group/val', '/table']

values() → Iterator[NDArray | C2Array | SChunk | blosc2.ObjectArray | blosc2.BatchArray | blosc2.CTable | TreeStore]

Return values in the current subtree view, with object roots collapsed.

items() → Iterator[tuple[str, NDArray | C2Array | SChunk | TreeStore]]

Return key-value pairs in the current subtree view.

Tree Navigation

get_children(path: str) → list[str]

Get direct children of a given path.

Parameters:

path (str) – The parent path to get children for.

Returns:

children – List of direct child paths.

Return type:

list[str]
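
As a sketch of what "direct children" means, the hypothetical helper below works on the full set of keys (structural and leaf). It assumes children are returned as full paths, including structural ones; check the actual return value against your blosc2 version.

```python
def children_of(path, all_keys):
    """Hypothetical model: keys exactly one level below path."""
    prefix = "/" if path == "/" else path.rstrip("/") + "/"
    return sorted(
        k for k in all_keys
        if k.startswith(prefix) and k != prefix and "/" not in k[len(prefix):]
    )


all_keys = {"/child0", "/child0/child1", "/child0/child1/leaf2",
            "/child0/child2", "/child0/leaf1"}
print(children_of("/child0", all_keys))
# ['/child0/child1', '/child0/child2', '/child0/leaf1']
```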

get_descendants(path: str) → list[str]

Get all descendants of a given path.

Parameters:

path (str) – The parent path to get descendants for.

Returns:

descendants – List of all descendant paths.

Return type:

list[str]
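
Descendants differ from children only in depth: everything under the path, at any level. The same assumptions as in the children sketch apply (hypothetical helper, full paths).

```python
def descendants_of(path, all_keys):
    """Hypothetical model: every key strictly below path."""
    prefix = "/" if path == "/" else path.rstrip("/") + "/"
    return sorted(k for k in all_keys if k.startswith(prefix) and k != path)


all_keys = {"/child0", "/child0/child1", "/child0/child1/leaf2",
            "/child0/child2", "/child0/leaf1"}
print(descendants_of("/child0/child1", all_keys))  # ['/child0/child1/leaf2']
```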

walk(path: str = '/', topdown: bool = True) → Iterator[tuple[str, list[str], list[str]]]

Walk the tree structure.

Similar to os.walk(), this visits all structural nodes in the hierarchy, yielding one tuple per level. The yielded path is a full path, while children and nodes hold relative names, not full paths.

Parameters:
  • path (str, optional) – The root path to start walking from. Default is “/”.

  • topdown (bool, optional) – If True (default), traverse top-down (yield parent before children). If False, traverse bottom-up (yield children before parent), mimicking os.walk(topdown=False).

Yields:
  • path (str) – Current path being walked.

  • children (list[str]) – List of child directory names (structural nodes that have descendants). These are just the names, not full paths.

  • nodes (list[str]) – List of leaf node names (nodes that contain data). These are just the names, not full paths.
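
An os.walk-style traversal over a flat set of full keys can be sketched as below. walk_model is a hypothetical helper, not blosc2 code. Unlike os.walk, it does not support pruning by editing children in place.

```python
def walk_model(all_keys, path="/", topdown=True):
    """Hypothetical model: yield (path, child_names, leaf_names) per structural level."""
    prefix = "/" if path == "/" else path.rstrip("/") + "/"
    direct = [k[len(prefix):] for k in all_keys
              if k.startswith(prefix) and k != prefix and "/" not in k[len(prefix):]]
    # A direct entry with descendants is structural; the rest are leaf nodes.
    children = sorted(n for n in direct
                      if any(k.startswith(prefix + n + "/") for k in all_keys))
    nodes = sorted(n for n in direct if n not in children)
    if topdown:
        yield path, children, nodes
    for name in children:
        yield from walk_model(all_keys, prefix + name, topdown)
    if not topdown:
        yield path, children, nodes


tree_keys = {"/child0", "/child0/child1", "/child0/child1/leaf2",
             "/child0/child2", "/child0/leaf1"}
for p, ch, nd in walk_model(tree_keys, "/child0"):
    print(p, ch, nd)
# /child0 ['child1'] ['child2', 'leaf1']
# /child0/child1 [] ['leaf2']
```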

Examples

>>> for path, children, nodes in tstore.walk("/child0", topdown=True):
...     print(f"Path: {path}, Children: {children}, Nodes: {nodes}")

get_subtree(path: str) → TreeStore

Create a subtree view with the specified path as root.

Parameters:

path (str) – The path that will become the root of the subtree view (relative to current subtree, will be normalized to start with ‘/’ if missing).

Returns:

subtree – A new TreeStore instance that presents the subtree as if path were the root.

Return type:

TreeStore

Examples

>>> tstore["/child0/child1/data"] = np.array([1, 2, 3])
>>> tstore["/child0/child1/grandchild"] = np.array([4, 5, 6])
>>> subtree = tstore.get_subtree("/child0/child1")
>>> list(subtree.keys())
['/data', '/grandchild']
>>> subtree["/grandchild"][:]
array([4, 5, 6])

Notes

This is equivalent to tstore[path] when path is a structural path.
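
Re-rooting is essentially prefix stripping plus path normalization. This hypothetical sketch reproduces the keys of the example above. It models only the key mapping, not the live view that blosc2 returns.

```python
def reroot(keys, root):
    """Hypothetical model: keep keys under root and present them relative to it."""
    root = "/" + root.strip("/")          # normalize, e.g. "child0/child1" -> "/child0/child1"
    if root == "/":
        return sorted(keys)
    prefix = root + "/"
    return sorted("/" + k[len(prefix):] for k in keys if k.startswith(prefix))


keys = ["/child0/child1/data", "/child0/child1/grandchild", "/other"]
print(reroot(keys, "/child0/child1"))  # ['/data', '/grandchild']
```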

Properties

property vlmeta: MutableMapping

Access variable-length metadata for the TreeStore or current subtree.

Returns a proxy to the vlmeta attribute of an internal SChunk stored at ‘/__vlmeta__’ for the root tree, or ‘<subtree_path>/__vlmeta__’ for subtrees. The SChunk is created on-demand if it doesn’t exist.

Notes

The metadata is stored as vlmeta of an internal SChunk, ensuring robust serialization and persistence. This mirrors SChunk.vlmeta behavior, with additional guarantees:

  • Bulk get via [:] always returns a dict with string keys and decoded values.

  • Read-only protection is enforced at the TreeStore level.

  • Each subtree has its own independent vlmeta storage.
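
The location rule described above can be written as a one-line mapping. vlmeta_key is a hypothetical helper that mirrors the documented paths. It is not an accessor provided by blosc2.

```python
def vlmeta_key(subtree_path: str = "/") -> str:
    """Hypothetical helper: internal key of the SChunk backing a (sub)tree's vlmeta."""
    path = "/" + subtree_path.strip("/")
    return "/__vlmeta__" if path == "/" else path + "/__vlmeta__"


print(vlmeta_key())           # /__vlmeta__
print(vlmeta_key("/child0"))  # /child0/__vlmeta__
```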

Public Members

close() → None

Flush inline object handles then delegate to DictStore.close().

discard() → None

Discard without repacking; also discard inline handle storage.

get(key: str, default: Any = None) → NDArray | SChunk | ObjectArray | BatchArray | C2Array | Any

Retrieve a node, or default if not found.


to_b2d(dirname=None, *, overwrite: bool = False) → PathLike[Any] | str

Serialize store contents to a directory-backed store.

Parameters:
  • dirname (str, optional) – If provided, use this directory instead of the default directory path. A .b2d suffix is recommended for clarity, but not required.

  • overwrite (bool, optional) – If True, overwrite the existing b2d directory if it exists. Default is False.

Returns:

dirname – The absolute path to the created directory-backed store.

Return type:

str

Examples

Unpack a zip-backed store into a directory-backed store:

with blosc2.DictStore("data.b2z", mode="r") as dstore:
    dstore.to_b2d("data.b2d", overwrite=True)

with blosc2.DictStore("data.b2d", mode="r") as dstore:
    values = dstore["/values"][:]

Copy an existing directory-backed store to another directory. A .b2d suffix is recommended for directory-backed stores:

with blosc2.DictStore("data.b2d", mode="r") as dstore:
    dstore.to_b2d("backup.b2d", overwrite=True)

to_b2z(overwrite=False, filename=None) → PathLike[Any] | str

Serialize store contents to a compact .b2z file.

Parameters:
  • overwrite (bool, optional) – If True, overwrite the existing b2z file if it exists. Default is False.

  • filename (str, optional) – If provided, use this filename instead of the default b2z file path. Keyword use is recommended for clarity.

Returns:

filename – The absolute path to the created b2z file.

Return type:

str

Examples

Pack a directory-backed store into a zip store. A .b2d suffix is recommended for directory-backed stores, but not required:

with blosc2.DictStore("data.b2d", mode="w") as dstore:
    dstore["/values"] = np.arange(10)

with blosc2.DictStore("data.b2d", mode="r") as dstore:
    dstore.to_b2z(filename="data.b2z", overwrite=True)

filename can also be passed positionally:

with blosc2.DictStore("data.b2d", mode="r") as dstore:
    dstore.to_b2z("copy.b2z", overwrite=True)

property estore: EmbedStore

Access the underlying EmbedStore.

Notes

  • Keys are rooted at /, and a missing leading slash is added automatically. The root is /. Empty path segments (//) are not allowed.

  • Leaf nodes hold the actual data (e.g. NumPy arrays, NDArray, C2Array, SChunk) or object roots such as CTable. Structural nodes exist implicitly to organize leaves and are not directly assigned any data.

  • For storage/embedding thresholds and external-array behavior, see DictStore, which TreeStore extends.