TreeStore
A hierarchical, tree‑like container to organize compressed arrays with Blosc2.
Overview
TreeStore builds on top of DictStore by enforcing a strict hierarchical key
structure and by providing helpers to navigate the hierarchy. Keys are POSIX-like
paths that start with a slash (e.g. "/child0/child/leaf"); a missing leading slash
is added automatically. Data is stored only at leaf nodes. Intermediate path segments
are structural nodes, which are created implicitly as you assign arrays to leaves.
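The key rules above can be modelled in a few lines of plain Python. This is a sketch, not the library's code: `normalize_key` and `structural_nodes` are hypothetical helpers, and they only encode the rules stated in this page (leading slash added if missing, root is "/", empty "//" segments rejected). Trailing slashes and other edge cases are not modelled.

```python
def normalize_key(key: str) -> str:
    """Hypothetical helper modelling the documented TreeStore key rules."""
    # A key without a leading slash gets one added automatically.
    if not key.startswith("/"):
        key = "/" + key
    # The root is "/" on its own.
    if key == "/":
        return key
    # Empty path segments ("//") are rejected.
    if "//" in key:
        raise ValueError(f"empty path segment in {key!r}")
    return key


def structural_nodes(key: str) -> list[str]:
    """Hypothetical helper: the implicit structural nodes above a leaf key."""
    parts = normalize_key(key).strip("/").split("/")
    # Every proper prefix of the leaf path is a structural node.
    return ["/" + "/".join(parts[:i]) for i in range(1, len(parts))]


print(normalize_key("child0/leaf"))            # /child0/leaf
print(structural_nodes("/child0/child/leaf"))  # ['/child0', '/child0/child']
```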
Like DictStore, TreeStore supports two on-disk representations:
- .b2d: a directory layout (B2DIR) where external arrays are regular .b2nd files and a small embedded store (embed.b2e) holds small/in-memory arrays.
- .b2z: a single zip file (B2ZIP) that mirrors the above directory structure. You can create it directly or convert it from a .b2d layout.
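Which of the two layouts a path maps to is spelled out under the localpath parameter of the class. The sketch below mirrors that rule with a hypothetical helper, `storage_format`; the check order (existing directory, then existing file, then suffix) is an assumption about precedence, and the real detection happens inside TreeStore/DictStore.

```python
import os


def storage_format(localpath: str) -> str:
    """Hypothetical helper mirroring the documented localpath rules."""
    if os.path.isdir(localpath):
        return "B2DIR"  # existing directories are directory-backed
    if os.path.isfile(localpath):
        return "B2ZIP"  # existing files are treated as zip-backed
    # New paths: a .b2z suffix selects the zip format, anything else a directory.
    return "B2ZIP" if localpath.endswith(".b2z") else "B2DIR"


print(storage_format("new_tree.b2z"))  # B2ZIP, assuming the path does not exist yet
print(storage_format("new_tree.b2d"))  # B2DIR, assuming the path does not exist yet
```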
Small arrays (below a size threshold) and in-memory objects go to the embedded
store, while larger arrays and explicitly external arrays are stored as separate
.b2nd files. You can traverse a dataset hierarchically with walk(), list children
and descendants with get_children() and get_descendants(), or focus on a subtree
view with get_subtree().
TreeStore also supports opening stores read-only with memory mapping via
mmap_mode="r" (in the constructor or in blosc2.open()) for both .b2d and .b2z formats.
Quick example
import numpy as np
import blosc2

# Create a hierarchical store backed by a zip file
with blosc2.TreeStore("my_tree.b2z", mode="w") as tstore:
    # Data is stored at leaves; structural nodes are created implicitly
    tstore["/child0/leaf1"] = np.array([1, 2, 3])
    tstore["/child0/child1/leaf2"] = np.array([4, 5, 6])
    tstore["/child0/child2"] = np.array([7, 8, 9])

    # Inspect hierarchy
    for path, children, nodes in tstore.walk("/child0"):
        print(path, sorted(children), sorted(nodes))

    # Work with a subtree view rooted at /child0
    subtree = tstore.get_subtree("/child0")
    print(sorted(subtree.keys()))  # ['/child1/leaf2', '/child2', '/leaf1']
    print(subtree["/child1/leaf2"][:])  # [4 5 6]

# Reopen using blosc2.open
with blosc2.open("my_tree.b2z", mode="r") as tstore:
    print(sorted(tstore.keys()))

# Reopen in read-only mmap mode
with blosc2.open("my_tree.b2z", mode="r", mmap_mode="r") as tstore_mmap:
    print(tstore_mmap["/child0/leaf1"][0:2])
Note
For store containers, only mmap_mode="r" is currently supported, and it
requires mode="r".
- class blosc2.TreeStore(*args, _from_parent_store=None, **kwargs)
A hierarchical tree-based storage container for Blosc2 data.
Extends blosc2.DictStore with strict hierarchical key validation and tree traversal capabilities. Keys must follow a hierarchical structure using ‘/’ as separator and always start with ‘/’. If a key without a leading ‘/’ is passed, the slash is added automatically. It supports the same arguments as blosc2.DictStore.
- Parameters:
localpath (str) – Local path for the directory-backed store or compact zip-backed file. A .b2z suffix selects the zip-backed format. Existing directories, and new paths not ending in .b2z, use the Blosc2 directory format (B2DIR); a .b2d suffix is recommended for these directory-backed stores. Existing files are treated as Blosc2 zip format (B2ZIP).
mode (str, optional) – File mode (‘r’, ‘w’, ‘a’). Default is ‘a’.
tmpdir (str or None, optional) – Temporary directory to use when working with .b2z files. If None, a temporary directory is created in the same directory as the .b2z file, so that unpacked data stays on the same filesystem. Default is None.
cparams (dict or None, optional) – Compression parameters for the internal embed store. If None, the default Blosc2 parameters are used.
dparams (dict or None, optional) – Decompression parameters for the internal embed store. If None, the default Blosc2 parameters are used.
storage (blosc2.Storage or None, optional) – Storage properties for the internal embed store. If None, the default Blosc2 storage properties are used.
threshold (int or None, optional) – Threshold for the array size (bytes) to be kept in the embed store. If the compressed array size is below this threshold, the array is stored in the embed store instead of as a separate file. If None, in-memory arrays are stored in the embed store and on-disk arrays are stored as separate files. C2Array objects are always stored in the embed store, regardless of their size.
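The placement rule described for threshold can be sketched as a small decision function. `placement` is a hypothetical helper, not part of blosc2; in particular, it assumes that once a threshold is set, the compressed size decides for every array (in-memory or not), which this page does not state explicitly.

```python
from typing import Optional


def placement(compressed_nbytes: int, *, in_memory: bool,
              is_c2array: bool = False, threshold: Optional[int] = None) -> str:
    """Hypothetical model of where a value ends up: "embed" or "external"."""
    if is_c2array:
        # C2Array objects always go to the embed store, whatever their size.
        return "embed"
    if threshold is None:
        # No threshold: in-memory arrays are embedded, on-disk arrays stay separate files.
        return "embed" if in_memory else "external"
    # With a threshold, the compressed size decides (assumed to apply to all arrays).
    return "embed" if compressed_nbytes < threshold else "external"


print(placement(4_000, in_memory=True))                   # embed
print(placement(4_000, in_memory=True, threshold=1_000))  # external
```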
Examples
Store plain arrays in a hierarchy:
>>> tstore = TreeStore(localpath="my_tstore.b2z", mode="w")
>>> # Data lives in leaf nodes; structural nodes are created automatically.
>>> tstore["/child0/leaf1"] = np.array([1, 2, 3])
>>> tstore["/child0/child1/leaf2"] = np.array([4, 5, 6])
>>> tstore["/child0/child2"] = np.array([7, 8, 9])
>>>
>>> # Walk the tree structure
>>> for path, children, nodes in tstore.walk("/child0"):
...     print(f"Path: {path}, Children: {sorted(children)}, Nodes: {sorted(nodes)}")
Path: /child0, Children: ['/child0/child1'], Nodes: ['/child0/child2', '/child0/leaf1']
Path: /child0/child1, Children: [], Nodes: ['/child0/child1/leaf2']
>>>
>>> # Get a subtree view
>>> subtree = tstore.get_subtree("/child0")
>>> sorted(list(subtree.keys()))
['/child1/leaf2', '/child2', '/leaf1']
Mix NDArrays and CTables in the same bundle:
>>> import dataclasses
>>> @dataclasses.dataclass
... class Row:
...     x: int = 0
...     y: float = 0.0
>>> table = blosc2.CTable(Row)
>>> _ = table.append(Row(x=1, y=1.5))
>>> _ = table.append(Row(x=2, y=3.0))
>>> with blosc2.TreeStore("bundle.b2z", mode="w") as ts:
...     ts["/data/array"] = blosc2.arange(5)
...     ts["/data/table"] = table
>>> with blosc2.open("bundle.b2z", mode="r") as ts:
...     print(sorted(ts.keys()))
...     arr = ts["/data/array"]
...     tbl = ts["/data/table"]
...     print(type(tbl).__name__, len(tbl))
['/data', '/data/array', '/data/table']
CTable 2
- Attributes:
estore – Access the underlying EmbedStore.
vlmeta – Access variable-length metadata for the TreeStore or current subtree.
Methods
close() – Flush inline object handles, then delegate to DictStore.close().
discard() – Discard without repacking; also discard inline handle storage.
get(key[, default]) – Retrieve a node, or default if not found.
get_children(path) – Get direct children of a given path.
get_descendants(path) – Get all descendants of a given path.
get_subtree(path) – Create a subtree view with the specified path as root.
items() – Return key-value pairs in the current subtree view.
keys() – Return all keys in the current subtree view.
to_b2d([dirname, overwrite]) – Serialize store contents to a directory-backed store.
to_b2z([overwrite, filename]) – Serialize store contents to a compact .b2z file.
values() – Return values in the current subtree view, with object roots collapsed.
walk([path, topdown]) – Walk the tree structure.
- Special Methods:
__init__(*args[, _from_parent_store]) – Initialize TreeStore with subtree support.
__getitem__(key) – Retrieve a node, object, or subtree view.
__setitem__(key, value) – Add a node with hierarchical key validation.
__delitem__(key) – Remove a node, object root, or subtree.
__contains__(key) – Check if a key exists (includes object roots, excludes object internals).
__len__() – Return the number of nodes.
__iter__() – Iterate over keys, excluding vlmeta keys.
Constructors
- __init__(*args, _from_parent_store=None, **kwargs)
Initialize TreeStore with subtree support.
It supports the same arguments as
blosc2.DictStore.
Dictionary Interface
- __getitem__(key: str) → NDArray | C2Array | SChunk | blosc2.ObjectArray | blosc2.BatchArray | blosc2.CTable | TreeStore
Retrieve a node, object, or subtree view.
If the key is a registered object root (e.g. CTable) returns that object. If the key is a structural intermediate path returns a subtree view. If the key is a leaf returns the stored array/schunk.
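The resolution order described above can be modelled over a flat set of leaf paths. `classify_key` is a hypothetical helper, not the library's implementation, and the internal key "/table/_meta" is only an illustration of how an object root's data might sit beneath it. The sketch shows why object roots must be checked first: their internals live under the root, so the root would otherwise look like a structural path.

```python
def classify_key(key: str, leaves: set[str], object_roots: set[str]) -> str:
    """Hypothetical model of how __getitem__ resolves a key, in the documented order."""
    if key in object_roots:
        # Checked first: an object's internals live under its root,
        # so the root would otherwise look like a structural path.
        return "object"
    prefix = key.rstrip("/") + "/"
    if any(leaf.startswith(prefix) for leaf in leaves):
        return "subtree"  # structural path: a TreeStore view is returned
    if key in leaves:
        return "leaf"  # stored array / SChunk
    raise KeyError(key)


# Illustrative layout: "/table/_meta" stands in for a CTable's internal keys.
leaves = {"/arr", "/group/val", "/table/_meta"}
print(classify_key("/table", leaves, {"/table"}))  # object
print(classify_key("/group", leaves, {"/table"}))  # subtree
```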
Examples
>>> import dataclasses
>>> @dataclasses.dataclass
... class Row:
...     x: int = 0
>>> t = blosc2.CTable(Row)
>>> _ = t.append(Row(x=42))
>>> with blosc2.TreeStore("store.b2z", mode="w") as ts:
...     ts["/arr"] = blosc2.zeros(3, dtype="i4")
...     ts["/group/val"] = blosc2.ones(2, dtype="f4")
...     ts["/table"] = t
>>> with blosc2.open("store.b2z", mode="r") as ts:
...     arr = ts["/arr"]  # NDArray leaf
...     sub = ts["/group"]  # TreeStore subtree view
...     tbl = ts["/table"]  # CTable object
...     print(type(arr).__name__, type(sub).__name__, type(tbl).__name__)
NDArray TreeStore CTable
- __setitem__(key: str, value: blosc2.Array | SChunk | blosc2.ObjectArray | blosc2.BatchArray | blosc2.CTable) → None
Add a node with hierarchical key validation.
- Parameters:
key (str) – Hierarchical node key.
value (np.ndarray or blosc2.NDArray or blosc2.C2Array or blosc2.SChunk or blosc2.ObjectArray or blosc2.BatchArray or blosc2.CTable) – The value to store.
- Raises:
ValueError – If key doesn’t follow hierarchical structure rules, if trying to assign to a structural path that already has children, or if trying to add a child to a path that already contains data.
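The two structural conditions listed under Raises can be sketched as a check over the set of existing leaf paths. `check_assignment` is a hypothetical helper that only models those two rules; it does not cover the other validation TreeStore performs (for example the key-format checks or the object-root replacement rule).

```python
def check_assignment(key: str, leaves: set[str]) -> None:
    """Hypothetical model of the structural checks listed under Raises."""
    # A structural path that already has children cannot receive data.
    if any(leaf.startswith(key + "/") for leaf in leaves):
        raise ValueError(f"{key!r} already has children")
    # No ancestor of the new key may already hold data.
    parts = key.strip("/").split("/")
    for i in range(1, len(parts)):
        ancestor = "/" + "/".join(parts[:i])
        if ancestor in leaves:
            raise ValueError(f"{ancestor!r} already contains data")


leaves = {"/child0/leaf1"}
check_assignment("/child0/leaf2", leaves)  # passes: a new sibling leaf
for bad in ("/child0", "/child0/leaf1/x"):
    try:
        check_assignment(bad, leaves)
    except ValueError as exc:
        print("rejected:", exc)
```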
Examples
Store an NDArray and a CTable together:
>>> import dataclasses
>>> @dataclasses.dataclass
... class Row:
...     x: int = 0
>>> t = blosc2.CTable(Row)
>>> _ = t.append(Row(x=10))
>>> with blosc2.TreeStore("store.b2z", mode="w") as ts:
...     ts["/arr"] = blosc2.zeros(5, dtype="i4")
...     ts["/table"] = t  # CTable stored inline
Replacing an existing object root requires an explicit delete first:
del ts["/table"]
ts["/table"] = new_table
- __delitem__(key: str) → None
Remove a node, object root, or subtree.
If key is a registered object root, all its physical leaves and the registry entry are removed. If key has children, all descendants are removed recursively. Object internals cannot be deleted directly.
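The deletion behavior described above can be sketched over a flat dict of leaf paths plus a set of registered object roots. `delete_key` is a hypothetical helper, the internal keys "/table/_meta" and "/table/x" are illustrative only, and raising KeyError for object internals is an assumption (this page does not name the exception type).

```python
def delete_key(store: dict, key: str, object_roots: set[str]) -> None:
    """Hypothetical model of __delitem__ over a flat {leaf_path: value} dict."""
    # Object internals cannot be deleted directly (exception type assumed).
    for root in object_roots:
        if key.startswith(root + "/"):
            raise KeyError(f"{key!r} is internal to object {root!r}")
    doomed = [k for k in store if k == key or k.startswith(key + "/")]
    if not doomed:
        raise KeyError(key)
    # Remove the leaf itself, or every descendant leaf of a subtree/object root.
    for k in doomed:
        del store[k]
    object_roots.discard(key)  # drop the registry entry if key was an object root


store = {"/arr": 1, "/table/_meta": 2, "/table/x": 3, "/g/a": 4, "/g/b/c": 5}
roots = {"/table"}
delete_key(store, "/table", roots)
delete_key(store, "/g", roots)
print(sorted(store), roots)  # ['/arr'] set()
```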
Examples
>>> import dataclasses
>>> @dataclasses.dataclass
... class Row:
...     x: int = 0
>>> t = blosc2.CTable(Row)
>>> _ = t.append(Row(x=1))
>>> with blosc2.TreeStore("store.b2z", mode="w") as ts:
...     ts["/arr"] = blosc2.zeros(3, dtype="i4")
...     ts["/table"] = t
...     del ts["/table"]  # removes all CTable leaves + registry entry
...     print("/table" in ts)
False
- __contains__(key: str) → bool
Check if a key exists (includes object roots, excludes object internals).
Examples
>>> import dataclasses
>>> @dataclasses.dataclass
... class Row:
...     x: int = 0
>>> t = blosc2.CTable(Row)
>>> _ = t.append(Row(x=7))
>>> with blosc2.TreeStore("store.b2z", mode="w") as ts:
...     ts["/arr"] = blosc2.zeros(2, dtype="i4")
...     ts["/table"] = t
...     print("/table" in ts)  # object root: True
...     print("/table/_meta" in ts)  # internal key: False
...     print("/arr" in ts)  # normal leaf: True
True
False
True
- keys()
Return all keys in the current subtree view.
Object root keys (e.g. CTable) are included as single entries. Object-internal keys are hidden from normal traversal.
Examples
>>> import dataclasses
>>> @dataclasses.dataclass
... class Row:
...     x: int = 0
>>> t = blosc2.CTable(Row)
>>> _ = t.append(Row(x=1))
>>> with blosc2.TreeStore("store.b2z", mode="w") as ts:
...     ts["/arr"] = blosc2.zeros(3, dtype="i4")
...     ts["/group/val"] = blosc2.ones(2, dtype="f4")
...     ts["/table"] = t
...     print(sorted(ts.keys()))
['/arr', '/group', '/group/val', '/table']
- values() → Iterator[NDArray | C2Array | SChunk | blosc2.ObjectArray | blosc2.BatchArray | blosc2.CTable | TreeStore]
Return values in the current subtree view, with object roots collapsed.
Properties
- property vlmeta: MutableMapping
Access variable-length metadata for the TreeStore or current subtree.
Returns a proxy to the vlmeta attribute of an internal SChunk stored at ‘/__vlmeta__’ for the root tree, or ‘<subtree_path>/__vlmeta__’ for subtrees. The SChunk is created on-demand if it doesn’t exist.
Notes
The metadata is stored as vlmeta of an internal SChunk, ensuring robust serialization and persistence. This mirrors SChunk.vlmeta behavior, with additional guarantees:
- Bulk get via [:] always returns a dict with string keys and decoded values.
- Read-only protection is enforced at the TreeStore level.
- Each subtree has its own independent vlmeta storage.
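Where each (sub)tree keeps its metadata follows directly from the location rule above. The hypothetical helper `vlmeta_location` below only computes that key; it shows why two different subtrees never share metadata. From the user's side, the proxy is used like SChunk.vlmeta, e.g. assigning an entry through tstore.vlmeta and reading everything back with tstore.vlmeta[:].

```python
def vlmeta_location(subtree_path: str = "/") -> str:
    """Hypothetical helper: key of the internal SChunk holding a (sub)tree's vlmeta."""
    if subtree_path in ("", "/"):
        return "/__vlmeta__"  # root tree
    return subtree_path.rstrip("/") + "/__vlmeta__"


# Each subtree maps to its own SChunk, so their metadata stay independent.
print(vlmeta_location())           # /__vlmeta__
print(vlmeta_location("/child0"))  # /child0/__vlmeta__
```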
Public Members
- get(key: str, default: Any = None) → NDArray | SChunk | ObjectArray | BatchArray | C2Array | Any
Retrieve a node, or default if not found.
- get_children(path: str) → list[str]
Get direct children of a given path.
- Parameters:
path (str) – The parent path to get children for.
- Returns:
children – List of direct child paths.
- Return type:
list[str]
- get_descendants(path: str) → list[str]
Get all descendants of a given path.
- Parameters:
path (str) – The parent path to get descendants for.
- Returns:
descendants – List of all descendant paths.
- Return type:
list[str]
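The difference between direct children and all descendants can be sketched over a flat set of leaf paths. `children_of` and `descendants_of` are hypothetical helpers, not the library's methods; including intermediate structural paths among the descendants, and returning sorted lists, are assumptions of the sketch.

```python
def _prefix(path: str) -> str:
    return "/" if path == "/" else path.rstrip("/") + "/"


def children_of(leaves: set[str], path: str) -> list[str]:
    """Hypothetical model of get_children(): direct child paths, leaf or structural."""
    prefix = _prefix(path)
    found = set()
    for k in leaves:
        if k.startswith(prefix):
            # Keep only the first segment below `path`.
            found.add(prefix + k[len(prefix):].split("/", 1)[0])
    return sorted(found)


def descendants_of(leaves: set[str], path: str) -> list[str]:
    """Hypothetical model of get_descendants(): every path below `path`."""
    prefix = _prefix(path)
    found = set()
    for k in leaves:
        if k.startswith(prefix):
            parts = k[len(prefix):].split("/")
            # Include the intermediate structural paths as well as the leaf.
            for i in range(1, len(parts) + 1):
                found.add(prefix + "/".join(parts[:i]))
    return sorted(found)


leaves = {"/child0/leaf1", "/child0/child1/leaf2", "/child0/child2"}
print(children_of(leaves, "/child0"))
print(descendants_of(leaves, "/child0"))
```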
- get_subtree(path: str) → TreeStore
Create a subtree view with the specified path as root.
- Parameters:
path (str) – The path that will become the root of the subtree view. It is relative to the current subtree and is normalized to start with ‘/’ if the slash is missing.
- Returns:
subtree – A new TreeStore instance that presents the subtree as if path were the root.
- Return type:
TreeStore
Examples
>>> tstore["/child0/child1/data"] = np.array([1, 2, 3])
>>> tstore["/child0/child1/grandchild"] = np.array([4, 5, 6])
>>> subtree = tstore.get_subtree("/child0/child1")
>>> sorted(subtree.keys())
['/data', '/grandchild']
>>> subtree["/grandchild"][:]
array([4, 5, 6])
Notes
This is equivalent to tstore[path] when path is a structural path.
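The re-rooting performed by a subtree view can be sketched as prefix stripping on leaf paths. `subtree_keys` is a hypothetical helper that only handles leaf keys (it ignores any structural entries keys() may also report); it reproduces the key listing of the example above and shows that views compose, which is why get_subtree paths are relative to the current subtree.

```python
def subtree_keys(leaves: set[str], root: str) -> list[str]:
    """Hypothetical model of the keys a get_subtree(root) view exposes."""
    prefix = "/" if root == "/" else root.rstrip("/") + "/"
    # Keep keys under `root` and re-root them so that `root` becomes "/".
    return sorted("/" + k[len(prefix):] for k in leaves if k.startswith(prefix))


leaves = {"/child0/child1/data", "/child0/child1/grandchild", "/other/x"}
print(subtree_keys(leaves, "/child0/child1"))  # ['/data', '/grandchild']
# Views compose: a subtree of a subtree equals the subtree of the joined path.
inner = subtree_keys(set(subtree_keys(leaves, "/child0")), "/child1")
print(inner == subtree_keys(leaves, "/child0/child1"))  # True
```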
- items() → Iterator[tuple[str, NDArray | C2Array | SChunk | TreeStore]]
Return key-value pairs in the current subtree view.
- to_b2d(dirname=None, *, overwrite: bool = False) → PathLike[Any] | str
Serialize store contents to a directory-backed store.
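- Parameters:
dirname (str or None, optional) – Path of the directory-backed store to create. Default is None.
overwrite (bool, optional) – Keyword-only. Whether to overwrite an existing destination. Default is False.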
- Returns:
dirname – The absolute path to the created directory-backed store.
- Return type:
str
Examples
Unpack a zip-backed store into a directory-backed store:
with blosc2.DictStore("data.b2z", mode="r") as dstore:
    dstore.to_b2d("data.b2d", overwrite=True)
with blosc2.DictStore("data.b2d", mode="r") as dstore:
    values = dstore["/values"][:]
Copy an existing directory-backed store to another directory. A .b2d suffix is recommended for directory-backed stores:
with blosc2.DictStore("data.b2d", mode="r") as dstore:
    dstore.to_b2d("backup.b2d", overwrite=True)
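- to_b2z(overwrite=False, filename=None) → PathLike[Any] | str
Serialize store contents to a compact .b2z file.
- Parameters:
overwrite (bool, optional) – Whether to overwrite an existing destination file. Default is False.
filename (str or None, optional) – Path of the .b2z file to create. Default is None.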
- Returns:
filename – The absolute path to the created b2z file.
- Return type:
str
Examples
Pack a directory-backed store into a zip store. A .b2d suffix is recommended for directory-backed stores, but not required:
import numpy as np
import blosc2

with blosc2.DictStore("data.b2d", mode="w") as dstore:
    dstore["/values"] = np.arange(10)
with blosc2.DictStore("data.b2d", mode="r") as dstore:
    dstore.to_b2z(filename="data.b2z", overwrite=True)
filename can also be passed positionally, as the only argument:
with blosc2.DictStore("data.b2d", mode="r") as dstore:
    # With the signature above, also passing overwrite=True here would raise TypeError,
    # because the positional string already binds to the overwrite parameter.
    dstore.to_b2z("copy.b2z")
- walk(path: str = '/', topdown: bool = True) → Iterator[tuple[str, list[str], list[str]]]
Walk the tree structure.
Similar to os.walk(), this visits all structural nodes in the hierarchy, yielding information about each level. Returns relative names, not full paths.
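- Parameters:
path (str, optional) – The path to start walking from. Default is ‘/’.
topdown (bool, optional) – If True (the default), each path is yielded before the paths below it; if False, after them, as in os.walk().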
- Yields:
path (str) – Current path being walked.
children (list[str]) – List of child directory names (structural nodes that have descendants). These are just the names, not full paths.
nodes (list[str]) – List of leaf node names (nodes that contain data). These are just the names, not full paths.
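The traversal described above can be sketched as an os.walk()-style generator over a flat set of leaf paths. `walk_model` is a hypothetical helper, not TreeStore.walk(); it follows this method's description (child and node lists hold names, not full paths), and returning sorted lists is a choice of the sketch.

```python
from typing import Iterator


def walk_model(leaves: set[str], path: str = "/", topdown: bool = True
               ) -> Iterator[tuple[str, list[str], list[str]]]:
    """Hypothetical model of walk() over a flat set of leaf paths."""
    prefix = "/" if path == "/" else path.rstrip("/") + "/"
    dirs, nodes = set(), set()
    for k in leaves:
        if k.startswith(prefix):
            name, sep, _ = k[len(prefix):].partition("/")
            # A name followed by more segments is structural; otherwise it is a leaf.
            if sep:
                dirs.add(name)
            else:
                nodes.add(name)
    dirs_sorted, nodes_sorted = sorted(dirs), sorted(nodes)
    if topdown:
        yield path, dirs_sorted, nodes_sorted
    for d in dirs_sorted:
        yield from walk_model(leaves, prefix + d, topdown)
    if not topdown:
        yield path, dirs_sorted, nodes_sorted


leaves = {"/child0/leaf1", "/child0/child1/leaf2", "/child0/child2"}
for path, children, nodes in walk_model(leaves, "/child0"):
    print(path, children, nodes)
```

Passing topdown=False yields the same tuples in bottom-up order, with each path reported after everything below it.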
Examples
>>> for path, children, nodes in tstore.walk("/child0", topdown=True):
...     print(f"Path: {path}, Children: {children}, Nodes: {nodes}")
- property estore: EmbedStore
Access the underlying EmbedStore.
Notes
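Keys must start with / (a missing leading slash is added automatically). The root is /. Empty path segments (//) are not allowed.
Leaf nodes hold the actual data (NumPy arrays, NDArray, C2Array, and the other types accepted by __setitem__, such as SChunk and CTable). Structural nodes exist implicitly to organize leaves and are not directly assigned any data.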
For storage/embedding thresholds and the behavior of external arrays, see also
DictStore, which TreeStore extends.