TreeStore
A hierarchical, tree‑like container to organize compressed arrays with Blosc2.
Overview
TreeStore builds on top of DictStore by enforcing a strict hierarchical key
structure and by providing helpers to navigate the hierarchy. Keys are POSIX‑like
paths that must start with a leading slash (e.g. "/child0/child/leaf"). Data is
stored only at leaf nodes; intermediate path segments are considered structural
nodes and are created implicitly as you assign arrays to leaves.
Like DictStore, TreeStore supports two on‑disk representations:
- .b2d: a directory layout (B2DIR) where external arrays are regular .b2nd files and a small embedded store (embed.b2e) holds small/in‑memory arrays.
- .b2z: a single zip file (B2ZIP) that mirrors the above directory structure. You can create it directly or convert from a .b2d layout.
Small arrays (below a size threshold) and in‑memory objects go to the embedded
store, while larger arrays or explicitly external arrays are stored as separate
.b2nd files. You can traverse your dataset hierarchically with walk(), query
children/descendants, or focus on a subtree view with get_subtree().
Quick example
import numpy as np
import blosc2
# Create a hierarchical store backed by a zip file
with blosc2.TreeStore("my_tree.b2z", mode="w") as tstore:
    # Data is stored at leaves; structural nodes are created implicitly
    tstore["/child0/leaf1"] = np.array([1, 2, 3])
    tstore["/child0/child1/leaf2"] = np.array([4, 5, 6])
    tstore["/child0/child2"] = np.array([7, 8, 9])
    # Inspect hierarchy
    for path, children, nodes in tstore.walk("/child0"):
        print(path, sorted(children), sorted(nodes))
    # Work with a subtree view rooted at /child0
    subtree = tstore.get_subtree("/child0")
    print(sorted(subtree.keys()))  # ['/child1/leaf2', '/child2', '/leaf1']
    print(subtree["/child1/leaf2"][:])  # [4 5 6]
- class blosc2.TreeStore(*args, _from_parent_store=None, **kwargs)
A hierarchical tree-based storage container for compressed data using Blosc2.
Extends DictStore with strict hierarchical key validation and tree traversal capabilities. Keys must follow a hierarchical structure using ‘/’ as separator and always start with ‘/’.
- Parameters:
localpath (str) – Local path for the directory (.b2d) or file (.b2z); other extensions are not supported. If a directory is specified, it will be treated as a Blosc2 directory format (B2DIR). If a file is specified, it will be treated as a Blosc2 zip format (B2ZIP).
mode (str, optional) – File mode (‘r’, ‘w’, ‘a’). Default is ‘a’.
tmpdir (str or None, optional) – Temporary directory to use when working with .b2z files. If None, a system temporary directory will be managed. Default is None.
cparams (dict or None, optional) – Compression parameters for the internal embed store. If None, the default Blosc2 parameters are used.
dparams (dict or None, optional) – Decompression parameters for the internal embed store. If None, the default Blosc2 parameters are used.
storage (blosc2.Storage or None, optional) – Storage properties for the internal embed store. If None, the default Blosc2 storage properties are used.
threshold (int or None, optional) – Threshold for the array size (bytes) to be kept in the embed store. If the compressed array size is below this threshold, it will be stored in the embed store instead of as a separate file. If None, in-memory arrays are stored in the embed store and on-disk arrays are stored as separate files. C2Array objects will always be stored in the embed store, regardless of their size.
Examples
>>> import numpy as np
>>> from blosc2 import TreeStore
>>> tstore = TreeStore(localpath="my_tstore.b2z", mode="w")
>>> # Create a hierarchy. Data is stored in leaf nodes.
>>> # Structural nodes like /child0 and /child0/child1 are created automatically.
>>> tstore["/child0/leaf1"] = np.array([1, 2, 3])
>>> tstore["/child0/child1/leaf2"] = np.array([4, 5, 6])
>>> tstore["/child0/child2"] = np.array([7, 8, 9])
>>>
>>> # Walk the tree structure
>>> for path, children, nodes in tstore.walk("/child0"):
...     print(f"Path: {path}, Children: {sorted(children)}, Nodes: {sorted(nodes)}")
Path: /child0, Children: ['/child0/child1'], Nodes: ['/child0/child2', '/child0/leaf1']
Path: /child0/child1, Children: [], Nodes: ['/child0/child1/leaf2']
>>>
>>> # Get a subtree view
>>> subtree = tstore.get_subtree("/child0")
>>> sorted(list(subtree.keys()))
['/child1/leaf2', '/child2', '/leaf1']
Notes
The TreeStore is still experimental and subject to change. Please report any issues you may find.
- Attributes:
Methods
close() – Persist changes and cleanup.
get(key[, default]) – Retrieve a node, or default if not found.
get_children(path) – Get direct children of a given path.
get_descendants(path) – Get all descendants of a given path.
get_subtree(path) – Create a subtree view with the specified path as root.
items() – Return key-value pairs in the current subtree view.
keys() – Return all keys in the current subtree view.
to_b2z([overwrite, filename]) – Serialize zip store contents to the b2z file.
values() – Iterate over all values.
walk([path, topdown]) – Walk the tree structure.
- Special Methods:
__init__(*args[, _from_parent_store]) – Initialize TreeStore with subtree support.
__getitem__(key) – Retrieve a node or subtree view.
__setitem__(key, value) – Add a node with hierarchical key validation.
__delitem__(key) – Remove a node or subtree.
__contains__(key) – Check if a key exists.
__len__() – Return number of nodes.
__iter__() – Iterate over keys.
Constructors
Dictionary Interface
- __getitem__(key: str) → NDArray | C2Array | SChunk | TreeStore
Retrieve a node or subtree view.
If the key points to a subtree (intermediate path with children), returns a TreeStore view of that subtree. If the key points to a final node (leaf), returns the stored array or schunk.
- Parameters:
key (str) – Hierarchical node key.
- Returns:
out – The stored array/chunk if key is a leaf node, or a TreeStore subtree view if key is an intermediate path with children.
- Return type:
blosc2.NDArray or blosc2.C2Array or blosc2.SChunk or TreeStore
- Raises:
KeyError – If key is not found.
ValueError – If key doesn’t follow hierarchical structure rules.
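The leaf-versus-subtree lookup described above can be pictured with a small pure-Python model. This is only a sketch over a flat set of leaf keys, not TreeStore's actual implementation; `classify_key` and `leaves` are hypothetical names introduced for illustration.

```python
def classify_key(leaf_keys: set[str], key: str) -> str:
    """Return "leaf" or "subtree" for `key`; raise KeyError if it matches nothing."""
    if not key.startswith("/"):
        raise ValueError(f"key must start with '/': {key!r}")
    if key in leaf_keys:
        # Exact match: the key names a node that holds data.
        return "leaf"
    prefix = key.rstrip("/") + "/"
    if any(k.startswith(prefix) for k in leaf_keys):
        # Some leaf lives below this path, so it is a structural node.
        return "subtree"
    raise KeyError(key)


leaves = {"/child0/leaf1", "/child0/child1/leaf2", "/child0/child2"}
print(classify_key(leaves, "/child0/leaf1"))   # leaf
print(classify_key(leaves, "/child0/child1"))  # subtree
```

In the real store, the "leaf" case corresponds to returning the stored array or schunk and the "subtree" case to returning a TreeStore view.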
- __setitem__(key: str, value: ndarray | NDArray | C2Array | SChunk) → None
Add a node with hierarchical key validation.
- Parameters:
key (str) – Hierarchical node key (must start with ‘/’ and use ‘/’ as separator).
value (np.ndarray or blosc2.NDArray or blosc2.C2Array or blosc2.SChunk) – The array or schunk to store.
- Raises:
ValueError – If key doesn’t follow hierarchical structure rules, if trying to assign to a structural path that already has children, or if trying to add a child to a path that already contains data.
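The three ValueError conditions listed above can be modeled over a flat set of existing leaf keys. The following is a minimal sketch under that assumption, not the library's code; `check_assignable` is a hypothetical helper, and the error messages are invented for illustration.

```python
def check_assignable(leaf_keys: set[str], key: str) -> None:
    """Raise ValueError if assigning data to `key` would break the hierarchy rules."""
    if not key.startswith("/"):
        raise ValueError(f"key must start with '/': {key!r}")
    segments = key[1:].split("/")
    # Covers '//' in the middle, a trailing '/', and (in this sketch) the bare root.
    if any(seg == "" for seg in segments):
        raise ValueError(f"empty path segment in {key!r}")
    # The target is a structural path that already has children.
    if any(k.startswith(key + "/") for k in leaf_keys):
        raise ValueError(f"{key!r} is a structural path with children")
    # Some ancestor of the target already contains data.
    for i in range(1, len(segments)):
        ancestor = "/" + "/".join(segments[:i])
        if ancestor in leaf_keys:
            raise ValueError(f"{ancestor!r} already contains data")


leaves = {"/child0/leaf1", "/child0/child1/leaf2"}
check_assignable(leaves, "/child0/child2")  # accepted: a new leaf under /child0
for bad in ("child0/x", "/child0//x", "/child0/child1", "/child0/leaf1/x"):
    try:
        check_assignable(leaves, bad)
    except ValueError as exc:
        print("rejected:", exc)
```

The sketch deliberately says nothing about overwriting an existing leaf; that behavior is not covered by the rules quoted here.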
- __delitem__(key: str) → None
Remove a node or subtree.
If the key points to a subtree (intermediate path with children), removes all nodes in that subtree recursively. If the key points to a final node (leaf), removes only that node.
- Parameters:
key (str) – Hierarchical node key.
- Raises:
KeyError – If key is not found.
ValueError – If key doesn’t follow hierarchical structure rules.
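Recursive removal can be illustrated the same way: deleting a leaf drops one key, while deleting a structural path drops every leaf beneath it. This is a sketch on a plain set of keys, assuming nothing about how TreeStore stores nodes internally; `delete_node` is a hypothetical name.

```python
def delete_node(leaf_keys: set[str], key: str) -> set[str]:
    """Return a new key set with `key` (a leaf) or everything under it (a subtree) removed."""
    if key in leaf_keys:
        # Final node: remove only that node.
        return leaf_keys - {key}
    prefix = key.rstrip("/") + "/"
    doomed = {k for k in leaf_keys if k.startswith(prefix)}
    if not doomed:
        raise KeyError(key)
    # Structural path: remove all leaves in the subtree.
    return leaf_keys - doomed


leaves = {"/child0/leaf1", "/child0/child1/leaf2", "/child0/child1/leaf3", "/other"}
print(sorted(delete_node(leaves, "/child0/leaf1")))
print(sorted(delete_node(leaves, "/child0/child1")))
```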
Properties
- vlmeta
Access variable-length metadata for the TreeStore.
Returns a proxy to the vlmeta attribute of an internal SChunk stored at ‘/__vlmeta__’. The SChunk is created on-demand if it doesn’t exist.
Notes
The metadata is stored as vlmeta of an internal SChunk, ensuring robust serialization and persistence. This mirrors SChunk.vlmeta behavior, with additional guarantees:
- Bulk get via [:] always returns a dict with string keys and decoded values.
- Read-only protection is enforced at the TreeStore level.
Public Members
- get(key: str, default: Any = None) → NDArray | SChunk | C2Array | Any
Retrieve a node, or default if not found.
- get_children(path: str) → list[str]
Get direct children of a given path.
- Parameters:
path (str) – The parent path to get children for.
- Returns:
children – List of direct child paths.
- Return type:
list[str]
- get_descendants(path: str) → list[str]
Get all descendants of a given path.
- Parameters:
path (str) – The parent path to get descendants for.
- Returns:
descendants – List of all descendant paths.
- Return type:
list[str]
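The difference between direct children and all descendants can be sketched over a flat set of leaf keys. The helpers `children_of` and `descendants_of` are hypothetical, and this sketch assumes that structural nodes are listed alongside leaves and that full paths are returned; the real methods may differ on either point.

```python
def children_of(leaf_keys: set[str], path: str) -> list[str]:
    """Paths exactly one level below `path` (structural nodes and leaves alike)."""
    prefix = path.rstrip("/") + "/"
    found = set()
    for k in leaf_keys:
        if k.startswith(prefix):
            first = k[len(prefix):].split("/", 1)[0]
            found.add(prefix + first)
    return sorted(found)


def descendants_of(leaf_keys: set[str], path: str) -> list[str]:
    """Every path at any depth below `path`, including intermediate structural nodes."""
    prefix = path.rstrip("/") + "/"
    found = set()
    for k in leaf_keys:
        if k.startswith(prefix):
            parts = k[len(prefix):].split("/")
            for i in range(1, len(parts) + 1):
                found.add(prefix + "/".join(parts[:i]))
    return sorted(found)


leaves = {"/child0/leaf1", "/child0/child1/leaf2", "/child0/child2"}
print(children_of(leaves, "/child0"))
print(descendants_of(leaves, "/child0"))
```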
- get_subtree(path: str) → TreeStore
Create a subtree view with the specified path as root.
- Parameters:
path (str) – The path that will become the root of the subtree view (relative to current subtree).
- Returns:
subtree – A new TreeStore instance that presents the subtree as if path were the root.
- Return type:
TreeStore
Examples
>>> tstore["/child0/child1/data"] = np.array([1, 2, 3])
>>> tstore["/child0/child1/grandchild"] = np.array([4, 5, 6])
>>> subtree = tstore.get_subtree("/child0/child1")
>>> list(subtree.keys())
['/data', '/grandchild']
>>> subtree["/grandchild"][:]
array([4, 5, 6])
Notes
This is equivalent to tstore[path] when path is a structural path.
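A subtree view can be thought of as a key translation layer: keys seen through the view are the parent's keys with the root prefix stripped, and nested views resolve relative to the current one. The class below is a hypothetical pure-Python model of that idea, not the TreeStore implementation.

```python
class SubtreeView:
    """Toy model of a subtree view over a shared set of full leaf keys."""

    def __init__(self, leaf_keys: set[str], root: str = "/"):
        self._keys = leaf_keys
        self._root = root.rstrip("/")  # "" for the top-level view

    def to_full(self, key: str) -> str:
        """Translate a key relative to this view into a full key."""
        return self._root + key

    def keys(self) -> list[str]:
        """Leaf keys below the view's root, expressed relative to that root."""
        prefix = self._root + "/"
        return sorted(k[len(self._root):] for k in self._keys if k.startswith(prefix))

    def get_subtree(self, path: str) -> "SubtreeView":
        """Nested view; `path` is relative to the current view."""
        return SubtreeView(self._keys, self.to_full(path))


leaves = {"/child0/child1/data", "/child0/child1/grandchild", "/child0/leaf1"}
view = SubtreeView(leaves, "/child0/child1")
print(view.keys())             # ['/data', '/grandchild']
print(view.to_full("/data"))   # /child0/child1/data
print(SubtreeView(leaves, "/child0").get_subtree("/child1").keys())
```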
- items() → Iterator[tuple[str, NDArray | C2Array | SChunk | TreeStore]]
Return key-value pairs in the current subtree view.
- to_b2z(overwrite=False, filename=None) → PathLike[Any] | str
Serialize zip store contents to the b2z file.
- walk(path: str = '/', topdown: bool = True) → Iterator[tuple[str, list[str], list[str]]]
Walk the tree structure.
Similar to os.walk(), this visits all structural nodes in the hierarchy, yielding information about each level. Returns relative names, not full paths.
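- Parameters:
path (str, optional) – The path to start walking from. Default is ‘/’.
topdown (bool, optional) – If True, a node is visited before its children; if False, children are visited first, as in os.walk(). Default is True.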
- Yields:
path (str) – Current path being walked.
children (list[str]) – List of child directory names (structural nodes that have descendants). These are just the names, not full paths.
nodes (list[str]) – List of leaf node names (nodes that contain data). These are just the names, not full paths.
Examples
>>> for path, children, nodes in tstore.walk("/child0", topdown=True):
...     print(f"Path: {path}, Children: {children}, Nodes: {nodes}")
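To make the traversal order concrete, here is a minimal os.walk-style generator over a flat set of leaf keys. It is a sketch, not the library's code: it yields bare names, following the wording of the Yields section, and the real method may format names differently. The standalone function `walk` and the `leaves` set are hypothetical.

```python
from collections.abc import Iterator


def walk(
    leaf_keys: set[str], path: str = "/", topdown: bool = True
) -> Iterator[tuple[str, list[str], list[str]]]:
    """Yield (path, child names, leaf names) for `path` and every structural node below it."""
    base = path.rstrip("/")
    prefix = base + "/"
    children, nodes = set(), set()
    for k in leaf_keys:
        if k.startswith(prefix):
            name, _, rest = k[len(prefix):].partition("/")
            # A remainder after the first segment means `name` is structural.
            (children if rest else nodes).add(name)
    here = (base or "/", sorted(children), sorted(nodes))
    if topdown:
        yield here  # parent before children
    for name in sorted(children):
        yield from walk(leaf_keys, prefix + name, topdown)
    if not topdown:
        yield here  # children before parent


leaves = {"/child0/leaf1", "/child0/child1/leaf2", "/child0/child2"}
for step in walk(leaves, "/child0"):
    print(step)
```

Passing `topdown=False` to this sketch yields `/child0/child1` before `/child0`, which is the order to use when a parent can only be processed after its children.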
- property estore: EmbedStore
Access the underlying EmbedStore.
- property vlmeta: MutableMapping | None
Access variable-length metadata for the TreeStore.
Notes
- Keys must start with /. The root is /. Empty path segments (//) are not allowed.
- Leaf nodes hold the actual data (NumPy arrays, NDArray, C2Array, SChunk). Structural nodes exist implicitly to organize leaves and are not directly assigned any data.
- For storage/embedding thresholds and external arrays behavior, see also DictStore, which TreeStore extends.