DictStore
A high‑level, dictionary‑like container to organize compressed arrays with Blosc2.
Overview
DictStore lets you store and retrieve arrays by string keys (paths like "/dir/node"), similar to a Python dict, while transparently handling efficient Blosc2 compression and persistence. It supports two on‑disk representations:
- .b2d: a directory layout (B2DIR) where each external array is a separate file: .b2nd for NDArray and .b2f for SChunk; an embedded store file (embed.b2e) keeps small/in‑memory arrays.
- .b2z: a single zip file (B2ZIP) that mirrors the directory structure above. You can zip up a .b2d layout or write directly and later reopen it for reading.
Supported values include blosc2.NDArray, blosc2.SChunk and blosc2.C2Array (as well as numpy.ndarray, which is converted to NDArray). Small arrays (below a configurable compression‑size threshold) and in‑memory objects are kept inside the embedded store; larger or explicitly external arrays live as regular .b2nd (NDArray) or .b2f (SChunk) files. C2Array objects are always stored in the embedded store. You can mix all types seamlessly and use the usual mapping methods (__getitem__, __setitem__, keys(), items()…).
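The placement rule described above can be sketched in plain Python. This is only an illustrative model of the prose, not the library's implementation: the helper name `choose_location`, its arguments, and the string labels are hypothetical, and treating every on-disk array as external regardless of size is an assumption drawn from the "explicitly external" wording.

```python
from typing import Optional


def choose_location(kind: str, size: int, on_disk: bool, threshold: Optional[int]) -> str:
    """Return "embedded" or "external" for a value (hypothetical helper, not a blosc2 API).

    kind      -- "NDArray", "SChunk" or "C2Array"
    size      -- size in bytes, measured however the store measures it
    on_disk   -- True if the value already lives in its own file
    threshold -- size limit for the embedded store, or None
    """
    if kind == "C2Array":
        return "embedded"  # always embedded, regardless of size
    if on_disk:
        return "external"  # assumption: explicitly external arrays stay external
    if threshold is None:
        return "embedded"  # no threshold: in-memory values are embedded
    return "embedded" if size < threshold else "external"


print(choose_location("NDArray", 24, on_disk=False, threshold=65536))
print(choose_location("NDArray", 10**6, on_disk=False, threshold=65536))
print(choose_location("C2Array", 10**9, on_disk=False, threshold=65536))
```

Under these assumptions the three calls yield an embedded small array, an external large array, and an embedded C2Array.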
Quick example
import numpy as np
import blosc2

# Create a store backed by a zip file
with blosc2.DictStore("my_dstore.b2z", mode="w") as dstore:
    dstore["/node1"] = np.array([1, 2, 3])  # small -> embedded store
    dstore["/node2"] = blosc2.ones(2)  # small -> embedded store
    arr_ext = blosc2.arange(3, urlpath="n3.b2nd", mode="w")
    dstore["/dir1/node3"] = arr_ext  # external file referenced

# Reopen and read
with blosc2.DictStore("my_dstore.b2z", mode="r") as dstore:
    print(sorted(dstore.keys()))  # ['/dir1/node3', '/node1', '/node2']
    print(dstore["/node1"][:])  # [1 2 3]
- class blosc2.DictStore(localpath: PathLike[Any] | str | bytes, mode: str = 'a', tmpdir: str | None = None, cparams: CParams | None = None, dparams: CParams | None = None, storage: Storage | None = None, threshold: int | None = 65536)
Directory-based storage for compressed data using Blosc2. Manages arrays in a directory (.b2d) or zip (.b2z) format.
Supports the following types:
blosc2.NDArray: n-dimensional arrays. When persisted externally they are stored as .b2nd files.
blosc2.SChunk: super-chunks. When persisted externally they are stored as .b2f files.
blosc2.C2Array: remote arrays (hosted on a Caterva2 server). These are always kept inside the embedded store (never externalized).
numpy.ndarray: converted to blosc2.NDArray on assignment.
- Parameters:
localpath (str) – Local path for the directory (“.b2d”) or file (“.b2z”); other extensions are not supported. If a directory is specified, it will be treated as a Blosc2 directory format (B2DIR). If a file is specified, it will be treated as a Blosc2 zip format (B2ZIP).
mode (str, optional) – File mode (‘r’, ‘w’, ‘a’). Default is ‘a’.
tmpdir (str or None, optional) – Temporary directory to use when working with “.b2z” files. If None, a system temporary directory will be managed. Default is None.
cparams (dict or None, optional) – Compression parameters for the internal embed store. If None, the default Blosc2 parameters are used.
dparams (dict or None, optional) – Decompression parameters for the internal embed store. If None, the default Blosc2 parameters are used.
storage (blosc2.Storage or None, optional) – Storage properties for the internal embed store. If None, the default Blosc2 storage properties are used.
threshold (int or None, optional) – Threshold (in bytes of uncompressed data) under which values are kept in the embedded store. If None, in-memory arrays are stored in the embedded store and on-disk arrays are stored as separate files. C2Array objects will always be stored in the embedded store, regardless of their size.
Examples
>>> dstore = DictStore(localpath="my_dstore.b2z", mode="w")
>>> dstore["/node1"] = np.array([1, 2, 3])  # goes to embed store
>>> dstore["/node2"] = blosc2.ones(2)  # goes to embed store
>>> arr_external = blosc2.arange(3, urlpath="ext_node3.b2nd", mode="w")
>>> dstore["/dir1/node3"] = arr_external  # external file in dir1 (.b2nd)
>>> schunk = blosc2.SChunk(chunksize=32)
>>> schunk.append_data(b"abcd")
4
>>> dstore["/dir1/schunk1"] = schunk  # externalized as .b2f if above threshold
>>> dstore.to_b2z()  # persist to the zip file; external files are copied in
>>> print(sorted(dstore.keys()))
['/dir1/node3', '/dir1/schunk1', '/node1', '/node2']
>>> print(dstore["/node1"][:])
[1 2 3]
Notes
The DictStore is still experimental and subject to change. Please report any issues you may find.
External persistence uses the following file extensions: .b2nd for NDArray and .b2f for SChunk.
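As a rough illustration of the extension convention in the note above, the sketch below maps a key and a value type to the relative file name an external node might get inside a .b2d layout. The helper `external_relpath` is hypothetical, and the idea that the file path simply mirrors the key (for example "/dir1/node3" living under dir1/) is an assumption inferred from the examples on this page, not a documented guarantee.

```python
# Extensions stated in the note above: .b2nd for NDArray, .b2f for SChunk
EXTENSIONS = {"NDArray": ".b2nd", "SChunk": ".b2f"}


def external_relpath(key: str, kind: str) -> str:
    """Map a store key to a relative file path (hypothetical helper, not a blosc2 API)."""
    if not key.startswith("/"):
        raise ValueError("keys are expected to look like '/dir/node'")
    if kind not in EXTENSIONS:
        raise ValueError(f"{kind} values are not externalized in this sketch")
    # Assumption: the on-disk path mirrors the key, minus the leading slash
    return key.lstrip("/") + EXTENSIONS[kind]


print(external_relpath("/dir1/node3", "NDArray"))
print(external_relpath("/dir1/schunk1", "SChunk"))
```

C2Array is deliberately rejected here, since the documentation states it is never externalized.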
- Attributes:
estore
Access the underlying EmbedStore.
Methods
close() – Persist changes and cleanup.
get(key[, default]) – Retrieve a node, or default if not found.
items() – Iterate over (key, value) pairs.
keys() – Return all keys.
to_b2z([overwrite, filename]) – Serialize zip store contents to the b2z file.
values() – Iterate over all values.
- Special Methods:
__init__(localpath[, mode, tmpdir, cparams, ...]) – See DictStore for full documentation of parameters.
__getitem__(key) – Retrieve a node from the DictStore.
__setitem__(key, value) – Add a node to the DictStore.
__delitem__(key) – Remove a node from the DictStore.
__contains__(key) – Check if a key exists.
__len__() – Return number of nodes.
__iter__() – Iterate over keys.
__enter__() – Context manager enter.
__exit__(exc_type, exc_val, exc_tb) – Context manager exit.
Constructors
Dictionary Interface
Context Manager
Public Members
- get(key: str, default: Any = None) → NDArray | SChunk | C2Array | Any
Retrieve a node, or default if not found.
- to_b2z(overwrite=False, filename=None) → PathLike[Any] | str
Serialize zip store contents to the b2z file.
- property estore: EmbedStore
Access the underlying EmbedStore.