EmbedStore
Overview
EmbedStore is a dictionary-like container that lets you pack many arrays into a single, compressed Blosc2 container file (recommended extension: .b2e).
It can hold:
- NumPy arrays (their data is embedded as compressed bytes),
- Blosc2 NDArrays (either in-memory or persisted in their own .b2nd files; when added to the store, their data is embedded),
- Blosc2 SChunk objects (their frames are embedded), and
- remote Blosc2 arrays (C2Array) addressed via URLs.
Important: Only remote C2Array objects are stored as lightweight references (URL base and path). NumPy arrays and NDArrays are always embedded into the .b2e container, even if the NDArray originates from an external .b2nd file.
Typical use cases include bundling several small/medium arrays together, shipping datasets as one file, or creating a simple keyed store for heterogeneous array sources.
Quickstart
import numpy as np
import blosc2
estore = blosc2.EmbedStore(urlpath="example_estore.b2e", mode="w")
estore["/node1"] = np.array([1, 2, 3]) # embedded NumPy array
estore["/node2"] = blosc2.ones(2) # embedded NDArray
# NDArray (embedded, even if it has its own .b2nd)
estore["/node3"] = blosc2.arange(
    3,
    dtype="i4",
    urlpath="external_node3.b2nd",
    mode="w",
)
url = blosc2.URLPath("@public/examples/ds-1d.b2nd", "https://cat2.cloud/demo")
# remote C2Array (stored as a lightweight reference)
estore["/node4"] = blosc2.open(url, mode="r")
print(list(estore.keys()))
# ['/node1', '/node2', '/node3', '/node4']
Note
- Embedded arrays (NumPy, NDArray, and SChunk) increase the size of the .b2e container.
- Remote C2Array nodes only store lightweight references; reading them requires access to the remote source. NDArrays coming from external .b2nd files are embedded into the store.
- When retrieving, estore[key] may return either an NDArray or an SChunk depending on what was originally stored; deserialization uses blosc2.from_cframe().
- class blosc2.EmbedStore(urlpath: str | None = None, mode: str = 'a', cparams: CParams | None = None, dparams: CParams | None = None, storage: Storage | None = None, chunksize: int | None = 65536, _from_schunk: SChunk | None = None)
A dictionary-like container for storing NumPy/Blosc2 arrays (NDArray or SChunk) as nodes.
For NumPy arrays, Blosc2 NDArrays (even if they live in external .b2nd files), and Blosc2 SChunk objects, the data is read and embedded into the store. For remote arrays (C2Array), only lightweight references (URL base and path) are stored. If you need a richer hierarchical container with optional external references, consider using blosc2.TreeStore or blosc2.DictStore.
- Parameters:
urlpath (str or None, optional) – Path for persistent storage. Using a ‘.b2e’ extension is recommended. If None, the embed store is kept in memory only; it can be deserialized later using the blosc2.from_cframe() function.
mode (str, optional) – File mode (‘r’, ‘w’, ‘a’). Default is ‘a’.
cparams (dict or None, optional) – Compression parameters for nodes and the embed store itself. Default is None, which uses the default Blosc2 parameters.
dparams (dict or None, optional) – Decompression parameters for nodes and the embed store itself. Default is None, which uses the default Blosc2 parameters.
storage (blosc2.Storage or None, optional) – Storage properties for the embed store. If passed, it will override the urlpath and mode parameters.
chunksize (int, optional) – Size of chunks for the backing storage, in bytes. Default is 65536 (64 KiB).
Examples
>>> estore = EmbedStore(urlpath="example_estore.b2e", mode="w")
>>> estore["/node1"] = np.array([1, 2, 3])
>>> estore["/node2"] = blosc2.ones(2)
>>> estore["/node3"] = blosc2.arange(3, dtype="i4", urlpath="external_node3.b2nd", mode="w")
>>> urlpath = blosc2.URLPath("@public/examples/ds-1d.b2nd", "https://cat2.cloud/demo")
>>> estore["/node4"] = blosc2.open(urlpath, mode="r")
>>> print(list(estore.keys()))
['/node1', '/node2', '/node3', '/node4']
>>> print(estore["/node1"][:])
[1 2 3]
Notes
The EmbedStore is still experimental and subject to change. Please report any issues you may find.
Methods
get(key[, default]) – Retrieve a node, or default if not found.
items() – Iterate over (key, value) pairs.
keys() – Return all keys.
to_cframe() – Serialize embed store to CFrame format.
values() – Iterate over all values.
- Special Methods:
__init__([urlpath, mode, cparams, dparams, ...]) – Initialize EmbedStore.
__getitem__(key) – Retrieve a node from the embed store.
__setitem__(key, value) – Add a node to the embed store.
__delitem__(key) – Remove a node from the embed store.
__contains__(key) – Check if a key exists.
__len__() – Return number of nodes.
__iter__() – Iterate over keys.
Constructors
- __init__(urlpath: str | None = None, mode: str = 'a', cparams: CParams | None = None, dparams: CParams | None = None, storage: Storage | None = None, chunksize: int | None = 65536, _from_schunk: SChunk | None = None)
Initialize EmbedStore.
- estore_from_cframe(cframe: bytes, copy: bool = False) -> EmbedStore
Deserialize a CFrame to an EmbedStore object.
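- Parameters:
cframe (bytes) – The CFrame to deserialize.
copy (bool, optional) – Whether to copy the data. Default is False.
- Returns:
estore – The deserialized EmbedStore object.
- Return type:
EmbedStore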
Dictionary Interface
Public Members