Proxy
Class that implements a proxy (with cache support) of a Python-Blosc2 container.
It can be used to cache chunks of a regular data container that follows the ProxySource or ProxyNDSource interface.
- class blosc2.Proxy(src: ProxySource, urlpath: str | None = None, mode='a', **kwargs: dict)
Proxy (with cache support) for an object following the ProxySource interface.
This can be used to cache chunks of a regular data container which follows the ProxySource or ProxyNDSource interfaces.
- Attributes:
blocks
The blocks of self, or None if the data is not a Blosc2 NDArray.
chunks
The chunks of self, or None if the data is not a Blosc2 NDArray.
cparams
The compression parameters of the cache.
device
Hardware device the array data resides on.
dtype
The dtype of self, or None if the data is unidimensional.
fields
Dictionary with the fields of self.
info
The info of the cache.
ndim
Get the number of dimensions of the Operand.
schunk
The SChunk of the cache.
shape
The shape of self.
vlmeta
Get the vlmeta of the cache.
Methods
afetch([item])
Retrieve the cache container with the requested data updated asynchronously.
fetch([item])
Get the container used as cache with the requested data updated.
item()
Copy an element of an array to a standard Python scalar and return it.
to_device(device)
Copy the array from the device on which it currently resides to the specified device.
where([value1, value2])
Select value1 or value2 values based on True/False for self.
- Special Methods:
__init__(src[, urlpath, mode])
Create a new Proxy to serve as a cache to save accessed chunks locally.
__getitem__(item)
Get a slice as a numpy.ndarray using the Proxy.
Constructor
- __init__(src: ProxySource, urlpath: str | None = None, mode='a', **kwargs: dict)
Create a new Proxy to serve as a cache to save accessed chunks locally.
- Parameters:
src (ProxySource or ProxyNDSource) – The original container.
urlpath (str, optional) – The path where the container that serves as the cache is saved.
mode (str, optional) – “a” means read/write (create if it doesn’t exist); “w” means create (overwrite if it exists). Default is “a”.
kwargs (dict, optional) –
Keyword arguments supported:
- vlmeta: dict or None
A dictionary with the variable-length metalayers, one entry per metalayer:
- key: bytes or str
The name of the metalayer.
- value: object
The metalayer object, which will be serialized using msgpack.
Utility Methods
- __getitem__(item: slice | list[slice]) → ndarray
Get a slice as a numpy.ndarray using the Proxy.
- Parameters:
item (slice or list of slices) – The slice of the desired data.
- Returns:
out – An array with the data slice.
- Return type:
numpy.ndarray
Examples
>>> import numpy as np
>>> import blosc2
>>> data = np.arange(25).reshape(5, 5)
>>> ndarray = blosc2.asarray(data)
>>> proxy = blosc2.Proxy(ndarray)
>>> proxy[0:3, 0:3]
[[ 0  1  2]
 [ 5  6  7]
 [10 11 12]]
>>> proxy[2:5, 2:5]
[[12 13 14]
 [17 18 19]
 [22 23 24]]
- async afetch(item: slice | list[slice] | None = ()) → NDArray | SChunk
Retrieve the cache container with the requested data updated asynchronously.
- Parameters:
item (slice or list of slices, optional) – If provided, only the chunks intersecting the specified slices are retrieved, unless they have already been fetched.
- Returns:
out – The local container used to cache the already requested data.
- Return type:
NDArray | SChunk
Notes
This method is only available if the ProxySource or ProxyNDSource has an async aget_chunk method.
Examples
>>> import numpy as np
>>> import blosc2
>>> import asyncio
>>> from blosc2 import ProxyNDSource
>>> class MyProxySource(ProxyNDSource):
...     def __init__(self, data):
...         # A multidimensional source must expose shape, chunks, blocks and dtype
...         self.data = data
...         print(f"Data shape: {self.shape}, Chunks: {self.chunks}")
...         print(f"Blocks: {self.blocks}, Dtype: {self.dtype}")
...     @property
...     def shape(self):
...         return self.data.shape
...     @property
...     def chunks(self):
...         return self.data.chunks
...     @property
...     def blocks(self):
...         return self.data.blocks
...     @property
...     def dtype(self):
...         return self.data.dtype
...     # This method must be present
...     def get_chunk(self, nchunk):
...         return self.data.get_chunk(nchunk)
...     # This method is optional
...     async def aget_chunk(self, nchunk):
...         await asyncio.sleep(0.1)  # Simulate an asynchronous operation
...         return self.data.get_chunk(nchunk)
>>> data = np.arange(20).reshape(4, 5)
>>> chunks = [2, 5]
>>> blocks = [1, 5]
>>> data = blosc2.asarray(data, chunks=chunks, blocks=blocks)
>>> source = MyProxySource(data)
Data shape: (4, 5), Chunks: (2, 5)
Blocks: (1, 5), Dtype: int64
>>> proxy = blosc2.Proxy(source)
>>> async def fetch_data():
...     # Fetch a slice of the data from the proxy asynchronously
...     slice_data = await proxy.afetch(slice(0, 2))
...     # Note that only data fetched is shown, the rest is uninitialized
...     print(slice_data[:])
>>> asyncio.run(fetch_data())
[[0 1 2 3 4]
 [5 6 7 8 9]
 [0 0 0 0 0]
 [0 0 0 0 0]]
>>> # Using getitem to get a slice of the data
>>> result = proxy[1:2, 1:3]
>>> print(f"Proxy getitem: {result}")
Proxy getitem: [[6 7]]
- fetch(item: slice | list[slice] | None = ()) → NDArray | SChunk
Get the container used as cache with the requested data updated.
- Parameters:
item (slice or list of slices, optional) – If provided, only the chunks that intersect the slices in item are retrieved, unless they have already been fetched.
- Returns:
out – The local container used to cache the already requested data.
- Return type:
NDArray | SChunk
Examples
>>> import numpy as np
>>> import blosc2
>>> data = np.arange(20).reshape(10, 2)
>>> ndarray = blosc2.asarray(data)
>>> proxy = blosc2.Proxy(ndarray)
>>> slice_data = proxy.fetch((slice(0, 3), slice(0, 2)))
>>> slice_data[:3, :2]
[[0 1]
 [2 3]
 [4 5]]
- property fields: dict
Dictionary with the fields of self.
- Returns:
fields – A dictionary with the fields of the Proxy.
- Return type:
dict
See also
Examples
>>> import numpy as np
>>> import blosc2
>>> data = np.ones(16, dtype=[('field1', 'i4'), ('field2', 'f4')]).reshape(4, 4)
>>> ndarray = blosc2.asarray(data)
>>> proxy = blosc2.Proxy(ndarray)
>>> # Get a dictionary of fields from the proxy, where each field can be accessed individually
>>> fields_dict = proxy.fields
>>> for field_name, field_proxy in fields_dict.items():
...     print(f"Field name: {field_name}, Field data: {field_proxy}")
Field name: field1, Field data: <blosc2.proxy.ProxyNDField object at 0x114472d20>
Field name: field2, Field data: <blosc2.proxy.ProxyNDField object at 0x10e215be0>
>>> fields_dict['field2'][:]
[[1. 1. 1. 1.]
 [1. 1. 1. 1.]
 [1. 1. 1. 1.]
 [1. 1. 1. 1.]]
- property info: str
The info of the cache.
- property vlmeta: vlmeta
Get the vlmeta of the cache.
See also
blosc2.schunk.SChunk.vlmeta