Proxy

Class that implements a proxy (with cache support) of a Python-Blosc2 container.

This can be used to cache chunks of a regular data container that follows the ProxySource or ProxyNDSource interface.

class blosc2.Proxy(src: ProxySource, urlpath: str | None = None, mode='a', **kwargs: dict)

Proxy (with cache support) for an object following the ProxySource interface.

This can be used to cache chunks of a regular data container which follows the ProxySource or ProxyNDSource interfaces.

Attributes:
blocks

The blocks of self or None if the data is not a Blosc2 NDArray

chunks

The chunks of self or None if the data is not a Blosc2 NDArray

cparams

The compression parameters of the cache

dtype

The dtype of self or None if the data is unidimensional

fields

Dictionary with the fields of self.

info

The info of the cache

schunk

The SChunk of the cache

shape

The shape of self

vlmeta

Get the vlmeta of the cache.

Methods

afetch([item])

Retrieve the cache container with the requested data updated asynchronously.

fetch([item])

Get the container used as cache with the requested data updated.

Special Methods:

__init__(src[, urlpath, mode])

Create a new Proxy to serve as a cache to save accessed chunks locally.

__getitem__(item)

Get a slice as a numpy.ndarray using the Proxy.

Constructor

__init__(src: ProxySource, urlpath: str | None = None, mode='a', **kwargs: dict)

Create a new Proxy to serve as a cache to save accessed chunks locally.

Parameters:
  • src (ProxySource or ProxyNDSource) – The original container.

  • urlpath (str, optional) – The path where the container that works as a cache will be saved.

  • mode (str, optional) – “a” means read/write (create if it doesn’t exist); “w” means create (overwrite if it exists). Default is “a”.

  • kwargs (dict, optional) –

    Keyword arguments supported:

    vlmeta: dict or None
        A dictionary with different variable-length metalayers, one entry per metalayer:

        key: bytes or str
            The name of the metalayer.

        value: object
            The metalayer object, which will be serialized using msgpack.

Utility Methods

__getitem__(item: slice | list[slice]) → numpy.ndarray

Get a slice as a numpy.ndarray using the Proxy.

Parameters:

item (slice or list of slices) – The slice of the desired data.

Returns:

out – An array with the data slice.

Return type:

numpy.ndarray

Examples

>>> import numpy as np
>>> import blosc2
>>> data = np.arange(25).reshape(5, 5)
>>> ndarray = blosc2.asarray(data)
>>> proxy = blosc2.Proxy(ndarray)
>>> proxy[0:3, 0:3]
[[ 0  1  2]
 [ 5  6  7]
 [10 11 12]]
>>> proxy[2:5, 2:5]
[[12 13 14]
 [17 18 19]
 [22 23 24]]
async afetch(item: slice | list[slice] | None = None) → NDArray | SChunk

Retrieve the cache container with the requested data updated asynchronously.

Parameters:

item (slice or list of slices, optional) – If provided, only the chunks intersecting with the specified slices will be retrieved, unless they are already cached.

Returns:

out – The local container used to cache the already requested data.

Return type:

NDArray or SChunk

Notes

This method is only available if the ProxySource or ProxyNDSource has an async aget_chunk method.

Examples

>>> import numpy as np
>>> import blosc2
>>> import asyncio
>>> from blosc2 import ProxyNDSource
>>> class MyProxySource(ProxyNDSource):
>>>     def __init__(self, data):
>>>         self.data = data
>>>         # A multidimensional source must expose the attributes printed below
>>>         print(f"Data shape: {self.shape}, Chunks: {self.chunks}")
>>>         print(f"Blocks: {self.blocks}, Dtype: {self.dtype}")
>>>     @property
>>>     def shape(self):
>>>         return self.data.shape
>>>     @property
>>>     def chunks(self):
>>>         return self.data.chunks
>>>     @property
>>>     def blocks(self):
>>>         return self.data.blocks
>>>     @property
>>>     def dtype(self):
>>>         return self.data.dtype
>>>     # This method must be present
>>>     def get_chunk(self, nchunk):
>>>         return self.data.get_chunk(nchunk)
>>>     # This method is optional
>>>     async def aget_chunk(self, nchunk):
>>>         await asyncio.sleep(0.1) # Simulate an asynchronous operation
>>>         return self.data.get_chunk(nchunk)
>>> data = np.arange(20).reshape(4, 5)
>>> chunks = [2, 5]
>>> blocks = [1, 5]
>>> data = blosc2.asarray(data, chunks=chunks, blocks=blocks)
>>> source = MyProxySource(data)
>>> proxy = blosc2.Proxy(source)
>>> async def fetch_data():
>>>     # Fetch a slice of the data from the proxy asynchronously
>>>     slice_data = await proxy.afetch(slice(0, 2))
>>>     # Only the fetched chunks hold data; the rest of the cache is uninitialized
>>>     print(slice_data[:])
>>> asyncio.run(fetch_data())
>>> # Using getitem to get a slice of the data
>>> result = proxy[1:2, 1:3]
>>> print(f"Proxy getitem: {result}")
Data shape: (4, 5), Chunks: (2, 5)
Blocks: (1, 5), Dtype: int64
[[0 1 2 3 4]
 [5 6 7 8 9]
 [0 0 0 0 0]
 [0 0 0 0 0]]
Proxy getitem: [[6 7]]
fetch(item: slice | list[slice] | None = None) → NDArray | SChunk

Get the container used as cache with the requested data updated.

Parameters:

item (slice or list of slices, optional) – If not None, only the chunks that intersect with the slices in item will be retrieved, unless they are already cached.

Returns:

out – The local container used to cache the already requested data.

Return type:

NDArray or SChunk

Examples

>>> import numpy as np
>>> import blosc2
>>> data = np.arange(20).reshape(10, 2)
>>> ndarray = blosc2.asarray(data)
>>> proxy = blosc2.Proxy(ndarray)
>>> slice_data = proxy.fetch((slice(0, 3), slice(0, 2)))
>>> slice_data[:3, :2]
[[0 1]
 [2 3]
 [4 5]]
property blocks: tuple[int]

The blocks of self or None if the data is not a Blosc2 NDArray

property chunks: tuple[int]

The chunks of self or None if the data is not a Blosc2 NDArray

property cparams: CParams

The compression parameters of the cache

property dtype: dtype

The dtype of self or None if the data is unidimensional

property fields: dict

Dictionary with the fields of self.

Returns:

fields – A dictionary with the fields of the Proxy.

Return type:

dict

See also

NDField

Examples

>>> import numpy as np
>>> import blosc2
>>> data = np.ones(16, dtype=[('field1', 'i4'), ('field2', 'f4')]).reshape(4, 4)
>>> ndarray = blosc2.asarray(data)
>>> proxy = blosc2.Proxy(ndarray)
>>> # Get a dictionary of fields from the proxy, where each field can be accessed individually
>>> fields_dict = proxy.fields
>>> for field_name, field_proxy in fields_dict.items():
>>>     print(f"Field name: {field_name}, Field data: {field_proxy}")
Field name: field1, Field data: <blosc2.proxy.ProxyNDField object at 0x114472d20>
Field name: field2, Field data: <blosc2.proxy.ProxyNDField object at 0x10e215be0>
>>> fields_dict['field2'][:]
[[1. 1. 1. 1.]
 [1. 1. 1. 1.]
 [1. 1. 1. 1.]
 [1. 1. 1. 1.]]
property info: str

The info of the cache

property schunk: SChunk

The SChunk of the cache

property shape: tuple[int]

The shape of self

property vlmeta: vlmeta

Get the vlmeta of the cache.

See also

blosc2.schunk.SChunk.vlmeta