LazyArray

LazyArray is an interface for lazily computing an expression or a Python user-defined function.

You can get an object that follows the LazyArray API in any of the following ways:

  • Any expression that involves one or more NDArray objects. e.g. a + b, where a and b are NDArray objects (see this tutorial).

  • Using the lazyexpr constructor.

  • Using the lazyudf constructor (see a tutorial).

The LazyArray object is a thin wrapper around the expression or user-defined function that allows for lazy computation. This means that the expression is not computed until the compute or __getitem__ method is called. The compute method returns a new NDArray object with the result of evaluating the expression. The __getitem__ method returns a NumPy array instead.

See the LazyExpr and LazyUDF sections for more information.

class blosc2.LazyArray
Attributes:
dtype

Get the data type of the LazyArray.

info

Get information about the LazyArray.

ndim

Get the number of dimensions of the LazyArray.

shape

Get the shape of the LazyArray.

Methods

compute([item])

Return an NDArray containing the evaluation of the LazyArray.

indices([order])

Return a LazyArray containing the indices where self is True.

save(**kwargs)

Save the LazyArray on disk.

sort([order])

Return a sorted LazyArray.

to_cframe()

Compute LazyArray and convert to cframe.

Special Methods:

__getitem__(item)

Return a numpy.ndarray containing the evaluation of the LazyArray.

Methods

abstract __getitem__(item: int | slice | Sequence[slice]) -> blosc2.NDArray

Return a numpy.ndarray containing the evaluation of the LazyArray.

Parameters:

item (int, slice or sequence of slices) – The index used to slice the operands before the computation, rather than to slice the evaluated result. This difference between slicing the operands and slicing the final result matters when the expression uses reductions or a where clause.

Returns:

out – A NumPy array containing the evaluated slice.

Return type:

np.ndarray

Examples

>>> import blosc2
>>> import numpy as np
>>> dtype = np.float64
>>> shape = [30, 4]
>>> size = shape[0] * shape[1]
>>> a = np.linspace(0, 10, num=size, dtype=dtype).reshape(shape)
>>> b = np.linspace(0, 10, num=size, dtype=dtype).reshape(shape)
>>> #  Convert numpy arrays to Blosc2 arrays
>>> a1 = blosc2.asarray(a)
>>> b1 = blosc2.asarray(b)
>>> # Perform the mathematical operation
>>> expr = a1 + b1  # LazyExpr expression
>>> expr[3]
[2.01680672 2.18487395 2.35294118 2.5210084 ]
>>> expr[2:4]
[[1.34453782 1.51260504 1.68067227 1.8487395 ]
[2.01680672 2.18487395 2.35294118 2.5210084 ]]
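
The difference between slicing the operands and slicing the result, noted for the item parameter, can be sketched with plain NumPy. This is only an analogy under the assumption that a reduction is involved; it does not call blosc2 and does not show how LazyArray evaluates internally:

```python
import numpy as np

a = np.arange(12, dtype=np.float64).reshape(4, 3)
item = slice(0, 2)

# Slice the operand first, then reduce: only rows 0 and 1 are summed
operands_sliced = a[item].sum(axis=0)

# Reduce first, then slice the result: all rows are summed, and the
# slice then picks the first two column totals
result_sliced = a.sum(axis=0)[item]
```

The two results differ in both shape and values, which is why the order of slicing and evaluation matters once a reduction is present.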

abstract compute(item: slice | list[slice] | None = None, **kwargs: Any) -> NDArray

Return an NDArray containing the evaluation of the LazyArray.

Parameters:
  • item (slice, list of slices, optional) – If provided, item is used to slice the operands prior to computation; not to retrieve specified slices of the evaluated result. This difference between slicing operands and slicing the final expression is important when reductions or a where clause are used in the expression.

  • kwargs (Any, optional) – Keyword arguments that are supported by the empty() constructor. These arguments will be set in the resulting NDArray.

Returns:

out – An NDArray containing the result of evaluating the LazyUDF or LazyExpr.

Return type:

NDArray

Notes

  • If self is a LazyArray created from a UDF, the kwargs used to store the resulting array are the ones passed to the lazyudf() constructor (except urlpath), updated with the kwargs passed when calling this method.

Examples

>>> import blosc2
>>> import numpy as np
>>> dtype = np.float64
>>> shape = [3, 3]
>>> size = shape[0] * shape[1]
>>> a = np.linspace(0, 5, num=size, dtype=dtype).reshape(shape)
>>> b = np.linspace(0, 5, num=size, dtype=dtype).reshape(shape)
>>> #  Convert numpy arrays to Blosc2 arrays
>>> a1 = blosc2.asarray(a)
>>> b1 = blosc2.asarray(b)
>>> # Perform the mathematical operation
>>> expr = a1 + b1
>>> output = expr.compute()
>>> print(f"Result of a + b (lazy evaluation):\n{output[:]}")
Result of a + b (lazy evaluation):
[[ 0.    1.25  2.5 ]
 [ 3.75  5.    6.25]
 [ 7.5   8.75 10.  ]]

abstract indices(order: str | list[str] | None = None) -> LazyArray

Return a LazyArray containing the indices where self is True.

The LazyArray must be of bool dtype (e.g. a condition).

Parameters:

order (str, list of str, optional) – Specifies which fields to compare first, second, etc. A single field can be specified as a string. Not all fields need to be specified, only the ones by which the array is to be sorted.

Returns:

out – The indices where self is True.

Return type:

LazyArray

abstract save(**kwargs: Any) -> None

Save the LazyArray on disk.

Parameters:

kwargs (Any, optional) – Keyword arguments that are supported by the empty() constructor. The urlpath must always be provided.

Returns:

out

Return type:

None

Notes

  • All the operands of the LazyArray must be Python scalars, NDArray, C2Array or Proxy.

  • If an operand is a Proxy, keep in mind that Python-Blosc2 will only be able to reopen it as such if its source is a SChunk, NDArray or a C2Array (see blosc2.open() notes section for more info).

  • This is currently only supported for LazyExpr.

Examples

>>> import blosc2
>>> import numpy as np
>>> dtype = np.float64
>>> shape = [3, 3]
>>> size = shape[0] * shape[1]
>>> a = np.linspace(0, 5, num=size, dtype=dtype).reshape(shape)
>>> b = np.linspace(0, 5, num=size, dtype=dtype).reshape(shape)
>>> # Define file paths for storing the arrays
>>> a1 = blosc2.asarray(a, urlpath='a_array.b2nd', mode='w')
>>> b1 = blosc2.asarray(b, urlpath='b_array.b2nd', mode='w')
>>> # Perform the mathematical operation to create a LazyExpr expression
>>> expr = a1 + b1
>>> # Save the LazyExpr to disk
>>> expr.save(urlpath='lazy_array.b2nd', mode='w')
>>> # Open and load the LazyExpr from disk
>>> disk_expr = blosc2.open('lazy_array.b2nd')
>>> disk_expr[:2]
[[0.   1.25 2.5 ]
[3.75 5.   6.25]]

abstract sort(order: str | list[str] | None = None) -> LazyArray

Return a sorted LazyArray.

This is only valid for LazyArrays with structured dtypes.

Parameters:

order (str, list of str, optional) – Specifies which fields to compare first, second, etc. A single field can be specified as a string. Not all fields need to be specified, only the ones by which the array is to be sorted.

Returns:

out – A sorted LazyArray.

Return type:

LazyArray
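
The meaning of order can be sketched with plain NumPy, under the assumption that it follows the same field-ordering semantics as numpy.sort on structured arrays (the wording of the parameter suggests this, but the sketch does not call blosc2). The field names A and B are hypothetical:

```python
import numpy as np

# A small structured array with two illustrative fields
records = np.array(
    [(3, 1.5), (1, 2.5), (2, 0.5)],
    dtype=[("A", np.int64), ("B", np.float64)],
)

# Sort by field A only
by_a = np.sort(records, order="A")

# Sort by field B only; the A values follow their records
by_b = np.sort(records, order="B")
```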

to_cframe() -> bytes

Compute LazyArray and convert to cframe.

Returns:

out – The buffer containing the serialized NDArray instance.

Return type:

bytes

abstract property dtype: dtype

Get the data type of the LazyArray.

Returns:

out – The data type of the LazyArray.

Return type:

np.dtype

abstract property info: InfoReporter

Get information about the LazyArray.

Returns:

out – A printable class with information about the LazyArray.

Return type:

InfoReporter

abstract property ndim: int

Get the number of dimensions of the LazyArray.

Returns:

out – The number of dimensions of the LazyArray.

Return type:

int

abstract property shape: tuple[int]

Get the shape of the LazyArray.

Returns:

out – The shape of the LazyArray.

Return type:

tuple

LazyExpr

An expression like a + sum(b), where there is at least one NDArray object in operands a and b, returns a LazyExpr object. You can also get a LazyExpr object using the lazyexpr constructor (see below).

This object follows the LazyArray API for computation and storage.

blosc2.lazyexpr(expression: str | bytes | LazyExpr | NDArray, operands: dict | None = None, out: NDArray | ndarray = None, where: tuple | list | None = None, local_dict: dict | None = None, global_dict: dict | None = None, ne_args: dict | None = None, _frame_depth: int = 2) -> LazyExpr

Get a LazyExpr from an expression.

Parameters:
  • expression (str or bytes or LazyExpr) – The expression to evaluate. This can be any valid expression that can be ingested by numexpr. If a LazyExpr is passed, the expression will be updated with the new operands.

  • operands (dict) – The dictionary with operands. Supported values are numpy.ndarray, Python scalars, NDArray, NDField or C2Array instances. If None, the operands will be looked up in the local and global dictionaries.

  • out (NDArray or np.ndarray, optional) – The output array where the result will be stored. If not provided, a new NumPy array will be created and returned.

  • where (tuple, list, optional) – A sequence of arguments for the where clause in the expression.

  • local_dict (dict, optional) – The local dictionary to use when looking for operands in the expression. If not provided, the local dictionary of the caller will be used.

  • global_dict (dict, optional) – The global dictionary to use when looking for operands in the expression. If not provided, the global dictionary of the caller will be used.

  • ne_args (dict, optional) – Additional arguments to be passed to numexpr.evaluate() function.

  • _frame_depth (int, optional) – The depth of the frame to use when looking for operands in the expression. The default value is 2.

Returns:

out – A LazyExpr is returned.

Return type:

LazyExpr

Examples

>>> import blosc2
>>> import numpy as np
>>> dtype = np.float64
>>> shape = [3, 3]
>>> size = shape[0] * shape[1]
>>> a = np.linspace(0, 5, num=size, dtype=dtype).reshape(shape)
>>> b = np.linspace(0, 5, num=size, dtype=dtype).reshape(shape)
>>> a1 = blosc2.asarray(a)
>>> a1[:]
[[0.    0.625 1.25 ]
[1.875 2.5   3.125]
[3.75  4.375 5.   ]]
>>> b1 = blosc2.asarray(b)
>>> expr = 'a * b + 2'
>>> operands = { 'a': a1, 'b': b1 }
>>> lazy_expr = blosc2.lazyexpr(expr, operands=operands)
>>> f"Lazy expression created: {lazy_expr}"
Lazy expression created: a * b + 2
>>> lazy_expr[:]
[[ 2.        2.390625  3.5625  ]
[ 5.515625  8.25     11.765625]
[16.0625   21.140625 27.      ]]

LazyUDF

To get a LazyUDF object (which is LazyArray-compliant) from a user-defined Python function, use the lazyudf constructor below. See a tutorial on how this works.

This object follows the LazyArray API for computation, although storage is not supported yet.

blosc2.lazyudf(func: Callable[[tuple, np.ndarray, tuple[int]], None], inputs: tuple | list | None, dtype: np.dtype, shape: tuple | list | None = None, chunked_eval: bool = True, **kwargs: Any) -> LazyUDF

Get a LazyUDF from a Python user-defined function.

Parameters:
  • func (Python function) – The user-defined function to apply to each block. This function will always receive the following parameters:

      • inputs_tuple: A tuple containing the corresponding slice for the block of each input in inputs.

      • output: The buffer to be filled as a multidimensional numpy.ndarray.

      • offset: The multidimensional offset corresponding to the start of the block being computed.

  • inputs (tuple or list or None) – The sequence of inputs. Supported inputs are numpy.ndarray, NDArray, NDField and C2Array. Any other object is also accepted and will be passed as is to the user-defined function. If no inputs are needed, this can be empty, but then shape must be provided.

  • dtype (np.dtype) – The resulting ndarray dtype in NumPy format.

  • shape (tuple, optional) – The shape of the resulting array. If None, the shape will be guessed from inputs.

  • chunked_eval (bool, optional) – Whether to evaluate the function chunk by chunk (the default) or, if False, block by block.

  • kwargs (Any, optional) – Keyword arguments that are supported by the empty() constructor. These arguments will be used by the LazyArray.__getitem__() and LazyArray.compute() methods. The latter ignores the urlpath parameter passed to this function.

Returns:

out – A LazyUDF is returned.

Return type:

LazyUDF

Examples

>>> import blosc2
>>> import numpy as np
>>> dtype = np.float64
>>> shape = [3, 3]
>>> size = shape[0] * shape[1]
>>> a = np.linspace(0, 10, num=size, dtype=dtype).reshape(shape)
>>> b = np.linspace(10, 20, num=size, dtype=dtype).reshape(shape)
>>> a1 = blosc2.asarray(a)
>>> b1 = blosc2.asarray(b)
>>> # Define a user-defined function that will be applied to each block of data
>>> def my_function(inputs_tuple, output, offset):
...     a, b = inputs_tuple
...     output[:] = a + b
>>> # Create a LazyUDF object using the user-defined function
>>> lazy_udf = blosc2.lazyudf(my_function, [a1, b1], dtype)
>>> type(lazy_udf)
<class 'blosc2.lazyexpr.LazyUDF'>
>>> print(f"Result of LazyUDF evaluation:\n{lazy_udf[:]}")
Result of LazyUDF evaluation:
[[10.  12.5 15. ]
 [17.5 20.  22.5]
 [25.  27.5 30. ]]