LazyArray

LazyArray is an interface for computing an expression or a Python user-defined function.

You can get an object following the LazyArray API in any of the following ways:

  • Any expression that involves one or more NDArray objects. e.g. a + b, where a and b are NDArray objects (see this tutorial).

  • Using the lazyexpr constructor.

  • Using the lazyudf constructor (see a tutorial).

The LazyArray object is a thin wrapper around the expression or user-defined function that allows for lazy computation. This means that the expression is not computed until the compute or __getitem__ method is called. The compute method returns a new NDArray object with the result of the expression evaluation, whereas the __getitem__ method returns a NumPy array instead.

See the LazyExpr and LazyUDF sections for more information.

class blosc2.LazyArray[source]
Attributes:
device

Hardware device the array data resides on.

dtype

Get the data type of the Operand.

info

Get information about the Operand.

ndim

Get the number of dimensions of the Operand.

shape

Get the shape of the Operand.

Methods

all([axis, keepdims])

Test whether all array elements along a given axis evaluate to True.

any([axis, keepdims])

Test whether any array element along a given axis evaluates to True.

compute([item])

Return an NDArray containing the evaluation of the LazyArray.

indices([order])

Return a LazyArray containing the indices where self is True.

item()

Copy an element of an array to a standard Python scalar and return it.

max([axis, keepdims])

Return the maximum along a given axis.

mean([axis, dtype, keepdims])

Return the arithmetic mean along the specified axis.

min([axis, keepdims])

Return the minimum along a given axis.

prod([axis, dtype, keepdims])

Return the product of array elements over a given axis.

save(**kwargs)

Save the LazyArray on disk.

sort([order])

Return a sorted LazyArray.

std([axis, dtype, ddof, keepdims])

Return the standard deviation along the specified axis.

sum([axis, dtype, keepdims])

Return the sum of array elements over a given axis.

to_cframe()

Compute LazyArray and convert to cframe.

to_device(device)

Copy the array from the device on which it currently resides to the specified device.

var([axis, dtype, ddof, keepdims])

Return the variance along the specified axis.

where([value1, value2])

Select value1 or value2 values based on True/False for self.

Special Methods:

__getitem__(item)

Return a numpy.ndarray containing the evaluation of the LazyArray.

Methods

abstract __getitem__(item: int | slice | Sequence[slice]) → np.ndarray[source]

Return a numpy.ndarray containing the evaluation of the LazyArray.

Parameters:

item (int, slice or sequence of slices) – If provided, item is used to slice the operands prior to computation, not to retrieve the specified slices of the evaluated result. This difference between slicing operands and slicing the final expression is important when reductions or a where clause are used in the expression.

Returns:

out – An array with the data containing the evaluated slice.

Return type:

np.ndarray

Examples

>>> import blosc2
>>> import numpy as np
>>> dtype = np.float64
>>> shape = [30, 4]
>>> size = shape[0] * shape[1]
>>> a = np.linspace(0, 10, num=size, dtype=dtype).reshape(shape)
>>> b = np.linspace(0, 10, num=size, dtype=dtype).reshape(shape)
>>> #  Convert numpy arrays to Blosc2 arrays
>>> a1 = blosc2.asarray(a)
>>> b1 = blosc2.asarray(b)
>>> # Perform the mathematical operation
>>> expr = a1 + b1  # LazyExpr expression
>>> expr[3]
[2.01680672 2.18487395 2.35294118 2.5210084 ]
>>> expr[2:4]
[[1.34453782 1.51260504 1.68067227 1.8487395 ]
[2.01680672 2.18487395 2.35294118 2.5210084 ]]
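Why slicing the operands differs from slicing the result once a reduction is involved can be seen with plain NumPy. The sketch below uses NumPy arrays as a stand-in for the operands; it illustrates the two orders of operation only and does not call blosc2.

```python
import numpy as np

a = np.arange(12, dtype=np.float64).reshape(3, 4)
b = np.ones((3, 4))
rows = slice(0, 2)

# Slice the operands first, then reduce (the order described for item).
operands_sliced = (a[rows] + b[rows]).sum(axis=0)

# Reduce the full expression first, then slice the result.
result_sliced = (a + b).sum(axis=0)[rows]
```

The first form sums two rows and keeps all four columns; the second sums all three rows and keeps only the first two column totals, so both the shapes and the values differ.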
all(axis=None, keepdims=False, **kwargs)[source]

Test whether all array elements along a given axis evaluate to True.

The parameters are documented in min.

Returns:

all_along_axis – The result of the evaluation along the axis.

Return type:

np.ndarray or NDArray or scalar

References

np.all

Examples

>>> import numpy as np
>>> import blosc2
>>> data = np.array([True, True, False, True, True, True])
>>> ndarray = blosc2.asarray(data)
>>> # Test if all elements are True along the default axis (flattened array)
>>> result_flat = blosc2.all(ndarray)
>>> print("All elements are True (flattened):", result_flat)
All elements are True (flattened): False
any(axis=None, keepdims=False, **kwargs)[source]

Test whether any array element along a given axis evaluates to True.

The parameters are documented in min.

Returns:

any_along_axis – The result of the evaluation along the axis.

Return type:

np.ndarray or NDArray or scalar

References

np.any

Examples

>>> import blosc2
>>> import numpy as np
>>> data = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 0]])
>>> # Convert the NumPy array to a Blosc2 NDArray
>>> ndarray = blosc2.asarray(data)
>>> print("NDArray data:", ndarray[:])
NDArray data: [[1 0 0]
                [0 1 0]
                [0 0 0]]
>>> any_along_axis_0 = blosc2.any(ndarray, axis=0)
>>> print("Any along axis 0:", any_along_axis_0)
Any along axis 0: [ True  True False]
>>> any_flattened = blosc2.any(ndarray)
>>> print("Any in the flattened array:", any_flattened)
Any in the flattened array: True
abstract compute(item: slice | list[slice] | None = None, **kwargs: Any) → NDArray[source]

Return an NDArray containing the evaluation of the LazyArray.

Parameters:
  • item (slice, list of slices, optional) – If provided, item is used to slice the operands prior to computation, not to retrieve the specified slices of the evaluated result. This difference between slicing operands and slicing the final expression is important when reductions or a where clause are used in the expression.

  • kwargs (Any, optional) – Keyword arguments that are supported by the empty() constructor. These arguments will be set in the resulting NDArray.

Returns:

out – An NDArray containing the result of evaluating the LazyUDF or LazyExpr.

Return type:

NDArray

Notes

  • If self is a LazyArray from a UDF, the kwargs used to store the resulting array will be the ones passed to the constructor in lazyudf() (except the urlpath), updated with the kwargs passed when calling this method.

Examples

>>> import blosc2
>>> import numpy as np
>>> dtype = np.float64
>>> shape = [3, 3]
>>> size = shape[0] * shape[1]
>>> a = np.linspace(0, 5, num=size, dtype=dtype).reshape(shape)
>>> b = np.linspace(0, 5, num=size, dtype=dtype).reshape(shape)
>>> #  Convert numpy arrays to Blosc2 arrays
>>> a1 = blosc2.asarray(a)
>>> b1 = blosc2.asarray(b)
>>> # Perform the mathematical operation
>>> expr = a1 + b1
>>> output = expr.compute()
>>> f"Result of a + b (lazy evaluation): {output[:]}"
Result of a + b (lazy evaluation):
            [[ 0.    1.25  2.5 ]
            [ 3.75  5.    6.25]
            [ 7.5   8.75 10.  ]]
abstract indices(order: str | list[str] | None = None) → LazyArray[source]

Return a LazyArray containing the indices where self is True.

The LazyArray must be of bool dtype (e.g. a condition).

Parameters:

order (str, list of str, optional) – Specifies which fields to compare first, second, etc. A single field can be specified as a string. Not all fields need to be specified, only the ones by which the array is to be sorted.

Returns:

out – The indices of the LazyArray self that are True.

Return type:

LazyArray

item() → float | bool | complex | int[source]

Copy an element of an array to a standard Python scalar and return it.

max(axis=None, keepdims=False, **kwargs)[source]

Return the maximum along a given axis.

The parameters are documented in min.

Returns:

max_along_axis – The maximum of the elements along the axis.

Return type:

np.ndarray or NDArray or scalar

References

np.max

Examples

>>> import blosc2
>>> import numpy as np
>>> data = np.array([[11, 2, 36, 24, 5, 69], [73, 81, 49, 6, 73, 0]])
>>> ndarray = blosc2.asarray(data)
>>> print("NDArray data:", ndarray[:])
NDArray data:  [[11  2 36 24  5 69]
                [73 81 49  6 73  0]]
>>> # Compute the maximum along axis 0 and 1
>>> max_along_axis_0 = blosc2.max(ndarray, axis=0)
>>> print("Maximum along axis 0:", max_along_axis_0)
Maximum along axis 0: [73 81 49 24 73 69]
>>> max_along_axis_1 = blosc2.max(ndarray, axis=1)
>>> print("Maximum along axis 1:", max_along_axis_1)
Maximum along axis 1: [69 81]
>>> max_flattened = blosc2.max(ndarray)
>>> print("Maximum of the flattened array:", max_flattened)
Maximum of the flattened array: 81
mean(axis=None, dtype=None, keepdims=False, **kwargs)[source]

Return the arithmetic mean along the specified axis.

The parameters are documented in sum.

Returns:

mean_along_axis – The mean of the elements along the axis.

Return type:

np.ndarray or NDArray or scalar

References

np.mean

Examples

>>> import numpy as np
>>> import blosc2
>>> # Example array
>>> array = np.array([[1, 2, 3], [4, 5, 6]])
>>> nd_array = blosc2.asarray(array)
>>> # Compute the mean of all elements in the array (axis=None)
>>> overall_mean = blosc2.mean(nd_array)
>>> print("Mean of all elements:", overall_mean)
Mean of all elements: 3.5
min(axis=None, keepdims=False, **kwargs)[source]

Return the minimum along a given axis.

Parameters:
  • ndarr (NDArray or NDField or C2Array or LazyExpr) – The input array or expression.

  • axis (int or tuple of ints, optional) – Axis or axes along which to operate. By default, flattened input is used.

  • keepdims (bool, optional) – If set to True, the axes which are reduced are left in the result as dimensions with size one. With this option, the result will broadcast correctly against the input array.

  • kwargs (dict, optional) – Keyword arguments that are supported by the empty() constructor.

Returns:

min_along_axis – The minimum of the elements along the axis.

Return type:

np.ndarray or NDArray or scalar

References

np.min

Examples

>>> import numpy as np
>>> import blosc2
>>> array = np.array([1, 3, 7, 8, 9, 31])
>>> nd_array = blosc2.asarray(array)
>>> min_all = blosc2.min(nd_array)
>>> print("Minimum of all elements in the array:", min_all)
Minimum of all elements in the array: 1
>>> # Compute the minimum along axis 0 with keepdims=True
>>> min_keepdims = blosc2.min(nd_array, axis=0, keepdims=True)
>>> print("Minimum along axis 0 with keepdims=True:", min_keepdims)
Minimum along axis 0 with keepdims=True:  [1]
prod(axis=None, dtype=None, keepdims=False, **kwargs)[source]

Return the product of array elements over a given axis.

The parameters are documented in sum.

Returns:

product_along_axis – The product of the elements along the axis.

Return type:

np.ndarray or NDArray or scalar

References

np.prod

Examples

>>> import numpy as np
>>> import blosc2
>>> # Create an instance of NDArray with some data
>>> array = np.array([[11, 22, 33], [4, 15, 36]])
>>> nd_array = blosc2.asarray(array)
>>> # Compute the product of all elements in the array
>>> prod_all = blosc2.prod(nd_array)
>>> print("Product of all elements in the array:", prod_all)
Product of all elements in the array: 17249760
>>> # Compute the product along axis 1 (rows)
>>> prod_axis1 = blosc2.prod(nd_array, axis=1)
>>> print("Product along axis 1:", prod_axis1)
Product along axis 1: [7986 2160]
abstract save(**kwargs: Any) → None[source]

Save the LazyArray on disk.

Parameters:

kwargs (Any, optional) – Keyword arguments that are supported by the empty() constructor. The urlpath must always be provided.

Returns:

out

Return type:

None

Notes

  • All the operands of the LazyArray must be Python scalars, or blosc2.Array objects.

  • If an operand is a Proxy, keep in mind that Python-Blosc2 will only be able to reopen it as such if its source is a SChunk, NDArray or a C2Array (see blosc2.open() notes section for more info).

  • This is currently only supported for LazyExpr.

Examples

>>> import blosc2
>>> import numpy as np
>>> dtype = np.float64
>>> shape = [3, 3]
>>> size = shape[0] * shape[1]
>>> a = np.linspace(0, 5, num=size, dtype=dtype).reshape(shape)
>>> b = np.linspace(0, 5, num=size, dtype=dtype).reshape(shape)
>>> # Define file paths for storing the arrays
>>> a1 = blosc2.asarray(a, urlpath='a_array.b2nd', mode='w')
>>> b1 = blosc2.asarray(b, urlpath='b_array.b2nd', mode='w')
>>> # Perform the mathematical operation to create a LazyExpr expression
>>> expr = a1 + b1
>>> # Save the LazyExpr to disk
>>> expr.save(urlpath='lazy_array.b2nd', mode='w')
>>> # Open and load the LazyExpr from disk
>>> disk_expr = blosc2.open('lazy_array.b2nd')
>>> disk_expr[:2]
[[0.   1.25 2.5 ]
[3.75 5.   6.25]]
abstract sort(order: str | list[str] | None = None) → LazyArray[source]

Return a sorted LazyArray.

This is only valid for LazyArrays with structured dtypes.

Parameters:

order (str, list of str, optional) – Specifies which fields to compare first, second, etc. A single field can be specified as a string. Not all fields need to be specified, only the ones by which the array is to be sorted.

Returns:

out – A sorted LazyArray.

Return type:

LazyArray
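The description of order reads like the order parameter of NumPy's sort for structured arrays. Assuming the semantics are the same (an assumption, not something stated above), the NumPy analogue below shows how field names drive the ordering; the field names a and b and the records are made up.

```python
import numpy as np

# Structured array with two fields.
records = np.array(
    [(2, 1.5), (1, 3.0), (2, 0.5)],
    dtype=[("a", "i4"), ("b", "f8")],
)

# Sort by a single field given as a string.
by_b = np.sort(records, order="b")

# Sort by "a" first and break ties with "b".
by_a_then_b = np.sort(records, order=["a", "b"])
```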

std(axis=None, dtype=None, ddof=0, keepdims=False, **kwargs)[source]

Return the standard deviation along the specified axis.

Parameters:
  • ndarr (NDArray or NDField or C2Array or LazyExpr) – The input array or expression.

  • axis (int or tuple of ints, optional) – Axis or axes along which the standard deviation is computed. By default, axis=None computes the standard deviation of the flattened array.

  • dtype (np.dtype or list str, optional) – Type to use in computing the standard deviation. For integer inputs, the default is float32; for floating point inputs, it is the same as the input dtype.

  • ddof (int, optional) – Means Delta Degrees of Freedom. The divisor used in calculations is N - ddof, where N represents the number of elements. By default, ddof is zero.

  • keepdims (bool, optional) – If set to True, the reduced axes are left in the result as dimensions with size one. This ensures that the result will broadcast correctly against the input array.

  • kwargs (dict, optional) – Additional keyword arguments that are supported by the empty() constructor.

Returns:

std_along_axis – The standard deviation of the elements along the axis.

Return type:

np.ndarray or NDArray or scalar

References

np.std

Examples

>>> import numpy as np
>>> import blosc2
>>> # Create an instance of NDArray with some data
>>> array = np.array([[1, 2, 3], [4, 5, 6]])
>>> nd_array = blosc2.asarray(array)
>>> # Compute the standard deviation of the entire array
>>> std_all = blosc2.std(nd_array)
>>> print("Standard deviation of the entire array:", std_all)
Standard deviation of the entire array: 1.707825127659933
>>> # Compute the standard deviation along axis 0 (columns)
>>> std_axis0 = blosc2.std(nd_array, axis=0)
>>> print("Standard deviation along axis 0:", std_axis0)
Standard deviation along axis 0: [1.5 1.5 1.5]
sum(axis=None, dtype=None, keepdims=False, **kwargs)[source]

Return the sum of array elements over a given axis.

Parameters:
  • ndarr (NDArray or NDField or C2Array or LazyExpr) – The input array or expression.

  • axis (int or tuple of ints, optional) – Axis or axes along which a sum is performed. By default, axis=None, sums all the elements of the input array. If axis is negative, it counts from the last to the first axis.

  • dtype (np.dtype or list str, optional) – The type of the returned array and of the accumulator in which the elements are summed. The dtype of ndarr is used by default unless it has an integer dtype of less precision than the default platform integer.

  • keepdims (bool, optional) – If set to True, the reduced axes are left in the result as dimensions with size one. With this option, the result will broadcast correctly against the input array.

  • kwargs (dict, optional) – Additional keyword arguments supported by the empty() constructor.

Returns:

sum_along_axis – The sum of the elements along the axis.

Return type:

np.ndarray or NDArray or scalar

References

np.sum

Examples

>>> import numpy as np
>>> import blosc2
>>> # Example array
>>> array = np.array([[1, 2, 3], [4, 5, 6]])
>>> nd_array = blosc2.asarray(array)
>>> # Sum all elements in the array (axis=None)
>>> total_sum = blosc2.sum(nd_array)
>>> print("Sum of all elements:", total_sum)
Sum of all elements: 21
>>> # Sum along axis 0 (columns)
>>> sum_axis_0 = blosc2.sum(nd_array, axis=0)
>>> print("Sum along axis 0 (columns):", sum_axis_0)
Sum along axis 0 (columns): [5 7 9]
to_cframe() → bytes[source]

Compute LazyArray and convert to cframe.

Returns:

out – The buffer containing the serialized NDArray instance.

Return type:

bytes

to_device(device: str)[source]

Copy the array from the device on which it currently resides to the specified device.

Parameters:
  • self (NDArray) – Array instance.

  • device (str) – Device to move the array to. An error is raised unless device == 'cpu'.

Returns:

out – The same array if device == 'cpu'; otherwise an error is raised.

Return type:

NDArray

var(axis=None, dtype=None, ddof=0, keepdims=False, **kwargs)[source]

Return the variance along the specified axis.

The parameters are documented in std.

Returns:

var_along_axis – The variance of the elements along the axis.

Return type:

np.ndarray or NDArray or scalar

References

np.var

Examples

>>> import numpy as np
>>> import blosc2
>>> # Create an instance of NDArray with some data
>>> array = np.array([[1, 2, 3], [4, 5, 6]])
>>> nd_array = blosc2.asarray(array)
>>> # Compute the variance of the entire array
>>> var_all = blosc2.var(nd_array)
>>> print("Variance of the entire array:", var_all)
Variance of the entire array: 2.9166666666666665
>>> # Compute the variance along axis 0 (columns)
>>> var_axis0 = blosc2.var(nd_array, axis=0)
>>> print("Variance along axis 0:", var_axis0)
Variance along axis 0: [2.25 2.25 2.25]
where(value1=None, value2=None)[source]

Select value1 or value2 values based on True/False for self.

Parameters:
  • value1 (array_like, optional) – The value to select when element of self is True.

  • value2 (array_like, optional) – The value to select when element of self is False.

Returns:

out – A new expression with the where condition applied.

Return type:

LazyExpr

property device

Hardware device the array data resides on. Always equal to 'cpu'.

abstract property dtype: dtype

Get the data type of the Operand.

Returns:

out – The data type of the Operand.

Return type:

np.dtype

abstract property info: InfoReporter

Get information about the Operand.

Returns:

out – A printable class with information about the Operand.

Return type:

InfoReporter

abstract property ndim: int

Get the number of dimensions of the Operand.

Returns:

out – The number of dimensions of the Operand.

Return type:

int

abstract property shape: tuple[int]

Get the shape of the Operand.

Returns:

out – The shape of the Operand.

Return type:

tuple

LazyExpr

An expression like a + sum(b), where there is at least one NDArray object in operands a and b, returns a LazyExpr object. You can also get a LazyExpr object using the lazyexpr constructor (see below).

This object follows the LazyArray API for computation and storage.

blosc2.lazyexpr(expression: str | bytes | LazyArray | NDArray, operands: dict | None = None, out: Array = None, where: tuple | list | None = None, local_dict: dict | None = None, global_dict: dict | None = None, ne_args: dict | None = None, _frame_depth: int = 2) → LazyExpr[source]

Get a LazyExpr from an expression.

Parameters:
  • expression (str or bytes or LazyExpr or NDArray) – The expression to evaluate. This can be any valid expression that numexpr can ingest. If a LazyExpr is passed, the expression will be updated with the new operands.

  • operands (dict[blosc2.Array], optional) – The dictionary with operands. Supported values are Python scalars, or any instance that is blosc2.Array compliant. If None, the operands will be looked up in the local and global dictionaries.

  • out (blosc2.Array, optional) – The output array where the result will be stored. If not provided, a new NumPy array will be created and returned.

  • where (tuple, list, optional) – A sequence of arguments for the where clause in the expression.

  • local_dict (dict, optional) – The local dictionary to use when looking for operands in the expression. If not provided, the local dictionary of the caller will be used.

  • global_dict (dict, optional) – The global dictionary to use when looking for operands in the expression. If not provided, the global dictionary of the caller will be used.

  • ne_args (dict, optional) – Additional arguments to be passed to numexpr.evaluate() function.

  • _frame_depth (int, optional) – The depth of the frame to use when looking for operands in the expression. The default value is 2.

Returns:

out – A LazyExpr is returned.

Return type:

LazyExpr

Examples

>>> import blosc2
>>> import numpy as np
>>> dtype = np.float64
>>> shape = [3, 3]
>>> size = shape[0] * shape[1]
>>> a = np.linspace(0, 5, num=size, dtype=dtype).reshape(shape)
>>> b = np.linspace(0, 5, num=size, dtype=dtype).reshape(shape)
>>> a1 = blosc2.asarray(a)
>>> a1[:]
[[0.    0.625 1.25 ]
[1.875 2.5   3.125]
[3.75  4.375 5.   ]]
>>> b1 = blosc2.asarray(b)
>>> expr = 'a * b + 2'
>>> operands = { 'a': a1, 'b': b1 }
>>> lazy_expr = blosc2.lazyexpr(expr, operands=operands)
>>> f"Lazy expression created: {lazy_expr}"
Lazy expression created: a * b + 2
>>> lazy_expr[:]
[[ 2.        2.390625  3.5625  ]
[ 5.515625  8.25     11.765625]
[16.0625   21.140625 27.      ]]

LazyUDF

For getting a LazyUDF object (which is LazyArray-compliant) from a user-defined Python function, you can use the lazyudf constructor below. See a tutorial on how this works.

This object follows the LazyArray API for computation, although storage is not supported yet.

blosc2.lazyudf(func: Callable[[tuple, np.ndarray, tuple[int]], None], inputs: Sequence[Any] | None, dtype: np.dtype, shape: tuple | list | None = None, chunked_eval: bool = True, **kwargs: Any) → LazyUDF[source]

Get a LazyUDF from a Python user-defined function.

Parameters:
  • func (Python function) – The user-defined function to apply to each block. This function will always receive the following parameters:

      - inputs_tuple: A tuple containing the corresponding slice for the block of each input in inputs.

      - output: The buffer to be filled as a multidimensional numpy.ndarray.

      - offset: The multidimensional offset corresponding to the start of the block being computed.

  • inputs (Sequence[Any] or None) – The sequence of inputs. Besides objects compliant with the blosc2.Array protocol, any other object is supported too, and it will be passed as-is to the user-defined function. If not needed, this can be empty, but shape must be provided.

  • dtype (np.dtype) – The resulting ndarray dtype in NumPy format.

  • shape (tuple, optional) – The shape of the resulting array. If None, the shape will be guessed from inputs.

  • chunked_eval (bool, optional) – Whether to evaluate the function in chunks or not (blocks).

  • kwargs (Any, optional) – Keyword arguments that are supported by the empty() constructor. These arguments will be used by the LazyArray.__getitem__() and LazyArray.compute() methods. The latter ignores the urlpath parameter passed to this function.

Returns:

out – A LazyUDF is returned.

Return type:

LazyUDF

Examples

>>> import blosc2
>>> import numpy as np
>>> dtype = np.float64
>>> shape = [3, 3]
>>> size = shape[0] * shape[1]
>>> a = np.linspace(0, 10, num=size, dtype=dtype).reshape(shape)
>>> b = np.linspace(10, 20, num=size, dtype=dtype).reshape(shape)
>>> a1 = blosc2.asarray(a)
>>> b1 = blosc2.asarray(b)
>>> # Define a user-defined function that will be applied to each block of data
>>> def my_function(inputs_tuple, output, offset):
...     a, b = inputs_tuple
...     output[:] = a + b
>>> # Create a LazyUDF object using the user-defined function
>>> lazy_udf = blosc2.lazyudf(my_function, [a1, b1], dtype)
>>> type(lazy_udf)
<class 'blosc2.lazyexpr.LazyUDF'>
>>> f"Result of LazyUDF evaluation: {lazy_udf[:]}"
Result of LazyUDF evaluation:
        [[10.  12.5 15. ]
        [17.5 20.  22.5]
        [25.  27.5 30. ]]