LazyArray¶
This is an interface for computing an expression or a Python user-defined function.
You can get an object following the LazyArray API in any of the following ways:
- Any expression that involves one or more NDArray objects, e.g. a + b, where a and b are NDArray objects (see this tutorial).
- Using the lazyexpr constructor.
- Using the lazyudf constructor (see a tutorial).
The LazyArray object is a thin wrapper around the expression or user-defined function that allows for lazy computation. This means that the expression is not computed until the compute or __getitem__ methods are called. The compute method returns a new NDArray object with the result of the expression evaluation. The __getitem__ method returns a NumPy array instead.
See the LazyExpr and LazyUDF sections for more information.
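To make the idea of deferred evaluation concrete, here is a toy sketch written with plain Python lists. The class ToyLazy and its evaluations counter are hypothetical names invented for this illustration; this is not how Python-Blosc2 implements LazyArray, only a picture of "store the recipe now, do the work on compute or __getitem__".

```python
from typing import Callable, Sequence


class ToyLazy:
    """Hypothetical stand-in for a lazy array: it stores a recipe, not a result."""

    def __init__(self, func: Callable[[float, float], float],
                 left: Sequence[float], right: Sequence[float]) -> None:
        self.func = func
        self.left = left
        self.right = right
        self.evaluations = 0  # counts how many times work was actually done

    def compute(self) -> list:
        """Evaluate the whole expression and return a new container."""
        self.evaluations += 1
        return [self.func(x, y) for x, y in zip(self.left, self.right)]

    def __getitem__(self, item: slice) -> list:
        """Slice the operands first, then evaluate only that part."""
        self.evaluations += 1
        return [self.func(x, y)
                for x, y in zip(self.left[item], self.right[item])]


a = [0.0, 1.0, 2.0, 3.0]
b = [10.0, 20.0, 30.0, 40.0]

expr = ToyLazy(lambda x, y: x + y, a, b)  # nothing is computed here
assert expr.evaluations == 0

full = expr.compute()  # evaluates every element
part = expr[1:3]       # evaluates only elements 1 and 2
```

In the real API the two entry points also differ in what they return: compute gives back an NDArray, while __getitem__ gives back a NumPy array.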
- class blosc2.LazyArray[source]¶
- Attributes:
Methods
all([axis, keepdims]): Test whether all array elements along a given axis evaluate to True.
any([axis, keepdims]): Test whether any array element along a given axis evaluates to True.
compute([item]): Return an NDArray containing the evaluation of the LazyArray.
indices([order]): Return a LazyArray containing the indices where self is True.
item(): Copy an element of an array to a standard Python scalar and return it.
max([axis, keepdims]): Return the maximum along a given axis.
mean([axis, dtype, keepdims]): Return the arithmetic mean along the specified axis.
min([axis, keepdims]): Return the minimum along a given axis.
prod([axis, dtype, keepdims]): Return the product of array elements over a given axis.
save(**kwargs): Save the LazyArray on disk.
sort([order]): Return a sorted LazyArray.
std([axis, dtype, ddof, keepdims]): Return the standard deviation along the specified axis.
sum([axis, dtype, keepdims]): Return the sum of array elements over a given axis.
to_cframe(): Compute LazyArray and convert to cframe.
to_device(device): Copy the array from the device on which it currently resides to the specified device.
var([axis, dtype, ddof, keepdims]): Return the variance along the specified axis.
where([value1, value2]): Select value1 or value2 values based on True/False for self.
- Special Methods:
__getitem__(item): Return a numpy.ndarray containing the evaluation of the LazyArray.
Methods¶
- abstract __getitem__(item: int | slice | Sequence[slice]) np.ndarray [source]¶
Return a numpy.ndarray containing the evaluation of the LazyArray.
- Parameters:
item¶ (int, slice or sequence of slices) – If provided, item is used to slice the operands prior to computation; not to retrieve specified slices of the evaluated result. This difference between slicing operands and slicing the final expression is important when reductions or a where clause are used in the expression.
- Returns:
out – An array with the data containing the evaluated slice.
- Return type:
np.ndarray
Examples
>>> import blosc2
>>> import numpy as np
>>> dtype = np.float64
>>> shape = [30, 4]
>>> size = shape[0] * shape[1]
>>> a = np.linspace(0, 10, num=size, dtype=dtype).reshape(shape)
>>> b = np.linspace(0, 10, num=size, dtype=dtype).reshape(shape)
>>> # Convert numpy arrays to Blosc2 arrays
>>> a1 = blosc2.asarray(a)
>>> b1 = blosc2.asarray(b)
>>> # Perform the mathematical operation
>>> expr = a1 + b1  # LazyExpr expression
>>> expr[3]
[2.01680672 2.18487395 2.35294118 2.5210084 ]
>>> expr[2:4]
[[1.34453782 1.51260504 1.68067227 1.8487395 ]
 [2.01680672 2.18487395 2.35294118 2.5210084 ]]
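The item parameter above is described as slicing the operands before computation rather than slicing the evaluated result. The sketch below uses plain NumPy (not Blosc2) to show why the two orders can disagree once a reduction is involved; the array a and the slice are made up for the illustration.

```python
import numpy as np

a = np.arange(12, dtype=np.float64).reshape(3, 4)
item = slice(0, 2)

# Slice the operand first, then reduce: only rows 0 and 1 contribute.
sliced_then_reduced = a[item].sum()

# Reduce first, then slice the result: every row contributes to each
# column sum, and the slice merely picks the first two column sums.
reduced_then_sliced = a.sum(axis=0)[item]

print(sliced_then_reduced)   # 28.0
print(reduced_then_sliced)   # [12. 15.]
```

According to the parameter description, passing item to __getitem__ or compute corresponds to the first order (slice, then evaluate).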
- all(axis=None, keepdims=False, **kwargs)[source]¶
Test whether all array elements along a given axis evaluate to True.
The parameters are documented in min.
- Returns:
all_along_axis – The result of the evaluation along the axis.
- Return type:
np.ndarray or NDArray or scalar
Examples
>>> import numpy as np
>>> import blosc2
>>> data = np.array([True, True, False, True, True, True])
>>> ndarray = blosc2.asarray(data)
>>> # Test if all elements are True along the default axis (flattened array)
>>> result_flat = blosc2.all(ndarray)
>>> print("All elements are True (flattened):", result_flat)
All elements are True (flattened): False
- any(axis=None, keepdims=False, **kwargs)[source]¶
Test whether any array element along a given axis evaluates to True.
The parameters are documented in min.
- Returns:
any_along_axis – The result of the evaluation along the axis.
- Return type:
np.ndarray or NDArray or scalar
Examples
>>> import blosc2
>>> import numpy as np
>>> data = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 0]])
>>> # Convert the NumPy array to a Blosc2 NDArray
>>> ndarray = blosc2.asarray(data)
>>> print("NDArray data:", ndarray[:])
NDArray data: [[1 0 0]
 [0 1 0]
 [0 0 0]]
>>> any_along_axis_0 = blosc2.any(ndarray, axis=0)
>>> print("Any along axis 0:", any_along_axis_0)
Any along axis 0: [True True False]
>>> any_flattened = blosc2.any(ndarray)
>>> print("Any in the flattened array:", any_flattened)
Any in the flattened array: True
- abstract compute(item: slice | list[slice] | None = None, **kwargs: Any) NDArray [source]¶
Return an NDArray containing the evaluation of the LazyArray.
- Parameters:
item¶ (slice, list of slices, optional) – If provided, item is used to slice the operands prior to computation; not to retrieve specified slices of the evaluated result. This difference between slicing operands and slicing the final expression is important when reductions or a where clause are used in the expression.
kwargs¶ (Any, optional) – Keyword arguments that are supported by the empty() constructor. These arguments will be set in the resulting NDArray.
- Returns:
out – An NDArray containing the result of evaluating the LazyUDF or LazyExpr.
- Return type:
NDArray
Notes
If self is a LazyArray from a UDF, the kwargs used to store the resulting array will be the ones passed to the constructor in lazyudf() (except the urlpath), updated with the kwargs passed when calling this method.
Examples
>>> import blosc2
>>> import numpy as np
>>> dtype = np.float64
>>> shape = [3, 3]
>>> size = shape[0] * shape[1]
>>> a = np.linspace(0, 5, num=size, dtype=dtype).reshape(shape)
>>> b = np.linspace(0, 5, num=size, dtype=dtype).reshape(shape)
>>> # Convert numpy arrays to Blosc2 arrays
>>> a1 = blosc2.asarray(a)
>>> b1 = blosc2.asarray(b)
>>> # Perform the mathematical operation
>>> expr = a1 + b1
>>> output = expr.compute()
>>> print(f"Result of a + b (lazy evaluation): {output[:]}")
Result of a + b (lazy evaluation): [[ 0. 1.25 2.5 ]
 [ 3.75 5. 6.25]
 [ 7.5 8.75 10. ]]
- abstract indices(order: str | list[str] | None = None) LazyArray [source]¶
Return a LazyArray containing the indices where self is True.
The LazyArray must be of bool dtype (e.g. a condition).
- Parameters:
order¶ (str, list of str, optional) – Specifies which fields to compare first, second, etc. A single field can be specified as a string. Not all fields need to be specified, only the ones by which the array is to be sorted.
- Returns:
out – The indices of the LazyArray self that are True.
- Return type:
LazyArray
- item() float | bool | complex | int [source]¶
Copy an element of an array to a standard Python scalar and return it.
- max(axis=None, keepdims=False, **kwargs)[source]¶
Return the maximum along a given axis.
The parameters are documented in min.
- Returns:
max_along_axis – The maximum of the elements along the axis.
- Return type:
np.ndarray or NDArray or scalar
Examples
>>> import blosc2
>>> import numpy as np
>>> data = np.array([[11, 2, 36, 24, 5, 69], [73, 81, 49, 6, 73, 0]])
>>> ndarray = blosc2.asarray(data)
>>> print("NDArray data:", ndarray[:])
NDArray data: [[11 2 36 24 5 69]
 [73 81 49 6 73 0]]
>>> # Compute the maximum along axis 0 and 1
>>> max_along_axis_0 = blosc2.max(ndarray, axis=0)
>>> print("Maximum along axis 0:", max_along_axis_0)
Maximum along axis 0: [73 81 49 24 73 69]
>>> max_along_axis_1 = blosc2.max(ndarray, axis=1)
>>> print("Maximum along axis 1:", max_along_axis_1)
Maximum along axis 1: [69 81]
>>> max_flattened = blosc2.max(ndarray)
>>> print("Maximum of the flattened array:", max_flattened)
Maximum of the flattened array: 81
- mean(axis=None, dtype=None, keepdims=False, **kwargs)[source]¶
Return the arithmetic mean along the specified axis.
The parameters are documented in sum.
- Returns:
mean_along_axis – The mean of the elements along the axis.
- Return type:
np.ndarray or NDArray or scalar
Examples
>>> import numpy as np
>>> import blosc2
>>> # Example array
>>> array = np.array([[1, 2, 3], [4, 5, 6]])
>>> nd_array = blosc2.asarray(array)
>>> # Compute the mean of all elements in the array (axis=None)
>>> overall_mean = blosc2.mean(nd_array)
>>> print("Mean of all elements:", overall_mean)
Mean of all elements: 3.5
- min(axis=None, keepdims=False, **kwargs)[source]¶
Return the minimum along a given axis.
- Parameters:
ndarr¶ (NDArray or NDField or C2Array or LazyExpr) – The input array or expression.
axis¶ (int or tuple of ints, optional) – Axis or axes along which to operate. By default, flattened input is used.
keepdims¶ (bool, optional) – If set to True, the axes which are reduced are left in the result as dimensions with size one. With this option, the result will broadcast correctly against the input array.
kwargs¶ (dict, optional) – Keyword arguments that are supported by the empty() constructor.
- Returns:
min_along_axis – The minimum of the elements along the axis.
- Return type:
np.ndarray or NDArray or scalar
Examples
>>> import numpy as np
>>> import blosc2
>>> array = np.array([1, 3, 7, 8, 9, 31])
>>> nd_array = blosc2.asarray(array)
>>> min_all = blosc2.min(nd_array)
>>> print("Minimum of all elements in the array:", min_all)
Minimum of all elements in the array: 1
>>> # Compute the minimum along axis 0 with keepdims=True
>>> min_keepdims = blosc2.min(nd_array, axis=0, keepdims=True)
>>> print("Minimum along axis 0 with keepdims=True:", min_keepdims)
Minimum along axis 0 with keepdims=True: [1]
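The keepdims description says the result "will broadcast correctly against the input array". The following sketch shows what that means in practice, using plain NumPy reductions as a stand-in for the Blosc2 ones; the data values are arbitrary.

```python
import numpy as np

data = np.array([[11, 2, 36], [73, 81, 49]])

# Without keepdims the reduced axis disappears: shape (2,)
row_min = data.min(axis=1)

# With keepdims the reduced axis stays with size one: shape (2, 1)
row_min_kept = data.min(axis=1, keepdims=True)

# A (2, 1) array broadcasts against the (2, 3) input, so each row is
# shifted by its own minimum. data - row_min would raise a ValueError,
# because shapes (2, 3) and (2,) are not broadcast-compatible.
shifted = data - row_min_kept
print(shifted.tolist())  # [[9, 0, 34], [24, 32, 0]]
```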
- prod(axis=None, dtype=None, keepdims=False, **kwargs)[source]¶
Return the product of array elements over a given axis.
The parameters are documented in sum.
- Returns:
product_along_axis – The product of the elements along the axis.
- Return type:
np.ndarray or NDArray or scalar
Examples
>>> import numpy as np
>>> import blosc2
>>> # Create an instance of NDArray with some data
>>> array = np.array([[11, 22, 33], [4, 15, 36]])
>>> nd_array = blosc2.asarray(array)
>>> # Compute the product of all elements in the array
>>> prod_all = blosc2.prod(nd_array)
>>> print("Product of all elements in the array:", prod_all)
Product of all elements in the array: 17249760
>>> # Compute the product along axis 1 (rows)
>>> prod_axis1 = blosc2.prod(nd_array, axis=1)
>>> print("Product along axis 1:", prod_axis1)
Product along axis 1: [7986 2160]
- abstract save(**kwargs: Any) None [source]¶
Save the LazyArray on disk.
- Parameters:
kwargs¶ (Any, optional) – Keyword arguments that are supported by the empty() constructor. The urlpath must always be provided.
- Returns:
out
- Return type:
None
Notes
All the operands of the LazyArray must be Python scalars, or blosc2.Array objects.
If an operand is a Proxy, keep in mind that Python-Blosc2 will only be able to reopen it as such if its source is a SChunk, NDArray or a C2Array (see the notes section of blosc2.open() for more info).
This is currently only supported for LazyExpr.
Examples
>>> import blosc2
>>> import numpy as np
>>> dtype = np.float64
>>> shape = [3, 3]
>>> size = shape[0] * shape[1]
>>> a = np.linspace(0, 5, num=size, dtype=dtype).reshape(shape)
>>> b = np.linspace(0, 5, num=size, dtype=dtype).reshape(shape)
>>> # Define file paths for storing the arrays
>>> a1 = blosc2.asarray(a, urlpath='a_array.b2nd', mode='w')
>>> b1 = blosc2.asarray(b, urlpath='b_array.b2nd', mode='w')
>>> # Perform the mathematical operation to create a LazyExpr expression
>>> expr = a1 + b1
>>> # Save the LazyExpr to disk
>>> expr.save(urlpath='lazy_array.b2nd', mode='w')
>>> # Open and load the LazyExpr from disk
>>> disk_expr = blosc2.open('lazy_array.b2nd')
>>> disk_expr[:2]
[[0. 1.25 2.5 ]
 [3.75 5. 6.25]]
- abstract sort(order: str | list[str] | None = None) LazyArray [source]¶
Return a sorted LazyArray.
This is only valid for LazyArrays with structured dtypes.
- std(axis=None, dtype=None, ddof=0, keepdims=False, **kwargs)[source]¶
Return the standard deviation along the specified axis.
- Parameters:
ndarr¶ (NDArray or NDField or C2Array or LazyExpr) – The input array or expression.
axis¶ (int or tuple of ints, optional) – Axis or axes along which the standard deviation is computed. By default, axis=None computes the standard deviation of the flattened array.
dtype¶ (np.dtype or list str, optional) – Type to use in computing the standard deviation. For integer inputs, the default is float32; for floating point inputs, it is the same as the input dtype.
ddof¶ (int, optional) – Means Delta Degrees of Freedom. The divisor used in calculations is N - ddof, where N represents the number of elements. By default, ddof is zero.
keepdims¶ (bool, optional) – If set to True, the reduced axes are left in the result as dimensions with size one. This ensures that the result will broadcast correctly against the input array.
kwargs¶ (dict, optional) – Additional keyword arguments that are supported by the empty() constructor.
- Returns:
std_along_axis – The standard deviation of the elements along the axis.
- Return type:
np.ndarray or NDArray or scalar
Examples
>>> import numpy as np
>>> import blosc2
>>> # Create an instance of NDArray with some data
>>> array = np.array([[1, 2, 3], [4, 5, 6]])
>>> nd_array = blosc2.asarray(array)
>>> # Compute the standard deviation of the entire array
>>> std_all = blosc2.std(nd_array)
>>> print("Standard deviation of the entire array:", std_all)
Standard deviation of the entire array: 1.707825127659933
>>> # Compute the standard deviation along axis 0 (columns)
>>> std_axis0 = blosc2.std(nd_array, axis=0)
>>> print("Standard deviation along axis 0:", std_axis0)
Standard deviation along axis 0: [1.5 1.5 1.5]
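The ddof parameter changes the divisor from N to N - ddof. As a worked check of that formula, the sketch below recomputes the standard deviation of the six values 1..6 by hand with the standard library only; the helper std_with_ddof is a hypothetical name used just for this illustration.

```python
import math

values = [1, 2, 3, 4, 5, 6]
n = len(values)
mean = sum(values) / n                              # 3.5
squared_dev = sum((v - mean) ** 2 for v in values)  # 17.5


def std_with_ddof(ddof: int) -> float:
    """Standard deviation using the divisor N - ddof."""
    return math.sqrt(squared_dev / (n - ddof))


population_std = std_with_ddof(0)  # sqrt(17.5 / 6), about 1.7078
sample_std = std_with_ddof(1)      # sqrt(17.5 / 5), about 1.8708
```

With ddof=0 this reproduces the 1.707825127659933 printed for the flattened array in the example above.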
- sum(axis=None, dtype=None, keepdims=False, **kwargs)[source]¶
Return the sum of array elements over a given axis.
- Parameters:
ndarr¶ (NDArray or NDField or C2Array or LazyExpr) – The input array or expression.
axis¶ (int or tuple of ints, optional) – Axis or axes along which a sum is performed. By default, axis=None, sums all the elements of the input array. If axis is negative, it counts from the last to the first axis.
dtype¶ (np.dtype or list str, optional) – The type of the returned array and of the accumulator in which the elements are summed. The dtype of ndarr is used by default unless it has an integer dtype of less precision than the default platform integer.
keepdims¶ (bool, optional) – If set to True, the reduced axes are left in the result as dimensions with size one. With this option, the result will broadcast correctly against the input array.
kwargs¶ (dict, optional) – Additional keyword arguments supported by the empty() constructor.
- Returns:
sum_along_axis – The sum of the elements along the axis.
- Return type:
np.ndarray or NDArray or scalar
Examples
>>> import numpy as np
>>> import blosc2
>>> # Example array
>>> array = np.array([[1, 2, 3], [4, 5, 6]])
>>> nd_array = blosc2.asarray(array)
>>> # Sum all elements in the array (axis=None)
>>> total_sum = blosc2.sum(nd_array)
>>> print("Sum of all elements:", total_sum)
Sum of all elements: 21
>>> # Sum along axis 0 (columns)
>>> sum_axis_0 = blosc2.sum(nd_array, axis=0)
>>> print("Sum along axis 0 (columns):", sum_axis_0)
Sum along axis 0 (columns): [5 7 9]
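Two details of the parameter list are easy to miss: a negative axis counts from the last axis, and a narrow integer input is accumulated in a wider integer by default. The sketch below demonstrates both with NumPy's own sum, on the assumption that the Blosc2 reduction follows the same conventions as its description suggests.

```python
import numpy as np

array = np.array([[1, 2, 3], [4, 5, 6]])

# A negative axis counts from the last axis, so axis=-1 equals axis=1 here.
sum_last = array.sum(axis=-1)  # row sums: 6 and 15
sum_rows = array.sum(axis=1)

# int8 cannot hold 300, but the default accumulator is a wider integer,
# so the sum does not overflow.
small = np.array([100, 100, 100], dtype=np.int8)
total = small.sum()
print(int(total))  # 300
```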
- to_cframe() bytes [source]¶
Compute LazyArray and convert to cframe.
- Returns:
out – The buffer containing the serialized NDArray instance.
- Return type:
bytes
- to_device(device: str)[source]¶
Copy the array from the device on which it currently resides to the specified device.
- var(axis=None, dtype=None, ddof=0, keepdims=False, **kwargs)[source]¶
Return the variance along the specified axis.
The parameters are documented in std.
- Returns:
var_along_axis – The variance of the elements along the axis.
- Return type:
np.ndarray or NDArray or scalar
Examples
>>> import numpy as np
>>> import blosc2
>>> # Create an instance of NDArray with some data
>>> array = np.array([[1, 2, 3], [4, 5, 6]])
>>> nd_array = blosc2.asarray(array)
>>> # Compute the variance of the entire array
>>> var_all = blosc2.var(nd_array)
>>> print("Variance of the entire array:", var_all)
Variance of the entire array: 2.9166666666666665
>>> # Compute the variance along axis 0 (columns)
>>> var_axis0 = blosc2.var(nd_array, axis=0)
>>> print("Variance along axis 0:", var_axis0)
Variance along axis 0: [2.25 2.25 2.25]
- where(value1=None, value2=None)[source]¶
Select value1 or value2 values based on True/False for self.
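The selection rule can be pictured with NumPy's np.where, which is used here only as an analogy: cond plays the role of self (a boolean condition), and the two remaining arguments play the roles of value1 and value2. How the LazyArray method behaves when only one value is given is not covered by this sketch.

```python
import numpy as np

a = np.array([1, 6, 3, 8])
cond = a > 4  # boolean condition, standing in for self

# Take the element from the first value where cond is True,
# and from the second value where cond is False.
picked = np.where(cond, a, 0)
print(picked)  # [0 6 0 8]
```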
- property device¶
Hardware device the array data resides on. Always equal to ‘cpu’.
- abstract property dtype: dtype¶
Get the data type of the Operand.
- Returns:
out – The data type of the Operand.
- Return type:
np.dtype
- abstract property info: InfoReporter¶
Get information about the Operand.
- Returns:
out – A printable class with information about the Operand.
- Return type:
InfoReporter
- abstract property ndim: int¶
Get the number of dimensions of the Operand.
- Returns:
out – The number of dimensions of the Operand.
- Return type:
int
- abstract property shape: tuple[int]¶
Get the shape of the Operand.
- Returns:
out – The shape of the Operand.
- Return type:
tuple
LazyExpr¶
An expression like a + sum(b), where there is at least one NDArray object in operands a and b, returns a LazyExpr object. You can also get a LazyExpr object using the lazyexpr constructor (see below).
This object follows the LazyArray API for computation and storage.
- blosc2.lazyexpr(expression: str | bytes | LazyArray | NDArray, operands: dict | None = None, out: Array = None, where: tuple | list | None = None, local_dict: dict | None = None, global_dict: dict | None = None, ne_args: dict | None = None, _frame_depth: int = 2) LazyExpr [source]¶
Get a LazyExpr from an expression.
- Parameters:
expression¶ (str or bytes or LazyExpr or NDArray) – The expression to evaluate. This can be any valid expression that numexpr can ingest. If a LazyExpr is passed, the expression will be updated with the new operands.
operands¶ (dict[blosc2.Array], optional) – The dictionary with operands. Supported values are Python scalars, or any instance that is blosc2.Array compliant. If None, the operands will be looked up in the local and global dictionaries.
out¶ (blosc2.Array, optional) – The output array where the result will be stored. If not provided, a new NumPy array will be created and returned.
where¶ (tuple, list, optional) – A sequence of arguments for the where clause in the expression.
local_dict¶ (dict, optional) – The local dictionary to use when looking for operands in the expression. If not provided, the local dictionary of the caller will be used.
global_dict¶ (dict, optional) – The global dictionary to use when looking for operands in the expression. If not provided, the global dictionary of the caller will be used.
ne_args¶ (dict, optional) – Additional arguments to be passed to numexpr.evaluate() function.
_frame_depth¶ (int, optional) – The depth of the frame to use when looking for operands in the expression. The default value is 2.
- Returns:
out – A LazyExpr is returned.
- Return type:
LazyExpr
Examples
>>> import blosc2
>>> import numpy as np
>>> dtype = np.float64
>>> shape = [3, 3]
>>> size = shape[0] * shape[1]
>>> a = np.linspace(0, 5, num=size, dtype=dtype).reshape(shape)
>>> b = np.linspace(0, 5, num=size, dtype=dtype).reshape(shape)
>>> a1 = blosc2.asarray(a)
>>> a1[:]
[[0. 0.625 1.25 ]
 [1.875 2.5 3.125]
 [3.75 4.375 5. ]]
>>> b1 = blosc2.asarray(b)
>>> expr = 'a * b + 2'
>>> operands = { 'a': a1, 'b': b1 }
>>> lazy_expr = blosc2.lazyexpr(expr, operands=operands)
>>> print(f"Lazy expression created: {lazy_expr}")
Lazy expression created: a * b + 2
>>> lazy_expr[:]
[[ 2. 2.390625 3.5625 ]
 [ 5.515625 8.25 11.765625]
 [16.0625 21.140625 27. ]]
LazyUDF¶
For getting a LazyUDF object (which is LazyArray-compliant) from a user-defined Python function, you can use the lazyudf constructor below. See a tutorial on how this works.
This object follows the LazyArray API for computation, although storage is not supported yet.
- blosc2.lazyudf(func: Callable[[tuple, np.ndarray, tuple[int]], None], inputs: Sequence[Any] | None, dtype: np.dtype, shape: tuple | list | None = None, chunked_eval: bool = True, **kwargs: Any) LazyUDF [source]¶
Get a LazyUDF from a Python user-defined function.
- Parameters:
func¶ (Python function) – The user-defined function to apply to each block. This function will always receive the following parameters:
- inputs_tuple: A tuple containing the corresponding slice for the block of each input in inputs.
- output: The buffer to be filled as a multidimensional numpy.ndarray.
- offset: The multidimensional offset corresponding to the start of the block being computed.
inputs¶ (Sequence[Any] or None) – The sequence of inputs. Besides objects compliant with the blosc2.Array protocol, any other object is supported too, and it will be passed as-is to the user-defined function. If not needed, this can be empty, but shape must be provided.
dtype¶ (np.dtype) – The resulting ndarray dtype in NumPy format.
shape¶ (tuple, optional) – The shape of the resulting array. If None, the shape will be guessed from inputs.
chunked_eval¶ (bool, optional) – Whether to evaluate the function in chunks (True) or in blocks (False).
kwargs¶ (Any, optional) – Keyword arguments that are supported by the empty() constructor. These arguments will be used by the LazyArray.__getitem__() and LazyArray.compute() methods. The last one will ignore the urlpath parameter passed in this function.
- Returns:
out – A LazyUDF is returned.
- Return type:
LazyUDF
Examples
>>> import blosc2
>>> import numpy as np
>>> dtype = np.float64
>>> shape = [3, 3]
>>> size = shape[0] * shape[1]
>>> a = np.linspace(0, 10, num=size, dtype=dtype).reshape(shape)
>>> b = np.linspace(10, 20, num=size, dtype=dtype).reshape(shape)
>>> a1 = blosc2.asarray(a)
>>> b1 = blosc2.asarray(b)
>>> # Define a user-defined function that will be applied to each block of data
>>> def my_function(inputs_tuple, output, offset):
...     a, b = inputs_tuple
...     output[:] = a + b
>>> # Create a LazyUDF object using the user-defined function
>>> lazy_udf = blosc2.lazyudf(my_function, [a1, b1], dtype)
>>> type(lazy_udf)
<class 'blosc2.lazyexpr.LazyUDF'>
>>> print(f"Result of LazyUDF evaluation: {lazy_udf[:]}")
Result of LazyUDF evaluation: [[10. 12.5 15. ]
 [17.5 20. 22.5]
 [25. 27.5 30. ]]
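The example above ignores the offset argument. To show what it is for, here is a NumPy-only sketch of the calling convention: a user-defined function with the (inputs_tuple, output, offset) signature, driven by a small loop that hands it one block of rows at a time. The driver apply_blockwise and the function add_global_row_index are hypothetical names written for this illustration; they are not part of Blosc2 and do not reflect how it actually partitions arrays.

```python
import numpy as np


def add_global_row_index(inputs_tuple, output, offset):
    """UDF with the (inputs_tuple, output, offset) signature described above."""
    (block,) = inputs_tuple
    # offset[0] is the row at which this block starts in the full array,
    # so local row i corresponds to global row offset[0] + i.
    rows = np.arange(block.shape[0]) + offset[0]
    output[:] = block + rows[:, None]


def apply_blockwise(func, inputs, dtype, block_rows):
    """Hypothetical driver (not Blosc2 code): call func once per block of rows."""
    shape = inputs[0].shape
    result = np.empty(shape, dtype=dtype)
    for start in range(0, shape[0], block_rows):
        stop = min(start + block_rows, shape[0])
        blocks = tuple(arr[start:stop] for arr in inputs)
        offset = (start,) + (0,) * (len(shape) - 1)
        # result[start:stop] is a view, so func fills the result in place.
        func(blocks, result[start:stop], offset)
    return result


data = np.zeros((5, 2))
out = apply_blockwise(add_global_row_index, [data], np.float64, block_rows=2)
print(out[:, 0].tolist())  # [0.0, 1.0, 2.0, 3.0, 4.0]
```

Because the function uses offset rather than assuming it sees the whole array, the result does not depend on how the rows are split into blocks.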