LazyArray
This is the API for computing an expression or a Python user-defined function.
You can get an object that follows the LazyArray API in any of the following ways:
- Any expression that involves one or more NDArray objects, e.g. a + b, where a and b are NDArray objects (see this tutorial).
- Using the lazyexpr constructor.
- Using the lazyudf constructor (see a tutorial).
The LazyArray object is a thin wrapper around the expression or user-defined function that allows for lazy computation. This means that the expression is not computed until the compute or __getitem__ method is called. The compute method returns a new NDArray object with the result of the expression evaluation. The __getitem__ method returns a NumPy array instead.
See the LazyExpr and LazyUDF sections for more information.
- class blosc2.LazyArray
- Attributes:
dtype Get the data type of the LazyArray.
info Get information about the LazyArray.
Methods
compute([item]) Return an NDArray containing the evaluation of the LazyArray.
indices([order]) Return a LazyArray containing the indices where self is True.
save(**kwargs) Save the LazyArray on disk.
sort([order]) Return a sorted LazyArray.
to_cframe() Compute LazyArray and convert to cframe.
- Special Methods:
__getitem__(item) Return a NumPy.ndarray containing the evaluation of the LazyArray.
Methods
- abstract __getitem__(item: int | slice | Sequence[slice]) → blosc2.NDArray
Return a NumPy.ndarray containing the evaluation of the LazyArray.
- Parameters:
item (int, slice or sequence of slices) – If provided, item is used to slice the operands prior to computation, rather than to retrieve slices of the evaluated result. This difference between slicing the operands and slicing the final result matters when the expression uses reductions or a where clause.
- Returns:
out – An array with the data containing the evaluated slice.
- Return type:
np.ndarray
Examples
>>> import blosc2
>>> import numpy as np
>>> dtype = np.float64
>>> shape = [30, 4]
>>> size = shape[0] * shape[1]
>>> a = np.linspace(0, 10, num=size, dtype=dtype).reshape(shape)
>>> b = np.linspace(0, 10, num=size, dtype=dtype).reshape(shape)
>>> # Convert numpy arrays to Blosc2 arrays
>>> a1 = blosc2.asarray(a)
>>> b1 = blosc2.asarray(b)
>>> # Perform the mathematical operation
>>> expr = a1 + b1  # LazyExpr expression
>>> expr[3]
[2.01680672 2.18487395 2.35294118 2.5210084 ]
>>> expr[2:4]
[[1.34453782 1.51260504 1.68067227 1.8487395 ]
 [2.01680672 2.18487395 2.35294118 2.5210084 ]]
- abstract compute(item: slice | list[slice] | None = None, **kwargs: Any) → NDArray
Return an NDArray containing the evaluation of the LazyArray.
- Parameters:
item (slice, list of slices, optional) – If provided, item is used to slice the operands prior to computation, rather than to retrieve slices of the evaluated result. This difference between slicing the operands and slicing the final result matters when the expression uses reductions or a where clause.
kwargs (Any, optional) – Keyword arguments that are supported by the empty() constructor. These arguments will be set in the resulting NDArray.
- Returns:
out – An NDArray containing the result of evaluating the LazyUDF or LazyExpr.
- Return type:
NDArray
Notes
If self is a LazyArray from a UDF, the kwargs used to store the resulting array will be the ones passed to the constructor in lazyudf() (except the urlpath), updated with the kwargs passed when calling this method.
Examples
>>> import blosc2
>>> import numpy as np
>>> dtype = np.float64
>>> shape = [3, 3]
>>> size = shape[0] * shape[1]
>>> a = np.linspace(0, 5, num=size, dtype=dtype).reshape(shape)
>>> b = np.linspace(0, 5, num=size, dtype=dtype).reshape(shape)
>>> # Convert numpy arrays to Blosc2 arrays
>>> a1 = blosc2.asarray(a)
>>> b1 = blosc2.asarray(b)
>>> # Perform the mathematical operation
>>> expr = a1 + b1
>>> output = expr.compute()
>>> f"Result of a + b (lazy evaluation): {output[:]}"
Result of a + b (lazy evaluation): [[ 0. 1.25 2.5 ]
 [ 3.75 5. 6.25]
 [ 7.5 8.75 10. ]]
- abstract indices(order: str | list[str] | None = None) → LazyArray
Return a LazyArray containing the indices where self is True.
The LazyArray must be of bool dtype (e.g. a condition).
- Parameters:
order (str, list of str, optional) – Specifies which fields to compare first, second, etc. A single field can be specified as a string. Not all fields need to be specified, only the ones by which the array is to be sorted.
- Returns:
out – The indices of the LazyArray self that are True.
- Return type:
LazyArray
- abstract save(**kwargs: Any) → None
Save the LazyArray on disk.
- Parameters:
kwargs (Any, optional) – Keyword arguments that are supported by the empty() constructor. The urlpath must always be provided.
- Returns:
out
- Return type:
None
Notes
All the operands of the LazyArray must be Python scalars, NDArray, C2Array or Proxy.
If an operand is a Proxy, keep in mind that Python-Blosc2 will only be able to reopen it as such if its source is a SChunk, NDArray or a C2Array (see the notes section of blosc2.open() for more info).
This is currently only supported for LazyExpr.
Examples
>>> import blosc2
>>> import numpy as np
>>> dtype = np.float64
>>> shape = [3, 3]
>>> size = shape[0] * shape[1]
>>> a = np.linspace(0, 5, num=size, dtype=dtype).reshape(shape)
>>> b = np.linspace(0, 5, num=size, dtype=dtype).reshape(shape)
>>> # Define file paths for storing the arrays
>>> a1 = blosc2.asarray(a, urlpath='a_array.b2nd', mode='w')
>>> b1 = blosc2.asarray(b, urlpath='b_array.b2nd', mode='w')
>>> # Perform the mathematical operation to create a LazyExpr expression
>>> expr = a1 + b1
>>> # Save the LazyExpr to disk
>>> expr.save(urlpath='lazy_array.b2nd', mode='w')
>>> # Open and load the LazyExpr from disk
>>> disk_expr = blosc2.open('lazy_array.b2nd')
>>> disk_expr[:2]
[[0. 1.25 2.5 ]
 [3.75 5. 6.25]]
- abstract sort(order: str | list[str] | None = None) → LazyArray
Return a sorted LazyArray.
This is only valid for LazyArrays with structured dtypes.
- to_cframe() → bytes
Compute LazyArray and convert to cframe.
- Returns:
out – The buffer containing the serialized NDArray instance.
- Return type:
bytes
- abstract property dtype: dtype
Get the data type of the LazyArray.
- Returns:
out – The data type of the LazyArray.
- Return type:
np.dtype
- abstract property info: InfoReporter
Get information about the LazyArray.
- Returns:
out – A printable class with information about the LazyArray.
- Return type:
InfoReporter
LazyExpr
An expression like a + sum(b), where there is at least one NDArray object in operands a and b, returns a LazyExpr object. You can also get a LazyExpr object using the lazyexpr constructor (see below).
This object follows the LazyArray API for computation and storage.
- blosc2.lazyexpr(expression: str | bytes | LazyExpr | NDArray, operands: dict | None = None, out: NDArray | ndarray = None, where: tuple | list | None = None, local_dict: dict | None = None, global_dict: dict | None = None, ne_args: dict | None = None, _frame_depth: int = 2) → LazyExpr
Get a LazyExpr from an expression.
- Parameters:
expression (str or bytes or LazyExpr) – The expression to evaluate. This can be any valid expression that can be ingested by numexpr. If a LazyExpr is passed, the expression will be updated with the new operands.
operands (dict) – The dictionary with operands. Supported values are NumPy.ndarray, Python scalars, NDArray, NDField or C2Array instances. If None, the operands will be looked up in the local and global dictionaries.
out (NDArray or np.ndarray, optional) – The output array where the result will be stored. If not provided, a new NumPy array will be created and returned.
where (tuple, list, optional) – A sequence of arguments for the where clause in the expression.
local_dict (dict, optional) – The local dictionary to use when looking for operands in the expression. If not provided, the local dictionary of the caller will be used.
global_dict (dict, optional) – The global dictionary to use when looking for operands in the expression. If not provided, the global dictionary of the caller will be used.
ne_args (dict, optional) – Additional arguments to be passed to the numexpr.evaluate() function.
_frame_depth (int, optional) – The depth of the frame to use when looking for operands in the expression. The default value is 2.
- Returns:
out – A LazyExpr is returned.
- Return type:
LazyExpr
Examples
>>> import blosc2
>>> import numpy as np
>>> dtype = np.float64
>>> shape = [3, 3]
>>> size = shape[0] * shape[1]
>>> a = np.linspace(0, 5, num=size, dtype=dtype).reshape(shape)
>>> b = np.linspace(0, 5, num=size, dtype=dtype).reshape(shape)
>>> a1 = blosc2.asarray(a)
>>> a1[:]
[[0. 0.625 1.25 ]
 [1.875 2.5 3.125]
 [3.75 4.375 5. ]]
>>> b1 = blosc2.asarray(b)
>>> expr = 'a * b + 2'
>>> operands = { 'a': a1, 'b': b1 }
>>> lazy_expr = blosc2.lazyexpr(expr, operands=operands)
>>> f"Lazy expression created: {lazy_expr}"
Lazy expression created: a * b + 2
>>> lazy_expr[:]
[[ 2. 2.390625 3.5625 ]
 [ 5.515625 8.25 11.765625]
 [16.0625 21.140625 27. ]]
LazyUDF
For getting a LazyUDF object (which is LazyArray-compliant) from a user-defined Python function, you can use the lazyudf constructor below. See a tutorial on how this works.
This object follows the LazyArray API for computation, although storage is not supported yet.
- blosc2.lazyudf(func: Callable[[tuple, np.ndarray, tuple[int]], None], inputs: tuple | list | None, dtype: np.dtype, shape: tuple | list | None = None, chunked_eval: bool = True, **kwargs: Any) → LazyUDF
Get a LazyUDF from a python user-defined function.
- Parameters:
func (Python function) – The user-defined function to apply to each block. This function will always receive the following parameters:
- inputs_tuple: A tuple containing the corresponding slice for the block of each input in inputs.
- output: The buffer to be filled as a multidimensional numpy.ndarray.
- offset: The multidimensional offset corresponding to the start of the block being computed.
inputs (tuple or list or None) – The sequence of inputs. Supported inputs are: NumPy.ndarray, NDArray, NDField, C2Array. Any other object is supported too, and will be passed as is to the user-defined function. If not needed, this can be empty, but shape must be provided.
dtype (np.dtype) – The resulting ndarray dtype in NumPy format.
shape (tuple, optional) – The shape of the resulting array. If None, the shape will be guessed from inputs.
chunked_eval (bool, optional) – Whether to evaluate the function in chunks or not (blocks).
kwargs (Any, optional) – Keyword arguments that are supported by the empty() constructor. These arguments will be used by the LazyArray.__getitem__() and LazyArray.compute() methods. The last one will ignore the urlpath parameter passed in this function.
- Returns:
out – A LazyUDF is returned.
- Return type:
LazyUDF
Examples
>>> import blosc2
>>> import numpy as np
>>> dtype = np.float64
>>> shape = [3, 3]
>>> size = shape[0] * shape[1]
>>> a = np.linspace(0, 10, num=size, dtype=dtype).reshape(shape)
>>> b = np.linspace(10, 20, num=size, dtype=dtype).reshape(shape)
>>> a1 = blosc2.asarray(a)
>>> b1 = blosc2.asarray(b)
>>> # Define a user-defined function that will be applied to each block of data
>>> def my_function(inputs_tuple, output, offset):
...     a, b = inputs_tuple
...     output[:] = a + b
>>> # Create a LazyUDF object using the user-defined function
>>> lazy_udf = blosc2.lazyudf(my_function, [a1, b1], dtype)
>>> type(lazy_udf)
<class 'blosc2.lazyexpr.LazyUDF'>
>>> f"Result of LazyUDF evaluation: {lazy_udf[:]}"
Result of LazyUDF evaluation: [[10. 12.5 15. ]
 [17.5 20. 22.5]
 [25. 27.5 30. ]]