User Defined Functions¶
Of course, one may want to perform computations which are more complex than those considered in the last tutorial (so that they do not fit in a single expression). To this end, we’ll see how one can define a function and make it act like a lazy expression when it comes to computations with NDArray and/or NumPy arrays, using the Lazy User Defined Function (LazyUDF) object.
[1]:
import time
import numba as nb
import numpy as np
import blosc2
A simple example¶
First, let’s create an NDArray, a NumPy array and a regular scalar, which will be the operands of our function.
[2]:
shape = (5_000, 2_000)
a = np.linspace(0, 1, np.prod(shape), dtype=np.int32).reshape(shape)
b = blosc2.arange(np.prod(shape), dtype=np.float32, shape=shape)
s = 2.1 # a regular scalar
Now, let’s define our function, which will be the executable attribute of a LazyUDF object. Internally, LazyUDF will execute the function chunkwise on the operands when requested, and it expects the function to have a signature with three parameters: 1) an inputs tuple; 2) an output buffer to be filled; and 3) the chunk offset coordinates. When called by LazyUDF, the inputs tuple will contain chunks of the operands, and the function must fill the output buffer with the computation result (the buffer automatically has the correct shape and dtype thanks to the internal mechanics of LazyUDF). The offset gives the coordinates, within the output, of the chunk being filled, which is often useful (but not always necessary). For example, if we were to write a function that fills an empty array with ones on the main diagonal chunk by chunk, some chunks would contain all zeros, which one can ascertain from the coordinates in the offset parameter (see the implementation of blosc2.eye, ../../reference/ndarray.html#blosc2.eye).
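To make the role of the offset parameter concrete, here is a minimal, hypothetical sketch of such a diagonal-filling UDF (not the actual blosc2.eye implementation):

def eye_udf(inputs_tuple, output, offset):
    # No inputs are needed: each chunk is filled from its position alone.
    output[:] = 0
    row0, col0 = offset  # global coordinates of this chunk's first element
    nrows, ncols = output.shape
    for i in range(nrows):
        j = (row0 + i) - col0  # local column where the main diagonal falls
        if 0 <= j < ncols:
            output[i, j] = 1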
For the moment, we’ll just write a function that does something simple with the operands and writes the result to the buffer.
[3]:
def myudf(inputs_tuple, output, offset):
x, y, s = inputs_tuple # at this point, all are either numpy arrays or scalars
output[:] = x**3 + np.sin(y) + s + 1
It is important to write the result into the buffer using output[:] = result, since writing output = result merely rebinds the local name output to a new object and leaves the memory of the original buffer untouched.
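As a quick illustration of the difference, using plain NumPy and independent of LazyUDF:

buf = np.zeros(3)
out = buf
out = np.ones(3)  # rebinds the name 'out'; buf is still all zeros
print(buf)  # [0. 0. 0.]
out = buf
out[:] = np.ones(3)  # writes into the existing buffer in place
print(buf)  # [1. 1. 1.]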
Now, to actually create a LazyUDF object (which also follows the LazyArray interface) we will use its constructor lazyudf, providing the UDF we have defined, a tuple with the operands, and the dtype of the output. The latter is important since it will be used to create the output buffer. Optionally, we can also provide the shape of the output; if we don’t, it will be inferred from the operands.
[4]:
larray = blosc2.lazyudf(myudf, (a, b, s), a.dtype)
print(f"Type: {type(larray)}")
Type: <class 'blosc2.lazyexpr.LazyUDF'>
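Note that the dtype we passed (a.dtype, i.e. int32) also fixes the dtype of the result, so the np.sin term will be truncated when written into the output buffer. If you prefer a floating-point result, you could pass a different dtype (and, optionally, an explicit shape, as mentioned above); a possible variant, assuming the keyword is named shape:

larray_f = blosc2.lazyudf(myudf, (a, b, s), np.float64, shape=shape)
print(larray_f.dtype, larray_f.shape)  # float64 output buffer, no truncation of np.sin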
Since the LazyUDF object implements the same LazyArray interface as LazyExpr, we may execute the function and get its result via either the __getitem__ method (returning a NumPy array) or the compute method (returning an NDArray). Let’s see __getitem__ first, computing either a slice or the whole result:
[5]:
npc = larray[:10] # compute a slice of the result
print(f"Slice - Type: {type(npc)}, shape: {npc.shape}")
npc = larray[:] # compute the whole result
print(f"Full array - Type: {type(npc)}, shape: {npc.shape}")
Slice - Type: <class 'numpy.ndarray'>, shape: (10, 2000)
Full array - Type: <class 'numpy.ndarray'>, shape: (5000, 2000)
Now, let’s use compute for the same purpose. The advantage of using this method is that you can pass some construction parameters for the resulting NDArray, like the urlpath to store the resulting array on disk, as we saw in the previous tutorial.
[6]:
c = larray.compute(urlpath="larray.b2nd", mode="w")
print(f"Type: {type(c)}")
print(c.info)
Type: <class 'blosc2.ndarray.NDArray'>
type : NDArray
shape : (5000, 2000)
chunks : (1000, 2000)
blocks : (10, 2000)
dtype : int32
cratio : 227.03
cparams : CParams(codec=<Codec.ZSTD: 5>, codec_meta=0, clevel=1, use_dict=False, typesize=4,
: nthreads=4, blocksize=80000, splitmode=<SplitMode.AUTO_SPLIT: 3>,
: filters=[<Filter.NOFILTER: 0>, <Filter.NOFILTER: 0>, <Filter.NOFILTER: 0>,
: <Filter.NOFILTER: 0>, <Filter.NOFILTER: 0>, <Filter.SHUFFLE: 1>], filters_meta=[0, 0,
: 0, 0, 0, 0], tuner=<Tuner.STUNE: 0>)
dparams : DParams(nthreads=4)
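Since we passed a urlpath, the result is now persisted on disk and can be reopened later without recomputing, for instance with blosc2.open:

c2 = blosc2.open("larray.b2nd")
print(c2.shape, c2.dtype)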
BONUS: Using Numba¶
Numba is a Just-In-Time (JIT) compiler that translates a subset of Python and NumPy code into fast machine code. It is particularly useful for numerical computations and can significantly speed up the execution of functions that are computationally intensive. Python-Blosc2 can also interface with Numba via UDFs: it’s as simple as decorating the same function as before with a Numba jit decorator.
[7]:
@nb.jit(nopython=True, parallel=True)
def myudf_numba(inputs_tuple, output, offset):
x, y, s = inputs_tuple
output[:] = x**3 + np.sin(y) + s + 1
larray_nb = blosc2.lazyudf(myudf_numba, (a, b, s), a.dtype)
We then use the lazyudf constructor as before. Cool! Now, let’s evaluate it and compare timings with the pure Python version.
[8]:
t1 = time.time()
npc_nb = larray_nb[:] # numba version
t_nb = time.time() - t1
t1 = time.time()
npc = larray[:] # pure python version
t_ = time.time() - t1
print(f"Numba: {t_nb:.3f} seconds, pure Python: {t_:.3f} seconds")
Numba: 5.509 seconds, pure Python: 0.136 seconds
Incidentally, the pure Python version was faster than Numba. This is because Numba has large initialization (compilation) overheads and the function is quite simple. For more complex functions, or larger arrays, the difference will be less noticeable, or even favorable to Numba. As an exercise, check at which array size the Numba UDF starts to be competitive. If you’re a Numba pro, you may also want to unroll loops within the UDF and see whether you can make it faster.
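One possible way to approach the exercise is to time both versions over growing shapes; a rough sketch (the sizes are arbitrary):

for n in (1_000, 5_000, 20_000):
    shape_n = (n, 2_000)
    a_n = np.linspace(0, 1, np.prod(shape_n), dtype=np.int32).reshape(shape_n)
    b_n = blosc2.arange(np.prod(shape_n), dtype=np.float32, shape=shape_n)
    for name, udf in (("Numba", myudf_numba), ("pure Python", myudf)):
        la = blosc2.lazyudf(udf, (a_n, b_n, s), a_n.dtype)
        t0 = time.time()
        la[:]  # force the computation
        print(f"{name} @ {shape_n}: {time.time() - t0:.3f} s")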
Summary¶
We have seen how to build new LazyUDF objects based on bespoke User Defined Functions (UDFs) to perform computations of arbitrary complexity lazily. We have also demonstrated that integrating Numba into UDFs is pretty easy.