User Defined Functions¶
Of course, one may want to perform computations which are more complex than those considered in the last tutorial (so that they do not fit in a single expression). To this end, we’ll see how one can define a function and make it act like a lazy expression when it comes to computations with NDArray and/or NumPy arrays, using the Lazy User Defined Function (LazyUDF) object.
[1]:
import time
import numba as nb
import numpy as np
import blosc2
A simple example¶
First, let’s create an NDArray, a NumPy array and a regular scalar, which will be the operands of our function.
[2]:
shape = (5_000, 2_000)
a = np.linspace(0, 1, np.prod(shape), dtype=np.int32).reshape(shape)
b = blosc2.arange(np.prod(shape), dtype=np.float32, shape=shape)
s = 2.1 # a regular scalar
Now, let’s define our function, which will be the executable attribute of a LazyUDF object. Internally, LazyUDF will execute the function chunkwise on the operands when requested, and it expects the function to have a signature with three parameters: 1) an inputs tuple; 2) an output buffer to be filled; and 3) the chunk offset coordinates. When called by LazyUDF, the inputs tuple will contain chunks of the operands, and the function must fill the output buffer with the computation result (the buffer automatically has the correct shape and dtype thanks to the internal mechanics of LazyUDF). The offset gives the coordinates, within the output, of the chunk being filled, which is often useful (but not always necessary). For example, if we were to write a function that fills an empty array with ones on the main diagonal chunk by chunk, some chunks would contain all zeros, which one can ascertain from the coordinates in the offset parameter (see the implementation of blosc2.eye, ../../reference/ndarray.html#blosc2.eye).
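To make the role of the offset parameter concrete, here is a minimal, hypothetical sketch of such a diagonal-filling UDF (not the actual blosc2.eye implementation):

def eye_udf(inputs_tuple, output, offset):
    # No inputs are needed: each chunk is filled from its position alone.
    output[:] = 0
    row0, col0 = offset  # global coordinates of this chunk's first element
    nrows, ncols = output.shape
    for i in range(nrows):
        j = (row0 + i) - col0  # local column where the main diagonal falls
        if 0 <= j < ncols:
            output[i, j] = 1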
For the moment, we’ll just write a function that does something simple with the operands and writes the result to the buffer.
[3]:
def myudf(inputs_tuple, output, offset):
x, y, s = inputs_tuple # at this point, all are either numpy arrays or scalars
output[:] = x**3 + np.sin(y) + s + 1
It is important to write the result into the buffer using output[:] = result, since writing output = result merely rebinds the local name output to a new object and leaves the memory of the original buffer untouched.
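As a quick illustration of the difference, using plain NumPy and independent of LazyUDF:

buf = np.zeros(3)
out = buf
out = np.ones(3)  # rebinds the name 'out'; buf is still all zeros
print(buf)  # [0. 0. 0.]
out = buf
out[:] = np.ones(3)  # writes into the existing buffer in place
print(buf)  # [1. 1. 1.]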
Now, to actually create a LazyUDF object (which also follows the LazyArray interface) we will use its constructor lazyudf, providing the UDF we have defined, a tuple with the operands, and the dtype of the output. The latter is important since it will be used to create the output buffer. Optionally, we can also provide the shape of the output; if we don’t, it will be inferred from the operands.
[4]:
larray = blosc2.lazyudf(myudf, (a, b, s), a.dtype)
print(f"Type: {type(larray)}")
Type: <class 'blosc2.lazyexpr.LazyUDF'>
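Note that the dtype we passed (a.dtype, i.e. int32) also fixes the dtype of the result, so the np.sin term will be truncated when written into the output buffer. If you prefer a floating-point result, you could pass a different dtype (and, optionally, an explicit shape, as mentioned above); a possible variant, assuming the keyword is named shape:

larray_f = blosc2.lazyudf(myudf, (a, b, s), np.float64, shape=shape)
print(larray_f.dtype, larray_f.shape)  # float64 output buffer, no truncation of np.sin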
Since the LazyUDF object implements the same LazyArray interface as LazyExpr, we may execute the function and get its result via either the __getitem__ method (returning a NumPy array) or the compute method (returning an NDArray). Let’s see __getitem__ first, computing either a slice or the whole result:
[5]:
npc = larray[:10] # compute a slice of the result
print(f"Slice - Type: {type(npc)}, shape: {npc.shape}")
npc = larray[:] # compute the whole result
print(f"Full array - Type: {type(npc)}, shape: {npc.shape}")
Slice - Type: <class 'numpy.ndarray'>, shape: (10, 2000)
Full array - Type: <class 'numpy.ndarray'>, shape: (5000, 2000)
Now, let’s use compute for the same purpose. The advantage of using this method is that you can pass some construction parameters for the resulting NDArray, like the urlpath to store the resulting array on disk, as we saw in the previous tutorial.
[6]:
c = larray.compute(urlpath="larray.b2nd", mode="w")
print(f"Type: {type(c)}")
print(c.info)
Type: <class 'blosc2.ndarray.NDArray'>
type : NDArray
shape : (5000, 2000)
chunks : (1000, 2000)
blocks : (10, 2000)
dtype : int32
cratio : 227.03
cparams : CParams(codec=<Codec.ZSTD: 5>, codec_meta=0, clevel=1, use_dict=False, typesize=4,
: nthreads=4, blocksize=80000, splitmode=<SplitMode.AUTO_SPLIT: 3>,
: filters=[<Filter.NOFILTER: 0>, <Filter.NOFILTER: 0>, <Filter.NOFILTER: 0>,
: <Filter.NOFILTER: 0>, <Filter.NOFILTER: 0>, <Filter.SHUFFLE: 1>], filters_meta=[0, 0,
: 0, 0, 0, 0], tuner=<Tuner.STUNE: 0>)
dparams : DParams(nthreads=4)
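Since we passed a urlpath, the result is now persisted on disk and can be reopened later without recomputing, for instance with blosc2.open:

c2 = blosc2.open("larray.b2nd")
print(c2.shape, c2.dtype)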
BONUS: Using Numba¶
Numba is a Just-In-Time (JIT) compiler that translates a subset of Python and NumPy code into fast machine code. It is particularly useful for numerical computations and can significantly speed up the execution of functions that are computationally intensive. Python-Blosc2 can also interface with Numba via UDFs: it’s as simple as decorating the same function as before with a Numba jit decorator.
[7]:
@nb.jit(nopython=True, parallel=True)
def myudf_numba(inputs_tuple, output, offset):
x, y, s = inputs_tuple
output[:] = x**3 + np.sin(y) + s + 1
larray_nb = blosc2.lazyudf(myudf_numba, (a, b, s), a.dtype)
We then use the lazyudf constructor as before. Cool! Now, let’s evaluate it and compare timings with the pure Python version.
[8]:
t1 = time.time()
npc_nb = larray_nb[:] # numba version
t_nb = time.time() - t1
t1 = time.time()
npc = larray[:] # pure python version
t_ = time.time() - t1
print(f"Numba: {t_nb:.3f} seconds, pure Python: {t_:.3f} seconds")
Numba: 5.509 seconds, pure Python: 0.136 seconds
Incidentally, the pure Python version was faster than Numba. This is because Numba has large initialization (compilation) overheads and the function is quite simple. For more complex functions, or larger arrays, the difference will be less noticeable, or even favorable to Numba. As an exercise, check at which array size the Numba UDF starts to be competitive. If you’re a Numba pro, you may also want to unroll loops within the UDF and see whether you can make it faster.
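One possible way to approach the exercise is to time both versions over growing shapes; a rough sketch (the sizes are arbitrary):

for n in (1_000, 5_000, 20_000):
    shape_n = (n, 2_000)
    a_n = np.linspace(0, 1, np.prod(shape_n), dtype=np.int32).reshape(shape_n)
    b_n = blosc2.arange(np.prod(shape_n), dtype=np.float32, shape=shape_n)
    for name, udf in (("Numba", myudf_numba), ("pure Python", myudf)):
        la = blosc2.lazyudf(udf, (a_n, b_n, s), a_n.dtype)
        t0 = time.time()
        la[:]  # force the computation
        print(f"{name} @ {shape_n}: {time.time() - t0:.3f} s")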
Summary¶
We have seen how to build new LazyUDF objects based on bespoke User Defined Functions (UDFs) to perform computations of arbitrary complexity lazily. We have also demonstrated that integrating Numba into UDFs is pretty easy.