LazyArray UDF DSL Kernels

The @blosc2.dsl_kernel decorator lets you write kernels using Python function syntax; the decorated kernels execute through the miniexpr DSL path.

Use DSL kernels when you want:

  • A vectorized UDF model (operate over NDArray chunks/blocks, not Python scalar loops)

  • Optional JIT compilation via miniexpr backends (for example tcc/cc) without requiring Numba

  • Early syntax validation and actionable diagnostics for unsupported constructs

This tutorial complements 03.lazyarray-udf.ipynb (generic Python UDFs).

For the canonical DSL syntax contract, see the DSL syntax reference.

Choosing the Right Interface

Goal                                                             Recommended API
Elementwise formulas using built-in functions/operators          blosc2.lazyexpr(...)
Arbitrary Python logic over blocks/chunks                        blosc2.lazyudf(...)
DSL subset with early syntax checks and optional miniexpr JIT    @blosc2.dsl_kernel + blosc2.lazyudf(...)

[1]:
import numpy as np

import blosc2

1. Define a DSL Kernel

A valid DSL kernel can be used with blosc2.lazyudf(...) like a regular UDF.

[2]:
@blosc2.dsl_kernel
def kernel_index_ramp(x):
    # _i* and _n* are reserved DSL index/shape symbols, so disable linter warnings
    return x + _i0 * _n1 + _i1  # noqa: F821
[3]:
shape = (5, 10)
x = blosc2.ones(shape, dtype=np.float32)
expr = blosc2.lazyudf(kernel_index_ramp, (x,), dtype=np.float32)
res = expr[:]
res
[3]:
array([[ 1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9., 10.],
       [11., 12., 13., 14., 15., 16., 17., 18., 19., 20.],
       [21., 22., 23., 24., 25., 26., 27., 28., 29., 30.],
       [31., 32., 33., 34., 35., 36., 37., 38., 39., 40.],
       [41., 42., 43., 44., 45., 46., 47., 48., 49., 50.]], dtype=float32)
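As a cross-check, the same ramp can be sketched in plain NumPy. This is only an illustration: it assumes, consistent with the output above, that _i0 and _i1 are the global row and column indices of each element and that _n1 is the length of the second dimension. The names x_np and expected_ramp are introduced here and are not part of the tutorial.

```python
import numpy as np

shape = (5, 10)
x_np = np.ones(shape, dtype=np.float32)

# np.indices returns one integer grid per dimension:
# i0 holds the row index of each element, i1 the column index.
i0, i1 = np.indices(shape)
n1 = shape[1]  # length of the second dimension, the role _n1 plays in the kernel

# Same formula as kernel_index_ramp, cast back to float32 to match the requested dtype.
expected_ramp = (x_np + i0 * n1 + i1).astype(np.float32)
```

Comparing this array with res from the cell above, for example with np.array_equal(res, expected_ramp), is a quick way to confirm how the index symbols are interpreted.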
[4]:
# Optional: request miniexpr JIT backend for this DSL kernel
try:
    expr_jit = blosc2.lazyudf(
        kernel_index_ramp,
        (x,),
        dtype=x.dtype,
        jit=True,
        jit_backend="tcc",
    )
    res_jit = expr_jit.compute()
    # A bare expression inside a try block is not displayed by the notebook, so print it
    print(res_jit[:2, :5])
except Exception as e:
    print(f"JIT backend unavailable in this environment: {e}")

1.a Zero-Parameter DSL Kernel

Kernels with no parameters are also valid. When inputs is empty, you must pass an explicit output shape to lazyudf(...).

[5]:
@blosc2.dsl_kernel
def kernel_no_inputs():
    return _i0 + 10 * _i1  # noqa: F821


expr0 = blosc2.lazyudf(kernel_no_inputs, (), dtype=np.int32, shape=(3, 4))
res0 = expr0[:]
res0
[5]:
array([[ 0, 10, 20, 30],
       [ 1, 11, 21, 31],
       [ 2, 12, 22, 32]], dtype=int32)

1.b DSL Kernel with Multiple Parameters

Kernels with more than one parameter work the same way; all inputs are passed through lazyudf(...) in a tuple.

[6]:
@blosc2.dsl_kernel
def kernel_weighted_mix(x, y, b):
    return 0.25 * x + 2.0 * y + b


xw = blosc2.asarray(np.arange(12, dtype=np.float32).reshape(3, 4))
yw = blosc2.ones((3, 4), dtype=np.float32)
bw = 32.4
resw = blosc2.lazyudf(kernel_weighted_mix, (xw, yw, bw), dtype=np.float32)[:]
resw[:2, :3]
[6]:
array([[34.4 , 34.65, 34.9 ],
       [35.4 , 35.65, 35.9 ]], dtype=float32)

2. Preflight Validation (validate_dsl)

You can validate a kernel and inspect diagnostics without executing it.

Common Diagnostics Cheat Sheet

  • Ternary expression (a if cond else b) is unsupported: use where(cond, a, b).

  • Reserved names (int, float, bool, print, _ndim, _i*, _n*) cannot be reused.

  • Missing return: a kernel that reaches the end of an executed path without a return can fail at runtime, even if compilation succeeds.

[7]:
report_ok = blosc2.validate_dsl(kernel_index_ramp)
report_ok
[7]:
{'valid': True,
 'dsl_source': 'def kernel_index_ramp(x):\n    # _i* and _n* are reserved DSL index/shape symbols, so disable linter warnings\n    return x + _i0 * _n1 + _i1  # noqa: F821',
 'input_names': ['x'],
 'error': None}
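To make the idea of a preflight check concrete, here is a toy checker built on Python's standard ast module. It is not how validate_dsl is implemented. toy_preflight, is_reserved and the RESERVED set are hypothetical names introduced for this sketch, and the reserved list is simply the one given in the cheat sheet above. The reported column is the start of the offending expression, which may differ from the position miniexpr reports.

```python
import ast

# Assumed reserved names, copied from the cheat sheet; not an authoritative list.
RESERVED = {"int", "float", "bool", "print", "_ndim"}
RESERVED_PREFIXES = ("_i", "_n")


def is_reserved(name):
    """True for fixed reserved names and for _i<digits> / _n<digits> symbols."""
    if name in RESERVED:
        return True
    return any(
        name.startswith(p) and name[len(p):].isdigit() for p in RESERVED_PREFIXES
    )


def toy_preflight(source):
    """Return sorted (line, column, message) findings for a kernel source string."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.IfExp):
            # "a if cond else b" parses to an IfExp node
            findings.append(
                (node.lineno, node.col_offset + 1, "ternary expression; use where(cond, a, b)")
            )
        elif (
            isinstance(node, ast.Name)
            and isinstance(node.ctx, ast.Store)
            and is_reserved(node.id)
        ):
            # Only assignments (Store context) count as reuse; reads are fine
            findings.append(
                (node.lineno, node.col_offset + 1, f"reserved name {node.id!r} is assigned")
            )
    return sorted(findings)


ternary_findings = toy_preflight("def k(x):\n    return 1 if x else 0\n")
reserved_findings = toy_preflight("def k(x):\n    int = x + 1\n    return int + 2\n")
clean_findings = toy_preflight("def k(x):\n    return x + _i0\n")
```

Each findings list holds one tuple per problem, so an empty list means the toy checker found nothing to report. For real kernels, blosc2.validate_dsl remains the source of truth.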

3. Invalid Syntax Examples

validate_dsl helps catch unsupported constructs early, before running lazyudf(...).

3.a Ternary Expressions Are Not Supported

[8]:
@blosc2.dsl_kernel
def kernel_invalid_ternary(x):
    return 1 if x else 0
[9]:
report_bad_ternary = blosc2.validate_dsl(kernel_invalid_ternary)
print(report_bad_ternary["valid"])
print(report_bad_ternary["error"])
False
Ternary expressions are not supported in DSL; use where(cond, a, b) at line 2, column 14

DSL kernel source:
1 | def kernel_invalid_ternary(x):
2 |     return 1 if x else 0
  |              ^

See: https://github.com/Blosc/miniexpr/blob/main/doc/dsl-usage.md
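The diagnostic suggests where(cond, a, b) as the replacement. The selection it performs can be illustrated with NumPy's np.where, used here as a stand-in under the assumption that the DSL where has the same elementwise semantics. The sample array x_np is made up for this sketch.

```python
import numpy as np

x_np = np.array([[0.0, 1.5, -2.0], [3.0, 0.0, 0.5]], dtype=np.float32)

# Elementwise counterpart of "1 if x else 0":
# pick 1 where the element is nonzero, 0 elsewhere.
selected = np.where(x_np != 0, 1, 0)
```

In a DSL kernel the body would correspondingly read return where(x != 0, 1, 0). That form follows the wording of the diagnostic but has not been run here, so check it with validate_dsl before relying on it.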

3.b Reserved Names Cannot Be Reused

[10]:
@blosc2.dsl_kernel
def kernel_invalid_reserved_name(x):
    int = x + 1
    return int + 2
[11]:
report_bad_reserved = blosc2.validate_dsl(kernel_invalid_reserved_name)
print(report_bad_reserved["valid"])
print(report_bad_reserved["error"])
True
None

Note that the report captured above shows valid=True and no error, so in this run validate_dsl did not flag the assignment to int. Avoid reusing reserved names yourself rather than relying on preflight validation alone to catch it.

4. Control Flow and Casts

The DSL supports if/else blocks and cast intrinsics such as float(...).

[12]:
@blosc2.dsl_kernel
def kernel_clip_and_scale(x):
    if x < 0:
        y = 0
    else:
        y = x
    return float(y) * 0.5


x2_np = np.linspace(-2.0, 2.0, num=10, dtype=np.float32).reshape(2, 5)
x2 = blosc2.asarray(x2_np)
res2 = blosc2.lazyudf(kernel_clip_and_scale, (x2,), dtype=np.float32)[:]
res2
[12]:
array([[0.        , 0.        , 0.        , 0.        , 0.        ],
       [0.11111111, 0.33333334, 0.5555556 , 0.7777778 , 1.        ]],
      dtype=float32)
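The if/else in the kernel is applied per element, which the output above reflects: negative inputs become 0 and the rest are halved. A NumPy sketch of the same computation, useful as a reference when testing, could look like this. expected_clip is a name introduced here, and np.where is only an analogue of the DSL branch, not what the DSL executes.

```python
import numpy as np

x2_np = np.linspace(-2.0, 2.0, num=10, dtype=np.float32).reshape(2, 5)

# Negative elements take the "if" branch (y = 0); the others take the "else" branch (y = x).
y_np = np.where(x2_np < 0, np.float32(0), x2_np)

# Same scaling as the kernel's return statement, kept in float32.
expected_clip = (y_np * 0.5).astype(np.float32)
```

np.allclose(res2, expected_clip) then compares the lazyudf result from the cell above against this reference.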

5. Loops and Reserved ND Symbols

You can use for ... in range(...) together with reserved symbols like _i0, _i1, _n0, _n1 and _flat_idx.

[13]:
@blosc2.dsl_kernel
def kernel_add_triangular_col_index(x):
    acc = 0
    for j in range(_i1 + 1):  # noqa: F821
        acc += j
    return x + acc


x3 = blosc2.zeros((2, 5), dtype=np.float32)
res3 = blosc2.lazyudf(kernel_add_triangular_col_index, (x3,), dtype=np.float32)[:]
res3
[13]:
array([[ 0.,  1.,  3.,  6., 10.],
       [ 0.,  1.,  3.,  6., 10.]], dtype=float32)
[14]:
expected = np.array([0, 1, 3, 6, 10], dtype=np.float32)
np.allclose(res3[0], expected), res3[0]
[14]:
(True, array([ 0.,  1.,  3.,  6., 10.], dtype=float32))
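The expected values are triangular numbers: for column index i1 the loop adds 0 + 1 + ... + i1, which equals i1 * (i1 + 1) / 2. The short NumPy sketch below checks the loop form against that closed form. cols, loop_form and closed_form are names introduced for this illustration.

```python
import numpy as np

cols = np.arange(5)  # column indices, the values _i1 takes for a row of length 5

# Loop form, mirroring the kernel body: sum of range(i1 + 1) for each column index.
loop_form = np.array([sum(range(i + 1)) for i in cols], dtype=np.float32)

# Closed form: 0 + 1 + ... + i1 = i1 * (i1 + 1) / 2 (always an integer).
closed_form = (cols * (cols + 1) // 2).astype(np.float32)
```

Since x3 is all zeros, every row of res3 should equal closed_form.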

6. Advanced Examples

For more advanced real-world DSL kernels, see:

  • examples/ndarray/mandelbrot-dsl.ipynb

  • examples/ndarray/black-scholes_hist-dsl.ipynb

GitHub links: