# LazyArray: Expressions containing NDArray objects (and others)

Python-Blosc2 provides a powerful way to operate on NDArray objects (and other array flavors) through lazy expressions. In this section, we will see how to perform computations with NDArray arrays in a simple way.


```
import blosc2
import numpy as np
```

## A simple example

First, let’s create a couple of NDArrays. We will use NumPy arrays to fill them.


```
shape = (500, 1000)
npa = np.linspace(0, 1, np.prod(shape), dtype=np.float32).reshape(shape)
npb = np.linspace(1, 2, np.prod(shape), dtype=np.float64).reshape(shape)
a = blosc2.asarray(npa, urlpath="a.b2nd", mode="w")
b = blosc2.asarray(npb, urlpath="b.b2nd", mode="w")
```

Now, let’s create an expression that involves `a` and `b`:


```
c = a**2 + b**2 + 2 * a * b + 1
print(c.info) # at this stage, the expression has not been evaluated yet
```

```
type : LazyExpr
expression : ((((o0 ** 2) + (o1 ** 2)) + ((2 * o0) * o1)) + 1)
operands : {'o0': 'a.b2nd', 'o1': 'b.b2nd'}
shape : (500, 1000)
dtype : float64
```

We see that the outcome of the expression is a `LazyExpr` object. This object is a placeholder for the actual computation that will be done when we evaluate it. This is a very powerful feature because it allows us to build complex expressions without actually computing them until we really need the result.
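To make the idea of deferred evaluation concrete, here is a toy sketch in pure Python. It is not how Python-Blosc2 implements `LazyExpr`; the `Lazy` class and its `leaf`/`evaluate` methods are hypothetical names used only to show how operators can record work instead of performing it.

```python
import operator


class Lazy:
    """Toy deferred expression: operators build a recipe instead of a value."""

    def __init__(self, fn, label):
        self.fn = fn        # zero-argument callable that produces the value
        self.label = label  # textual form of the expression built so far

    @classmethod
    def leaf(cls, value, name):
        # Wrap a concrete value as an operand of the expression
        return cls(lambda: value, name)

    def _binary(self, other, op, symbol):
        # Plain Python values are wrapped so both sides are Lazy nodes
        if not isinstance(other, Lazy):
            other = Lazy.leaf(other, repr(other))
        label = f"({self.label} {symbol} {other.label})"
        return Lazy(lambda: op(self.fn(), other.fn()), label)

    def __add__(self, other):
        return self._binary(other, operator.add, "+")

    def __mul__(self, other):
        return self._binary(other, operator.mul, "*")

    def __pow__(self, other):
        return self._binary(other, operator.pow, "**")

    def evaluate(self):
        # Nothing is computed until this point
        return self.fn()


x = Lazy.leaf(3, "x")
y = Lazy.leaf(4, "y")
expr = x**2 + y**2 + x * y * 2 + 1  # only builds the recipe
print(expr.label)
print(expr.evaluate())              # the arithmetic happens here
```

Building the expression is cheap because it only stores callables and a label. The cost is paid once, when `evaluate()` is called.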

Now, let’s evaluate it. `LazyExpr` objects follow the `LazyArray` interface, which provides several ways of performing the evaluation, depending on the kind of object we want as output.

First, let’s use the `eval` method. The result will be another NDArray array:


```
d = c.eval() # evaluate the expression
print(f"Class: {type(d)}")
print(f"Compression ratio: {d.schunk.cratio:.2f}x")
```

```
Class: <class 'blosc2.ndarray.NDArray'>
Compression ratio: 1.89x
```

We can specify different compression parameters for the result. For example, we can change the codec to `zstd`, use the bitshuffle filter, and set the compression level to 9:


```
cparams = {
    "codec": blosc2.Codec.ZSTD,
    "filters": [blosc2.Filter.BITSHUFFLE],
    "clevel": 9,
}
d = c.eval(cparams=cparams)
print(f"Compression ratio: {d.schunk.cratio:.2f}x")
```

```
Compression ratio: 2.10x
```

Now, let’s evaluate the expression and store the result in a NumPy array. For this, we will use the `__getitem__` method:


```
npd = c[:]  # evaluate the expression; the result is a NumPy array
print(f"Class: {type(npd)}")
```

```
Class: <class 'numpy.ndarray'>
```

## Saving expressions to disk

You can also save expressions to disk. For this, use the `save` method of `LazyArray` objects. For example, let’s save the expression `c` to disk:


```
c = a**2 + b**2 + 2 * a * b + 1
c.save(urlpath="expr.b2nd")
```

And you can load it back with the `open` function:


```
c2 = blosc2.open("expr.b2nd")
print(c2.info)
```

```
type : LazyExpr
expression : ((((o0 ** 2) + (o1 ** 2)) + ((2 * o0) * o1)) + 1)
operands : {'o0': 'a.b2nd', 'o1': 'b.b2nd'}
shape : (500, 1000)
dtype : float64
```

Now, you can evaluate it as before:


```
d2 = c2.eval()
print(f"Compression ratio: {d2.schunk.cratio:.2f}x")
```

```
Compression ratio: 1.89x
```

## Reductions

We can also perform reductions on NDArray arrays. Let’s see an example:


```
c = (a + b).sum()
c
```


```
999999.9999999471
```

As we can see, the result is a scalar rather than a lazy expression: reductions in expressions are always computed immediately. We can also specify the axis for the reduction:


```
c = (a + b).sum(axis=1)
print(f"Shape of c: {c.shape}")
# Show the first 4 elements of the result
c[:4]
```

```
Shape of c: (500,)
```


```
array([1001.998004 , 1005.998012 , 1009.99802 , 1013.99802799])
```

## Selections

We can also perform selections on NDArray arrays with structured types. Let’s see an example. First, we will create a structured array:


```
nps = np.array([(1, 2.0, b'Hello'), (2, 1.0, b'World'), (4, 3.9, b'World2')],
               dtype=[('A', 'i4'), ('B', 'f4'), ('C', 'S10')])
s = blosc2.asarray(nps, urlpath="s.b2nd", mode="w")
s[:]
```


```
array([(1, 2. , b'Hello'), (2, 1. , b'World'), (4, 3.9, b'World2')],
      dtype=[('A', '<i4'), ('B', '<f4'), ('C', 'S10')])
```

Now, we can select rows depending on the value of different fields:


```
A = s.fields['A']
B = s.fields['B']
expr = s[A > B]
expr[:]
```


```
array([(2, 1. , b'World'), (4, 3.9, b'World2')],
      dtype=[('A', '<i4'), ('B', '<f4'), ('C', 'S10')])
```

We can do the same in a more compact way by using an expression in string form:


```
expr = s['A > B']
expr[:]
```


```
array([(2, 1. , b'World'), (4, 3.9, b'World2')],
      dtype=[('A', '<i4'), ('B', '<f4'), ('C', 'S10')])
```

The expression can also combine several conditions:


```
C = s.fields['C']
expr = s[(A > B) & (C == b'World')]
expr[:]
```


```
array([(2, 1., b'World')],
      dtype=[('A', '<i4'), ('B', '<f4'), ('C', 'S10')])
```
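For readers who know NumPy boolean indexing, the selections above correspond to the following mask-based code on the original structured array. This is a plain NumPy sketch for comparison only; it is evaluated eagerly and does not use Blosc2.

```python
import numpy as np

nps = np.array([(1, 2.0, b"Hello"), (2, 1.0, b"World"), (4, 3.9, b"World2")],
               dtype=[("A", "i4"), ("B", "f4"), ("C", "S10")])

# Boolean mask: one entry per row, True where field A exceeds field B
mask = nps["A"] > nps["B"]
selected = nps[mask]

# Masks combine with & (and), | (or) and ~ (not); parentheses are required
both = mask & (nps["C"] == b"World")
narrowed = nps[both]

print(selected)
print(narrowed)
```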

We can also do selections and extract a single field:


```
C[A > B][:]
```


```
array([b'World', b'World2'], dtype='|S10')
```

Finally, we can do selections and perform reductions on them in one go by using the `where()` function. For example, let’s take, for each row, the larger of field `A` and field `B`, and sum the results:


```
s[A > B].where(A, B).sum()
```


```
8.0
```
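The result can be cross-checked with `np.where`, which picks from its second argument where the condition holds and from its third otherwise. A plain NumPy sketch on the same three rows:

```python
import numpy as np

A = np.array([1, 2, 4], dtype="i4")
B = np.array([2.0, 1.0, 3.9], dtype="f4")

# Row by row: take A where A > B, otherwise take B
picked = np.where(A > B, A, B)
total = picked.sum()

print(picked)  # the chosen values are 2.0, 2.0 and 4.0
print(total)   # 8.0
```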

Combining the different selection tools can make querying your data very effective. As the evaluation is lazy, all the operations are grouped and executed together for maximum performance. The only exception is reductions: they are evaluated eagerly as soon as they are found, although they can still be part of more general expressions.

## Broadcasting

NumPy arrays support broadcasting, and so do NDArray arrays. Let’s see an example:


```
b2 = b[0] # take the first row of b
print(f"Shape of a: {a.shape}, shape of b2: {b2.shape}")
```

```
Shape of a: (500, 1000), shape of b2: (1000,)
```

We see that the shapes of `a` and `b2` are different. However, we can still operate on them, and the broadcasting will be done automatically (à la NumPy):


```
c2 = a + b2
d2 = c2.eval()
print(f"Compression ratio: {d2.schunk.cratio:.2f}x, shape: {d2.shape}")
```

```
Compression ratio: 32.63x, shape: (500, 1000)
```
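Since the behavior follows NumPy, the NumPy rules are a useful guide to which shapes are compatible: shapes are aligned from the trailing dimension, and each pair of dimensions must be equal or one of them must be 1. The sketch below uses `np.broadcast_shapes` to check this without allocating any data; it describes NumPy only and makes no claim about which cases Blosc2 currently supports.

```python
import numpy as np

# (1000,) aligns with the last dimension of (500, 1000)
row_case = np.broadcast_shapes((500, 1000), (1000,))

# (500, 1) stretches along the last dimension
col_case = np.broadcast_shapes((500, 1000), (500, 1))

# (500,) would align with the last dimension (1000), so it is rejected
try:
    np.broadcast_shapes((500, 1000), (500,))
    compatible = True
except ValueError:
    compatible = False

print(row_case, col_case, compatible)
```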

The broadcasting feature is still experimental, and it may not work in all cases. If you find a bug, please report it to the Python-Blosc2 issue tracker.

## Summary

In this section, we have seen how to perform computations with NDArray arrays. We have seen how to create expressions, evaluate them, and save them to disk. We have also seen how to perform reductions, selections and combinations of both. Finally, we have seen that expressions whose operands have different (but compatible) shapes can be evaluated too. Lazy expressions are a very powerful feature that allows you to build and evaluate complex computations from operands that can be in memory, on disk or remote (`C2Array`) in a simple way, and very effectively (see the benchmarks).