Wrapping C-Blosc2 in Python (a beginner's view)

An initial release of the Python wrapper for C-Blosc2 is now available at https://github.com/Blosc/python-blosc2. In this post I will try to explain some of the hardest things I had to learn while doing it, and how I solved them.

This work is being done thanks to a grant from the Python Software Foundation.

Python views

At university, the first programming language I learned was Python. But because programming was new to most of the class, the course only covered the basics: basic statements and classes. These were easy to understand, but views were unknown to me (until now).

To explain what the views are, let’s suppose we have the following code in Python:

>>> import sys
>>> a = []
>>> b = a
>>> sys.getrefcount(a)
3

The reference count for the object is 3: a, b and the argument passed to sys.getrefcount().

Basically, to avoid copying data, Python lets several names point to the same object; these are the views I am talking about (Python itself calls them references). Every object has a reference counter, and the object is not deleted until its counter drops to 0. But that means that two threads must not modify the same counter at the same time. Because having a lock for every object would be inefficient and could produce deadlocks (several threads waiting for each other forever), the GIL was created. So the GIL was the next thing I had to learn.
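The sharing described above can be checked with a few lines of standard Python. The absolute number returned by sys.getrefcount() is a CPython implementation detail that may vary between versions, so this sketch only looks at how the count changes when a second name is bound to the same list, and at the fact that no copy is made.

```python
import sys


def shared_reference_demo():
    """Show that `b = a` adds a reference to the same object instead of copying it."""
    a = []
    before = sys.getrefcount(a)  # absolute value is a CPython implementation detail
    b = a                        # no copy: b is a second name for the same list
    after = sys.getrefcount(a)
    b.append(42)                 # mutating through b is visible through a
    return after - before, a, a is b


extra_refs, contents, same_object = shared_reference_demo()
print(extra_refs)   # 1: the extra reference held by b
print(contents)     # [42]
print(same_object)  # True
```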

GIL and Cython

GIL stands for Global Interpreter Lock. With a single lock for the whole interpreter there are no deadlocks. But any thread that executes Python code must first acquire this lock, which prevents some programs from taking advantage of multi-threaded execution.

When writing C extensions, the good news is that this lock can be released while running C code that does not touch Python objects. Thus, the program can be more efficient (i.e. threads can actually run in parallel). I spent a lot of time reading about how to write a function that releases the GIL. Unfortunately, nothing seemed to explain what I wanted to do until I found a nice blog post by Nicolas Hug in which he explains the 3 rules you have to follow to make Cython release the GIL.

First of all, Cython needs to know which of the imported C functions can safely be called without the GIL. This is done by adding nogil to the function declaration. Then, inside the function, the with nogil statement lets Cython know that the indented block is going to be executed with the GIL released. But to make that block safe, there cannot be any Python interaction inside it.

To understand it better, an example is shown below:

cdef extern from "math_operation.h":
    int add(int a, int b) nogil

cpdef sum(src, dest):
    cdef int len_src = len(src)
    cdef int len_dest = len(dest)
    cdef int result
    with nogil:
        # Code with the GIL released: only C-level operations are allowed here
        result = add(len_src, len_dest)
    # Code with the GIL held: any Python interaction can be done here
    return result

The function sum returns the result of adding the lengths of src and dest. As you can see, the function has been defined with the cpdef statement instead of def. The c lets Cython know that this function can also be called directly from C, without going through the Python calling machinery. Strictly speaking, a with nogil block can also appear inside a plain def function: what matters is that everything inside the block is C-level code, because Python code cannot be executed without the GIL (as explained previously). That is why len_src and len_dest have been defined as C integers with the cdef int statement; otherwise, it would not be possible to work with them with the GIL released (inside the with nogil block).

On the other hand, the p lets Cython know that this function can also be called from Python. This is only needed when you want to call the function from Python; otherwise, a plain cdef function is enough.
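The payoff of releasing the GIL can be observed from pure Python with a C function that is known to release it while it waits: time.sleep(). In this sketch, four threads sleeping 0.2 s each finish in roughly 0.2 s of wall-clock time instead of 0.8 s, because none of them holds the GIL while sleeping. A CPU-bound loop written in Python would not overlap like this, since each thread needs the GIL to run bytecode. The helper names are made up for the illustration.

```python
import threading
import time


def run_in_threads(func, nthreads):
    """Run func in nthreads threads and return the elapsed wall-clock time."""
    threads = [threading.Thread(target=func) for _ in range(nthreads)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


def c_level_wait():
    # time.sleep() is implemented in C and releases the GIL while it waits,
    # so the other threads can run in the meantime
    time.sleep(0.2)


elapsed = run_in_threads(c_level_wait, nthreads=4)
print(f"4 sleeps of 0.2 s took {elapsed:.2f} s")  # close to 0.2 s, not 0.8 s
```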

Cython typed memoryviews

One of the main differences between the python-blosc and python-blosc2 APIs is that the functions compress_ptr and decompress_ptr are no longer supported. We decided to drop them because pickle protocol 5 already avoids unnecessary copies (through out-of-band buffers). That way, we could get performance similar to compress_ptr and decompress_ptr, but with the functions pack and unpack.
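The copy savings of pickle protocol 5 come from out-of-band buffers (PEP 574). The sketch below only uses the standard pickle module and does not reproduce python-blosc2's own pack/unpack code: a large bytearray is wrapped in pickle.PickleBuffer and, thanks to protocol=5 plus a buffer_callback, the payload is handed over separately instead of being copied into the pickle stream.

```python
import pickle

payload = bytearray(b"x" * 1_000_000)  # a large, writable buffer
buffers = []                            # out-of-band buffers are collected here

# buffers.append returns None (falsy), so the payload is kept out-of-band
stream = pickle.dumps(pickle.PickleBuffer(payload), protocol=5,
                      buffer_callback=buffers.append)
print(len(stream))  # only a few bytes: the 1 MB payload is not in the stream

# On load, the same buffers are supplied again and no copy of the payload is made
restored = pickle.loads(stream, buffers=buffers)
payload[0] = ord("y")                   # modify the original...
print(bytes(memoryview(restored)[:1]))  # ...and the restored object sees it: b'y'
```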

However, when timing the functions, I realised that in most cases, although the compress function from python-blosc2 was faster than compress_ptr, the decompress function was slower than decompress_ptr. So I checked the code to see if the speed could somehow be improved.

Originally, the code used the Python Buffer Protocol, which is part of the Python/C API. The Python Buffer Protocol lets you (among other things) obtain a pointer to the raw data of an object. But because it wasn't clear to me whether it needed to make a copy or not, we decided to work with Cython typed memoryviews instead.

Cython typed memoryviews are very similar to Python memoryviews, the main difference being that the former are a C-level type and therefore have very little Python overhead. Because they are a C-level type, you have to declare the number of dimensions of the buffer you want to view, as well as its data type.

The number of dimensions is expressed by writing one : per dimension between the brackets. If the memory is contiguous, you can write ::1 instead in the corresponding dimension (for a C-contiguous array, the last one). The data type is written as any other C type in Cython. In the following code, you can see an example for a one-dimensional NumPy array:

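import numpy as np
arr = np.ones((10**6,), dtype=np.double)
# One-dimensional typed memoryview of C doubles
cdef double[:] typed_view = arr
# Same, but also declaring that the memory is contiguous
cdef double[::1] contiguous_view = arr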

However, if your function receives an object whose type is not known in advance, you will have to create a Python memoryview and then cast it to the type you wish, as in the next example:

# Get a Python memoryview from an object (obj must support the buffer protocol)
mem_view = memoryview(obj)
# Cast that memoryview into an unsigned char memoryview
cdef unsigned char[:] typed_view = mem_view.cast('B')

The 'B' format code tells cast() to reinterpret the memory as unsigned chars (single bytes). Note that cast() only works on C-contiguous buffers.
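The same cast can be tried in plain Python with the built-in memoryview. The helper as_byte_view() is a hypothetical name used only for this illustration; it shows that cast('B') reinterprets any C-contiguous buffer as a flat sequence of unsigned chars, and that the result is a view on the original memory rather than a copy.

```python
import array


def as_byte_view(obj):
    """Return a flat unsigned-char ('B') view of a C-contiguous buffer, without copying."""
    return memoryview(obj).cast("B")


doubles = array.array("d", [1.0, 2.0, 3.0])
view = as_byte_view(doubles)
print(view.format, view.itemsize, view.nbytes)  # B 1 24 (3 doubles of 8 bytes)

data = bytearray(b"abc")
byte_view = as_byte_view(data)
byte_view[0] = ord("x")  # writing through the view modifies the original object
print(data)              # bytearray(b'xbc')
```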

But when I ran that code on a Python bytes object (a binary string), it raised a runtime error. It took me 10 minutes to fix it by adding the const qualifier to the declaration of the Cython typed memoryview (as shown below), but I spent two days trying to understand the error and its solution.

# Get a Python memoryview from an object (obj must support the buffer protocol)
mem_view = memoryview(obj)
# Cast that memoryview into a read-only unsigned char memoryview
cdef const unsigned char[:] typed_view = mem_view.cast('B')

The const qualifier fixed it because a Python bytes object is a read-only buffer. Without const, Cython asks for a writable buffer, which bytes cannot provide. By declaring the typed memoryview as const, we tell Cython that the underlying buffer is read-only, so it will not try to modify it.
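The read-only restriction is also visible from plain Python. A memoryview over a bytes object reports readonly=True and refuses writes, which is the situation a non-const Cython typed memoryview runs into, whereas a bytearray exposes a writable buffer. The helper is_writable() is a hypothetical name used only for this sketch.

```python
def is_writable(obj):
    """Return True if obj exposes a writable buffer through the buffer protocol."""
    return not memoryview(obj).readonly


print(is_writable(b"immutable"))           # False: bytes is a read-only buffer
print(is_writable(bytearray(b"mutable")))  # True

view = memoryview(b"abc").cast("B")
try:
    view[0] = ord("x")                     # writing into a read-only buffer
except TypeError as exc:
    print("TypeError:", exc)
```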


So far, my experience wrapping C-Blosc2 has had some ups and downs.

One method that I use whenever I learn something new is to write down a summary of what I read. Sometimes it is almost a copy (so some people may find it useless), but it always works really well for me. It helps me connect ideas better and build a global picture of what I have done or want to do.

Another thing I realized while doing this wrapper is that, being a stubborn person, I tend to force myself to understand something and get frustrated if I do not. However, I have to admit that sometimes it is better to set it aside until the next day: your brain will organize your ideas overnight, so that you can make better use of your time the next morning.

But maybe the most difficult part for me was the beginning, and therefore I have to thank Francesc Alted and Aleix Alcacer for giving me a push into the not always easy world of Python extensions.

C-Blosc2 Ready for General Review

On behalf of the Blosc team, we are happy to announce the first C-Blosc2 release (Release Candidate 1) that is meant to be reviewed by users. As of now we are declaring both the API and the format frozen, and we are seeking feedback from the community so as to better test the library and declare it ready for production use.

Some history

The next generation Blosc (aka Blosc2) started back in 2015 as a way to overcome some limitations of the Blosc compressor, mainly the 2 GB limit on the size of the data to be compressed. But it turned out that I wanted to make things a bit more complete, and provide native serialization too. During that process, Google awarded my contributions to Blosc with the Open Source Peer Bonus Program in 2017. This award was a big emotional push for me to persist in the efforts towards producing a stable release.

Back in 2018, Zeeman Wang from Huawei invited me to their central headquarters in Shenzhen to meet a series of developers who were trying to use compression in different scenarios. Over two weeks we had a series of productive meetings, and I became aware of the many possibilities that compression is opening up in industry: from making phones with limited hardware work faster to accelerating computations on high-end computers. That was also a great opportunity for me to get to know a millennia-old culture better; I was genuinely interested to see how people live, eat and socialize in China.

In 2020, Huawei graciously offered a grant to the Blosc project to complete the work. Since then, we have received donations from several other sources (NumFOCUS, the Python Software Foundation and ESRF, among others). Lately, ironArray has been sponsoring two of us (Aleix Alcacer and myself) to work part-time on Blosc-related projects.

Thanks to all this support, the Blosc development team has been able to grow quite a lot (we are currently 5 people in the core team) and we have been able to work hard at producing a series of improvements in different projects under the Blosc umbrella, in particular C-Blosc2, Python-Blosc2, Caterva and cat4py.

As you see, there is a lot of development going on around C-Blosc2 besides C-Blosc2 itself. In this installment I am going to focus just on the main features that C-Blosc2 brings, but hopefully the other projects in the ecosystem will complement its functionality. When all these projects are ready, we hope that users will be able to use them to store large amounts of data in a way that is efficient, easy to use and, most importantly, adapted to their needs.

New features of C-Blosc2

Here is the list of the main features that we are releasing today:

  • 64-bit containers: the first-class container in C-Blosc2 is the super-chunk or, for brevity, schunk, which is made of smaller chunks that are essentially C-Blosc1 32-bit containers. The super-chunk can optionally be backed by another container called a frame (see later).

  • More filters: besides shuffle and bitshuffle, already present in C-Blosc1, C-Blosc2 implements:

    • delta: the stored blocks inside a chunk are diff'ed with respect to the first block in the chunk. The idea is that, in some situations, the diff will have more zeros than the original data, leading to better compression.

    • trunc_prec: it zeroes the least significant bits of the mantissa of float32 and float64 types. When combined with the shuffle or bitshuffle filter, this leads to more contiguous zeros, which are compressed better.

  • A filter pipeline: the different filters can be pipelined so that the output of one can be the input for the next. A possible example is a delta followed by shuffle or, as described above, trunc_prec followed by bitshuffle.

  • Prefilters: allow applying user-defined C callbacks before the filter pipeline during compression. See test_prefilter.c for an example of use.

  • Postfilters: allow applying user-defined C callbacks after the filter pipeline during decompression. The combination of prefilters and postfilters could be interesting for supporting e.g. encryption (via prefilters) and decryption (via postfilters). Also, a postfilter alone can be used to produce on-the-fly computations based on existing data (or other metadata, like e.g. coordinates). See test_postfilter.c for an example of use.

  • SIMD support for ARM (NEON): this allows for faster operation on ARM architectures. Only shuffle is supported right now, but the idea is to implement bitshuffle for NEON too. Thanks to Lucian Marc.

  • SIMD support for PowerPC (ALTIVEC): this allows for faster operation on PowerPC architectures. Both shuffle and bitshuffle are supported; however, this has been done via a transparent mapping from SSE2 into ALTIVEC emulation in GCC 8, so performance could be better (but still, it is already a nice improvement over native C code; see PR https://github.com/Blosc/c-blosc2/pull/59 for details). Thanks to Jerome Kieffer and ESRF for sponsoring the Blosc team in helping him in this task.

  • Dictionaries: when a block is going to be compressed, C-Blosc2 can use a previously made dictionary (stored in the header of the super-chunk) for compressing all the blocks that are part of the chunks. This usually improves the compression ratio, as well as the decompression speed, at the expense of a (small) overhead in compression speed. Currently, it is only supported in the zstd codec, but it would be nice to extend it to lz4 and blosclz at least.

  • Contiguous frames: allow storing super-chunks contiguously, either on-disk or in-memory. When a super-chunk is backed by a frame, instead of storing all the chunks sparsely in-memory, they are serialized inside the frame container. The frame can be stored on-disk too, meaning that persistence of super-chunks is supported.

  • Sparse frames (on-disk): each chunk in a super-chunk is stored in a separate file, as well as the metadata. This is the on-disk counterpart of the in-memory super-chunk, and allows for more efficient updates than contiguous frames (i.e. avoiding 'holes' in monolithic files).

  • Partial chunk reads: there is support for reading just part of a chunk, avoiding reading the whole chunk and then discarding the unnecessary data.

  • Parallel chunk reads: when several blocks of a chunk are to be read, this is done in parallel by the decompressing machinery. That means that every thread is responsible for reading, decompressing and post-filtering a block by itself, leading to an efficient overlap of I/O and CPU usage that makes reads as fast as possible.

  • Meta-layers: optionally, the user can add meta-data for different uses and in different layers. For example, one may think of providing a meta-layer for NumPy so that most of the meta-data for it is stored in a meta-layer; then, one can place another meta-layer on top of the latter for adding more high-level info if desired (e.g. geo-spatial, meteorological...).

  • Variable length meta-layers: the user may want to add variable-length meta information that can be potentially very large (up to 2 GB). The regular meta-layer described above is very quick to read, but meant to store fixed-length and relatively small meta information. Variable length metalayers are stored in the trailer of a frame, whereas regular meta-layers are in the header.

  • Efficient support for special values: large sequences of repeated values can be represented with an efficient, simple and fast run-length representation, without the need to use regular codecs. With that, chunks or super-chunks with values that are the same (zeros, NaNs or any value in general) can be built in constant time, regardless of the size. This can be useful in situations where a lot of zeros (or NaNs) need to be stored (e.g. sparse matrices).

  • Nice markup for documentation: we are currently using a combination of Sphinx + Doxygen + Breathe for documenting the C-API. See https://c-blosc2.readthedocs.io. Thanks to Alberto Sabater and Aleix Alcacer for contributing the support for this.

  • Plugin capabilities for filters and codecs: we have a plugin register capability in place so that the info about the new filters and codecs can be persisted and transmitted to different machines. Thanks to the NumFOCUS foundation for providing a grant for doing this.

  • Pluggable tuning capabilities: this will allow users with different needs to define an interface so as to better tune different parameters like the codec, the compression level, the filters to use, the blocksize or the shuffle size. Thanks to ironArray for sponsoring us in doing this.

  • Support for I/O plugins: so that users can extend the I/O capabilities beyond the current filesystem support. Things like using databases or S3 interfaces should be possible by implementing these interfaces. Thanks to ironArray for sponsoring us in doing this.

  • Python wrapper: we have a preliminary wrapper in the works. You can have a look at our ongoing efforts in the python-blosc2 repo. Thanks to the Python Software Foundation for providing a grant for doing this.

  • Security: we are actively using OSS-Fuzz and ClusterFuzz for uncovering programming errors in C-Blosc2. Thanks to Google for sponsoring us in doing this.
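To make the filter pipeline idea more concrete, here is a toy, pure-Python model of trunc_prec followed by a byte shuffle. It is not the C-Blosc2 implementation or API: trunc_prec(), shuffle(), unshuffle() and longest_zero_run() are hypothetical helpers working on little-endian float64 values. After truncation, the low mantissa bytes of every value are zero; shuffling then groups all those zero bytes together, turning many short zero runs into one long run, which is what a codec compresses well.

```python
import struct


def trunc_prec(values, keep_bits):
    """Zero all but the `keep_bits` most significant mantissa bits of each float64."""
    mask = ~((1 << (52 - keep_bits)) - 1) & 0xFFFFFFFFFFFFFFFF
    out = []
    for v in values:
        (bits,) = struct.unpack("<Q", struct.pack("<d", v))
        (truncated_value,) = struct.unpack("<d", struct.pack("<Q", bits & mask))
        out.append(truncated_value)
    return out


def shuffle(buf, typesize):
    """Byte shuffle: byte 0 of every item first, then byte 1 of every item, etc."""
    nitems = len(buf) // typesize
    return bytes(buf[i * typesize + b] for b in range(typesize) for i in range(nitems))


def unshuffle(buf, typesize):
    """Inverse of shuffle()."""
    nitems = len(buf) // typesize
    return bytes(buf[b * nitems + i] for i in range(nitems) for b in range(typesize))


def longest_zero_run(buf):
    """Length of the longest run of consecutive zero bytes."""
    best = run = 0
    for byte in buf:
        run = run + 1 if byte == 0 else 0
        best = max(best, run)
    return best


values = [i / 7 for i in range(1, 65)]
truncated = trunc_prec(values, keep_bits=10)              # filter 1: trunc_prec
raw = struct.pack(f"<{len(truncated)}d", *truncated)
shuffled = shuffle(raw, typesize=8)                       # filter 2: shuffle

print(longest_zero_run(raw), longest_zero_run(shuffled))  # short runs vs. one long run
```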

As you see, the list is long and hopefully you will find features compelling enough for your own needs. Blosc2 is not only about speed, but also about providing

Tasks to be done

Even if the list of features above is long, we still have things to do in Blosc2; and the plan is to continue the development, although always respecting the existing API and format. Here are some of the things in our TODO list:

  • Centralized plugin repository: we have got a grant from NumFOCUS for implementing a centralized repository so that people can send their plugins (using the existing machinery) to the Blosc2 team. If the plugins fulfill a series of requirements, they will be officially accepted, and distributed within the library.

  • Improve the safety of the library: although this is always a work in progress, we have come a long way in improving our safety, mainly thanks to the efforts of Nathan Moinvaziri.

  • Support for lossy compression codecs: although we already support the trunc_prec filter, this is only valid for floating point data; we should come up with lossy codecs that are meant for any data type.

  • Checksums: the frame can benefit from having a checksum for every chunk/index/metalayer. This will provide more safety against frames that are damaged for whatever reason. Also, this would provide better feedback when trying to determine which parts of the frame are corrupted. Candidates for checksums are xxhash32 or xxhash64, depending on the goals (to be decided).

  • Documentation: extremely important for attracting new users and making life easier for existing ones. Important points to keep in mind here:

    • Quality of API docstrings: is the mission of the functions or data structures clearly and succinctly explained? Are all the parameters explained? Is the return value explained? What are the possible errors that can be returned?

    • Tutorials/book: besides the API docstrings, more documentation materials should be provided, like tutorials or a book about Blosc (or at least, the beginnings of it). Due to its adoption in GitHub and Jupyter notebooks, one of the most widespread and useful markup systems is Markdown, so this should also be the first candidate to use here.

  • Lock support for super-chunks: when different processes access super-chunks concurrently, make them synchronize properly by using locks, either on-disk (frame-backed super-chunks) or in-memory. Such lock support would be configured at build time, so it could be disabled with a CMake flag.
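The per-chunk checksum idea from the TODO list can be sketched in a few lines. This only illustrates the planned feature and is not existing C-Blosc2 code: it uses zlib.crc32 from the Python standard library as a stand-in for xxhash32/xxhash64 (which are not in the standard library), and chunk_checksums() and find_corrupted() are hypothetical helpers. Keeping one checksum per chunk, rather than one for the whole frame, is what allows pointing at the exact chunk that got damaged.

```python
import zlib


def chunk_checksums(chunks):
    """Compute one checksum per chunk (crc32 stands in for xxhash here)."""
    return [zlib.crc32(chunk) for chunk in chunks]


def find_corrupted(chunks, checksums):
    """Return the indices of the chunks whose checksum no longer matches."""
    return [i for i, (chunk, expected) in enumerate(zip(chunks, checksums))
            if zlib.crc32(chunk) != expected]


chunks = [bytes([i]) * 1024 for i in range(4)]
sums = chunk_checksums(chunks)

damaged = list(chunks)
damaged[2] = b"\xff" + damaged[2][1:]  # corrupt the first byte of chunk 2
print(find_corrupted(damaged, sums))   # [2]
```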

It would be nice if, in case some of these features (or a new one) sound useful to you, you could help us by providing either code or sponsorship.


It has been a long road since 2015 to make C-Blosc2 this feature-rich and well tested. But hopefully the journey will continue because, as Kavafis said:

As you set out for Ithaka
hope your road is a long one,
full of adventure, full of discovery.

Let me thank again all the people and sponsors that we have had during the life of the Blosc project; without them we would not be where we are now. We do hope that C-Blosc2 will have a long life, and we as a team will put our soul into making that trip last as long as possible.

Now it is your turn. We encourage you to test the library as much as possible and report back. With your help we can get C-Blosc2 to production stage, hopefully very soon. Thanks in advance!

Blosc metalayers, where the user metainformation is stored

The C-Blosc2 library has two different spaces to store user-defined information. In this post, we are going to describe what these spaces are and where they are stored inside a Blosc2 frame (a persistent super-chunk).

As its name suggests, a metalayer is a space that allows users to store custom information. For example, Caterva, a project based on C-Blosc2 that handles compressed and chunked arrays, uses these metalayers to store the dimensions and the shape, chunkshape and blockshape of the arrays.

Fixed-length metalayers

The first kind of metalayers in Blosc2 are the fixed-length metalayers. These metalayers are stored in the header of the frame. This decision allows adding chunks to the frame without the need to rewrite the whole meta information and data coming after it.

But this implementation has some drawbacks. The most important one is that fixed-length metalayers cannot be resized. Furthermore, once the first chunk of data is added to the super-chunk, no more fixed-length metalayers can be added either.

Let's see the reason for these restrictions with an example. Suppose that we have a frame that stores 10 GB of data with a metalayer containing a "cat". If we update the meta information with a "dog", we can do that because they have exactly the same size.

However, if we were to update the meta information with a "giraffe", the metalayer would need to be resized and therefore we would have to rewrite the 10 GB of data plus the trailer. This would obviously be very inefficient and hence is not allowed:


Figure: data that would need to be rewritten are plotted in red.

Variable-length metalayers

To fix the above issue, we have introduced variable-length metalayers. Unlike fixed-length metalayers, these are stored in the trailer section of the frame.

As their name suggests, these metalayers can be resized. Blosc can do that because, whenever the content of a metalayer is modified, Blosc rewrites the trailer completely, using more space if necessary. Furthermore, since these metalayers are stored in the trailer, they will also be rewritten each time a chunk is added.

Another feature of variable-length metalayers is that their content is compressed by default (in contrast to fixed-length metalayers). This minimizes the size of the trailer, which is very important: since the trailer is rewritten every time new data is added, we want to keep it as small as possible to reduce the amount of data written.

Let's continue with the previous example, but storing the meta information in a variable-length metalayer now.


In this case the trailer is rewritten each time that we update the metalayer, but it is a much more efficient operation than rewriting all the data (as a fixed-length metalayer would require). So the variable-length metalayers complement the fixed-length metalayers by bringing different capabilities to the table. Depending on her needs, it is up to the user to choose one kind of metalayer storage or the other.
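The "cat"/"dog"/"giraffe" examples can be put into numbers with a back-of-the-envelope model. The layout assumed here (header with the fixed-length metalayers, then the data, then the trailer) is a simplification of a real frame; the 4 KB trailer size is an arbitrary assumption, 1 GB is taken as 2**30 bytes, and the compression that C-Blosc2 applies to variable-length metalayers is ignored. Only the arithmetic of what has to be rewritten matters.

```python
def bytes_rewritten(kind, old_meta, new_meta, data_size, trailer_size):
    """Bytes that must be rewritten to replace old_meta with new_meta (toy model).

    kind is "fixed" (metalayer in the header) or "variable" (in the trailer).
    For "variable", trailer_size includes the old metalayer content.
    """
    if kind == "fixed":
        if len(new_meta) == len(old_meta):
            return len(new_meta)  # overwrite in place, nothing else moves
        # Resizing shifts everything after the header: metalayer + data + trailer.
        # C-Blosc2 simply forbids this case.
        return len(new_meta) + data_size + trailer_size
    if kind == "variable":
        # The whole trailer is rewritten, with its size adjusted to new_meta
        return trailer_size - len(old_meta) + len(new_meta)
    raise ValueError(f"unknown metalayer kind: {kind!r}")


GB = 1024 ** 3
data, trailer = 10 * GB, 4096  # 10 GB of chunks and a hypothetical 4 KB trailer

print(bytes_rewritten("fixed", b"cat", b"dog", data, trailer))         # 3
print(bytes_rewritten("fixed", b"cat", b"giraffe", data, trailer))     # 10737422343
print(bytes_rewritten("variable", b"cat", b"giraffe", data, trailer))  # 4100
```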

Fixed-length vs variable-length metalayers comparison

To summarize, and to better see what kind of metalayer is more suitable for each situation, the following table contains a comparison between fixed-length metalayers and variable-length metalayers:

                                          Fixed-length metalayers   Variable-length metalayers
Where are they stored?                    Header                    Trailer
Can they be resized?                      No                        Yes
Can they be added after adding chunks?    No                        Yes
Are they rewritten when adding chunks?    No                        Yes

Metalayers API

Currently, C-Blosc2 has the following functions implemented:

  • blosc2_meta_add() / blosc2_vlmeta_add(): Add a new metalayer.

  • blosc2_meta_get() / blosc2_vlmeta_get(): Get the metalayer content.

  • blosc2_meta_exists() / blosc2_vlmeta_exists(): Check if a metalayer exists or not.

  • blosc2_meta_update() / blosc2_vlmeta_update(): Update the metalayer content.
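To summarize the rules behind these functions, here is a toy in-memory model in Python. It is neither C-Blosc2 nor python-blosc2: the class ToySuperChunk and its methods are hypothetical, only loosely named after the C functions listed above, and they merely mimic the constraints described in this post (fixed-length metalayers cannot change size and cannot be added once a chunk exists; variable-length ones can).

```python
class ToySuperChunk:
    """Hypothetical model of the metalayer rules; not the real C-Blosc2 API."""

    def __init__(self):
        self.meta = {}     # fixed-length metalayers (live in the frame header)
        self.vlmeta = {}   # variable-length metalayers (live in the trailer)
        self.nchunks = 0

    def append_chunk(self, chunk):
        # The chunk content is ignored in this toy; in a real frame,
        # appending also rewrites the trailer (and its vlmeta)
        self.nchunks += 1

    # --- fixed-length metalayers (cf. blosc2_meta_*) ---
    def meta_add(self, name, content):
        if self.nchunks > 0:
            raise RuntimeError("fixed-length metalayers must be added before any chunk")
        self.meta[name] = bytes(content)

    def meta_get(self, name):
        return self.meta[name]

    def meta_exists(self, name):
        return name in self.meta

    def meta_update(self, name, content):
        if len(content) != len(self.meta[name]):
            raise ValueError("fixed-length metalayers cannot be resized")
        self.meta[name] = bytes(content)

    # --- variable-length metalayers (cf. blosc2_vlmeta_*) ---
    def vlmeta_add(self, name, content):
        self.vlmeta[name] = bytes(content)  # allowed at any time

    def vlmeta_get(self, name):
        return self.vlmeta[name]

    def vlmeta_exists(self, name):
        return name in self.vlmeta

    def vlmeta_update(self, name, content):
        self.vlmeta[name] = bytes(content)  # any size: the trailer is rewritten


schunk = ToySuperChunk()
schunk.meta_add("animal", b"cat")
schunk.append_chunk(b"\x00" * 1024)
schunk.meta_update("animal", b"dog")        # same size: allowed
schunk.vlmeta_add("animal", b"cat")         # allowed even after adding chunks
schunk.vlmeta_update("animal", b"giraffe")  # resizing: allowed
print(schunk.meta_get("animal"), schunk.vlmeta_get("animal"))  # b'dog' b'giraffe'
```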


As we have seen, Blosc2 supports two different spaces where users can store their meta information. The user can choose one or another depending on her needs.

On the one hand, the fixed-length metalayers are meant to store user meta information that does not change size over time. They are stored in the header and can be updated without having to rewrite any other part of the frame, but they can no longer be added once the first chunk of data is added.

On the other hand, users whose meta information is going to change in size over time can store it in variable-length metalayers. These are stored in the trailer section of a frame and are more flexible than their fixed-length counterparts. However, each time that a metalayer content is updated, the whole trailer has to be rewritten.