Python-Blosc2#
A Python wrapper for the extremely fast Blosc2 compression library#
- Author:
The Blosc development team
- Contact:
- Github:
- Actions:
- PyPi:
- NumFOCUS:
- Code of Conduct:
What it is#
C-Blosc2 is the new major version of C-Blosc, and is backward compatible with both the C-Blosc1 API and its in-memory format. Python-Blosc2 is a Python package that wraps C-Blosc2, the newest version of the Blosc compressor.
Python-Blosc2 already reproduces the API of Python-Blosc, so it can be used as a drop-in replacement. However, there are a few exceptions to full compatibility.
In addition, Python-Blosc2 aims to leverage the new C-Blosc2 API to support super-chunks, multi-dimensional arrays (NDArray), serialization and other bells and whistles introduced in C-Blosc2. Although this is a never-ending process, we have already caught up with most of the C-Blosc2 API capabilities.
Note: Python-Blosc2 is meant to be backward compatible with Python-Blosc data. That means that it can read data generated with Python-Blosc, but the opposite is not true (i.e. there is no forward compatibility).
SChunk: a 64-bit compressed store#
SChunk is the simple data container that handles setting, expanding and getting data and metadata. Unlike chunks, a super-chunk can update and resize the data that it contains, supports user metadata, and does not have the 2 GB storage limitation.
Additionally, you can convert a SChunk into a contiguous, serialized buffer (aka cframe) and vice versa. As a bonus, the serialization/deserialization process also works with NumPy arrays and PyTorch/TensorFlow tensors at blazing speed, while reaching excellent compression ratios.

Also, if you are a Mac M1/M2 owner, do yourself a favor and use its native arm64 arch (yes, we are distributing Mac arm64 wheels too; you are welcome ;-).
Read more about SChunk features in our blog entry at: https://www.blosc.org/posts/python-blosc2-improvements
NDArray: an N-Dimensional store#
One of the latest and most exciting additions in Python-Blosc2 is the NDArray object. It can write and read n-dimensional datasets in an extremely efficient way thanks to an n-dim 2-level partitioning, which lets you slice and dice arbitrarily large and compressed data in a more fine-grained way.

To whet your appetite, here is how the NDArray object performs when getting slices orthogonal to the different axes of a 4-dim dataset:

We have blogged about this: https://www.blosc.org/posts/blosc2-ndim-intro
We also have a ~2 min explanatory video on why slicing in a pineapple-style (aka double partition) is useful:

Installing#
Blosc now offers Python wheels for the main operating systems (Windows, Mac and Linux) and platforms.
You can install binary packages from PyPI using pip:
pip install blosc2
Documentation#
The documentation is here:
https://blosc.org/python-blosc2/python-blosc2.html
Also, some examples are available on:
Building from sources#
python-blosc2 ships with the C-Blosc2 sources and can be built in-place:
git clone https://github.com/Blosc/python-blosc2/
cd python-blosc2
git submodule update --init --recursive
python -m pip install -r requirements-build.txt
python setup.py build_ext --inplace
That’s all. You can now proceed to the testing section.
Testing#
After compiling, you can quickly check that the package is sane by running the tests:
python -m pip install -r requirements-tests.txt
python -m pytest  # add -v for verbose mode
Benchmarking#
If curious, you may want to run a small benchmark that compares a plain NumPy array copy against compression through different compressors in your Blosc build:
PYTHONPATH=. python bench/pack_compress.py
License#
The software is licensed under a 3-Clause BSD license. A copy of the python-blosc2 license can be found in LICENSE.txt.
Mailing list#
Discussion about this module is welcome in the Blosc list:
Twitter#
Please follow @Blosc2 to stay informed about the latest developments.
Enjoy data!
Contents#
- API Reference
- Release Notes
- Changes from 2.2.2 to 2.2.3
- Changes from 2.2.1 to 2.2.2
- Changes from 2.2.0 to 2.2.1
- Changes from 2.1.1 to 2.2.0
- Changes from 2.0.0 to 2.1.1
- Changes from 0.6.6 to 2.0.0
- Changes from 0.6.5 to 0.6.6
- Changes from 0.6.4 to 0.6.5
- Changes from 0.6.3 to 0.6.4
- Changes from 0.6.2 to 0.6.3
- Changes from 0.6.1 to 0.6.2
- Changes from 0.5.2 to 0.6.1
- Changes from 0.5.1 to 0.5.2
- Changes from 0.5.0 to 0.5.1
- Changes from 0.4.1 to 0.5.0
- Changes from 0.4.0 to 0.4.1
- Changes from 0.3.2 to 0.4.0
- Changes from 0.3.1 to 0.3.2
- Changes from 0.3.0 to 0.3.1
- Changes from 0.2.0 to 0.3.0
- Changes from 0.1.10 to 0.2.0
- Changes from python-blosc2 0.1.9 to python-blosc2 0.1.10
- Changes from python-blosc2 0.1.8 to python-blosc2 0.1.9
- Changes from python-blosc2 0.1.7 to python-blosc2 0.1.8
- Changes from python-blosc2 0.1.5 to python-blosc2 0.1.7
- Changes from python-blosc2 0.1.1 to python-blosc2 0.1.5
- Changes from python-blosc to python-blosc2