Save and load

save(array, urlpath[, contiguous])

Save an array to a file.

open(urlpath[, mode, offset])

Open a persistent SChunk or NDArray, a remote C2Array, or a Proxy.

save_array(arr, urlpath[, chunksize])

Save a serialized NumPy array to a specified file path.

load_array(urlpath[, dparams])

Load a serialized NumPy array from a file.

save_tensor(tensor, urlpath[, chunksize])

Save a serialized PyTorch or TensorFlow tensor or NumPy array to a specified file path.

load_tensor(urlpath[, dparams])

Load a serialized PyTorch or TensorFlow tensor or NumPy array from a file.

blosc2.save(array: NDArray, urlpath: str, contiguous=True, **kwargs: Any) → None

Save an array to a file.

Parameters:
  • array (NDArray) – The array to be saved.

  • urlpath (str) – The path to the file where the array will be saved.

  • contiguous (bool, optional) – Whether to store the array contiguously.

  • kwargs (dict, optional) – Keyword arguments supported by the NDArray.save() method.

Examples

>>> import blosc2
>>> import numpy as np
>>> # Create an array
>>> array = blosc2.arange(0, 100, dtype=np.int64, shape=(10, 10))
>>> # Save the array to a file
>>> blosc2.save(array, "array.b2", mode="w")
blosc2.open(urlpath: str | Path | URLPath, mode: str = 'a', offset: int = 0, **kwargs: dict) → SChunk | NDArray | C2Array | LazyArray | Proxy

Open a persistent SChunk or NDArray, a remote C2Array, or a Proxy.

See the Notes section for more info on opening Proxy objects.

Parameters:
  • urlpath (str | pathlib.Path | URLPath class) – The path where the SChunk (or NDArray) is stored. If it is a remote array, a URLPath class must be passed.

  • mode (str, optional) – Persistence mode: ‘r’ means read only (must exist); ‘a’ means read/write (create if it doesn’t exist); ‘w’ means create (overwrite if it exists). Default is ‘a’.

  • offset (int, optional) – An offset in the file where super-chunk or array data is located (e.g. in a file containing several such objects).

  • kwargs (dict, optional) –

    mmap_mode: str, optional

    If set, the file will be memory-mapped instead of using the default I/O functions and the mode argument will be ignored. For more info, see blosc2.Storage. Please note that the w+ mode, which can be used to create new files, is not supported here since only existing files can be opened. You can use SChunk.__init__ to create new files.

    initial_mapping_size: int, optional

    The initial size of the memory mapping. For more info, see blosc2.Storage.

    cparams: dict

A dictionary with the compression parameters, which are the same ones that can be used in the compress2() function. The typesize and blocksize cannot be changed.

    dparams: dict

A dictionary with the decompression parameters, which are the same ones that can be used in the decompress2() function.

Returns:

out – The SChunk or NDArray (if there is a “b2nd” metalayer) or the C2Array if urlpath is a blosc2.URLPath instance.

Return type:

SChunk, NDArray, C2Array, LazyArray or Proxy

Notes

  • This is just a ‘logical’ open, so there is no close() counterpart; currently, there is no need for one.

  • If urlpath is a URLPath class instance, mode must be ‘r’, offset must be 0, and kwargs cannot be passed.

  • If the original object saved in urlpath is a Proxy, this function will only return a Proxy if its source is a local SChunk, NDArray, or a remote C2Array. Otherwise, it will return the Python-Blosc2 container used to cache the data, which can be an SChunk or an NDArray and may not have all of its data initialized (e.g. if the user has not accessed it yet).

  • When opening a LazyExpr, keep in mind the note above regarding operands.

Examples

>>> import blosc2
>>> import numpy as np
>>> import os
>>> import tempfile
>>> tmpdirname = tempfile.mkdtemp()
>>> urlpath = os.path.join(tmpdirname, 'b2frame')
>>> storage = blosc2.Storage(contiguous=True, urlpath=urlpath, mode="w")
>>> nelem = 20 * 1000
>>> nchunks = 5
>>> chunksize = nelem * 4 // nchunks
>>> data = np.arange(nelem, dtype="int32")
>>> # Create SChunk and append data
>>> schunk = blosc2.SChunk(chunksize=chunksize, data=data.tobytes(), storage=storage)
>>> # Open SChunk
>>> sc_open = blosc2.open(urlpath=urlpath)
>>> for i in range(nchunks):
...     dest = np.empty(nelem // nchunks, dtype=data.dtype)
...     schunk.decompress_chunk(i, dest)
...     dest1 = np.empty(nelem // nchunks, dtype=data.dtype)
...     sc_open.decompress_chunk(i, dest1)
...     np.array_equal(dest, dest1)
True
True
True
True
True

To open the same SChunk memory-mapped, simply pass the mmap_mode parameter:

>>> sc_open_mmap = blosc2.open(urlpath=urlpath, mmap_mode="r")
>>> sc_open.nchunks == sc_open_mmap.nchunks
True
>>> all(sc_open.decompress_chunk(i, dest1) == sc_open_mmap.decompress_chunk(i, dest1) for i in range(nchunks))
True
blosc2.save_array(arr: ndarray, urlpath: str, chunksize: int | None = None, **kwargs: dict) → int

Save a serialized NumPy array to a specified file path.

Parameters:
  • arr (np.ndarray) – The NumPy array to be saved.

  • urlpath (str) – The path for the file where the array will be saved.

  • chunksize (int) – The size (in bytes) for the chunks during compression. If not provided, it is computed automatically.

  • kwargs (dict, optional) – These are the same as the kwargs in SChunk.__init__.

Returns:

out – The number of bytes of the saved array.

Return type:

int

Examples

>>> import blosc2
>>> import numpy as np
>>> a = np.arange(1e6)
>>> serial_size = blosc2.save_array(a, "test.bl2", mode="w")
>>> serial_size < a.size * a.itemsize
True
blosc2.load_array(urlpath: str, dparams: dict | None = None) → ndarray

Load a serialized NumPy array from a file.

Parameters:
  • urlpath (str) – The path to the file containing the serialized array.

  • dparams (dict, optional) – A dictionary with the decompression parameters, which can be used in the decompress2() function.

Returns:

out – The deserialized NumPy array.

Return type:

np.ndarray

Raises:
  • TypeError – If urlpath is not in cframe format.

  • RuntimeError – If any other error is detected.

Examples

>>> import blosc2
>>> import numpy as np
>>> a = np.arange(1e6)
>>> serial_size = blosc2.save_array(a, "test.bl2", mode="w")
>>> serial_size < a.size * a.itemsize
True
>>> a2 = blosc2.load_array("test.bl2")
>>> np.array_equal(a, a2)
True
blosc2.save_tensor(tensor: tensorflow.Tensor | torch.Tensor | np.ndarray, urlpath: str, chunksize: int | None = None, **kwargs: dict) → int

Save a serialized PyTorch or TensorFlow tensor or NumPy array to a specified file path.

Parameters:
  • tensor (tensorflow.Tensor, torch.Tensor, or np.ndarray) – The tensor or array to be saved.

  • urlpath (str) – The file path where the tensor or array will be saved.

  • chunksize (int) – The size (in bytes) for the chunks during compression. If not provided, it is computed automatically.

  • kwargs (dict, optional) – These are the same as the kwargs in SChunk.__init__.

Returns:

out – The number of bytes of the saved tensor or array.

Return type:

int

Examples

>>> import blosc2
>>> import numpy as np
>>> import os
>>> th = np.arange(1e6, dtype=np.float32)
>>> serial_size = blosc2.save_tensor(th, "test.bl2", mode="w")
>>> if not os.getenv("BTUNE_TRADEOFF"):
...     assert serial_size < th.size * th.itemsize
...
blosc2.load_tensor(urlpath: str, dparams: dict | None = None) → tensorflow.Tensor | torch.Tensor | np.ndarray

Load a serialized PyTorch or TensorFlow tensor or NumPy array from a file.

Parameters:
  • urlpath (str) – The path to the file where the tensor or array is stored.

  • dparams (dict, optional) – A dictionary with the decompression parameters, which are the same as those used in the decompress2() function.

Returns:

out – The unpacked PyTorch or TensorFlow tensor or NumPy array.

Return type:

tensor or ndarray

Raises:
  • TypeError – If urlpath is not in cframe format.

  • RuntimeError – If some other problem is detected.

Examples

>>> import blosc2
>>> import numpy as np
>>> import os
>>> th = np.arange(1e6, dtype=np.float32)
>>> size = blosc2.save_tensor(th, "test.bl2", mode="w")
>>> if not os.getenv("BTUNE_TRADEOFF"):
...     assert size < th.size * th.itemsize
...
>>> th2 = blosc2.load_tensor("test.bl2")
>>> np.array_equal(th, th2)
True