blosc2.open

blosc2.open(urlpath, mode='a', offset=0, **kwargs)

Open a persistent SChunk or NDArray, or a remote C2Array.

Parameters:
  • urlpath (str | pathlib.Path | URLPath class) – The path where the SChunk (or NDArray) is stored. If it is a remote array, a URLPath instance must be passed.

  • mode (str, optional) – The open mode (default: 'a').

  • offset (int, optional) – An offset in the file where the super-chunk or array data is located (e.g. in a file containing several such objects); see the sketch after this parameter list.

  • kwargs (dict, optional) –

    Keyword arguments supported:
    mmap_mode: str, optional

    If set, the file will be memory-mapped instead of using the default I/O functions, and the mode argument will be ignored. The memory-mapping modes are similar to those used by the numpy.memmap function, but it is also possible to extend the file:

    mode  description
    ----  -----------
    'r'   Open an existing file for reading only.
    'r+'  Open an existing file for reading and writing. Use this mode if you want to append data to an existing schunk file.
    'c'   Open an existing file in copy-on-write mode: all changes affect the data in memory but changes are not saved to disk. The file on disk is read-only. On Windows, the size of the mapping cannot change.

    Only contiguous storage can be memory-mapped. Hence, urlpath must point to a file (and not a directory).

    Note

    Memory-mapped files are opened once and the file contents remain in (virtual) memory for the lifetime of the schunk. Using memory-mapped I/O can be faster than using the default I/O functions, depending on the use case: whereas reading performance is generally better, writing performance may be slower in some cases on certain systems. In any case, memory-mapped files can be especially beneficial when operating with network file systems (like NFS).

    This is currently a beta feature (especially write operations) and we recommend trying it out and reporting any issues you may encounter.

    initial_mapping_size: int, optional

    The initial size of the mapping for the memory-mapped file when writes are allowed (r+ or c mode). Once a file is memory-mapped and extended beyond the initial mapping size, the file must be remapped, which may be expensive. This parameter lets you decouple the mapping size from the actual file size, reserving memory early for future writes and avoiding remappings. The memory is only reserved virtually and does not occupy physical memory unless actual writes happen. Since the virtual address space is large enough, it is fine to be generous with this parameter (with special consideration on Windows, see the note below). For best performance, set this to the maximum expected size of the compressed data (see the example in SChunk.__init__). The size is given in bytes.

    Default: 1 GiB.

    Note

    On Windows, the size of the mapping is directly coupled to the file size. When the schunk gets destroyed, the file size will be truncated to the actual size of the schunk.

    cparams: dict

    A dictionary with the compression parameters, which are the same ones that can be used in the compress2() function. The typesize and blocksize cannot be changed.

    dparams: dict

    A dictionary with the decompression parameters, which are the same ones that can be used in the decompress2() function.
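
To illustrate the offset parameter, here is a minimal sketch that stores two super-chunks back to back in a single file and opens the second one. The file name is illustrative, and the sketch assumes that the cframe bytes returned by SChunk.to_cframe() can be written to a file and reopened at the recorded offset:

>>> import blosc2
>>> import numpy as np
>>> data = np.arange(1000, dtype="int32")
>>> # Serialize an in-memory contiguous super-chunk to its cframe bytes
>>> cframe = blosc2.SChunk(chunksize=2000, data=data.tobytes(), contiguous=True).to_cframe()
>>> path = getfixture('tmp_path') / "several.b2frame"
>>> with open(path, "wb") as f:
...     second_offset = f.write(cframe)  # first super-chunk starts at offset 0
...     _ = f.write(cframe)              # second super-chunk starts at second_offset
>>> sc2 = blosc2.open(path, mode="r", offset=second_offset)
>>> np.frombuffer(sc2.decompress_chunk(0), dtype="int32")[:4]
array([0, 1, 2, 3], dtype=int32)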
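
The memory-mapping keyword arguments can be combined in a single call. A sketch along the same lines (names and sizes are illustrative) that opens an existing schunk file for appending with a generously sized initial mapping:

>>> mmap_path = getfixture('tmp_path') / "mmap.b2frame"
>>> # Create a contiguous (single-file) schunk holding one 2000-byte chunk
>>> schunk = blosc2.SChunk(chunksize=2000, data=np.arange(500, dtype="int32").tobytes(), contiguous=True, urlpath=mmap_path, mode="w")
>>> # Reopen it memory-mapped for writing, reserving 1 MiB of virtual address space up front
>>> sc_rw = blosc2.open(mmap_path, mmap_mode="r+", initial_mapping_size=2**20)
>>> _ = sc_rw.append_data(np.arange(500, 1000, dtype="int32").tobytes())
>>> sc_rw.nchunks
2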

Notes

  • This is just a ‘logical’ open, so there is no close() counterpart, because currently there is no need for one.

  • In case urlpath is a URLPath instance, mode must be ‘r’, offset must be 0, and no kwargs can be passed (see the sketch below).
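
For the remote case, a minimal sketch (the dataset path and subscriber URL passed to blosc2.URLPath below are placeholders; point them at a Caterva2 subscriber you can actually reach):

>>> import blosc2
>>> urlpath = blosc2.URLPath("@public/examples/ds-1d.b2nd", "https://demo.caterva2.net")
>>> remote = blosc2.open(urlpath, mode="r")
>>> part = remote[:10]  # slicing fetches and decompresses data from the remote server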

Returns:

out – The SChunk or NDArray (in case there is a “b2nd” metalayer) or the C2Array if urlpath is a blosc2.URLPath instance.

Return type:

SChunk, NDArray or C2Array

Examples

>>> import blosc2
>>> import numpy as np
>>> storage = {"contiguous": True, "urlpath": getfixture('tmp_path') / "b2frame", "mode": "w"}
>>> nelem = 20 * 1000
>>> nchunks = 5
>>> chunksize = nelem * 4 // nchunks
>>> data = np.arange(nelem, dtype="int32")
>>> # Create SChunk and append data
>>> schunk = blosc2.SChunk(chunksize=chunksize, data=data.tobytes(), **storage)
>>> # Open SChunk
>>> sc_open = blosc2.open(urlpath=storage["urlpath"])
>>> for i in range(nchunks):
...     dest = np.empty(nelem // nchunks, dtype=data.dtype)
...     schunk.decompress_chunk(i, dest)
...     dest1 = np.empty(nelem // nchunks, dtype=data.dtype)
...     sc_open.decompress_chunk(i, dest1)
...     np.array_equal(dest, dest1)
True
True
True
True
True

To open the same schunk memory-mapped, we simply need to pass the mmap_mode parameter:

>>> sc_open_mmap = blosc2.open(urlpath=storage["urlpath"], mmap_mode="r")
>>> sc_open.nchunks == sc_open_mmap.nchunks
True
>>> all(sc_open.decompress_chunk(i) == sc_open_mmap.decompress_chunk(i) for i in range(nchunks))
True
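
When the stored object carries a “b2nd” metalayer, blosc2.open returns an NDArray instead of an SChunk. A minimal sketch, assuming blosc2.asarray to persist the array first (the file name is illustrative):

>>> nd_urlpath = getfixture('tmp_path') / "example.b2nd"
>>> nd = blosc2.asarray(np.arange(100, dtype="int64"), urlpath=nd_urlpath, mode="w")
>>> nd_open = blosc2.open(nd_urlpath)
>>> isinstance(nd_open, blosc2.NDArray)
True
>>> np.array_equal(nd_open[:], np.arange(100, dtype="int64"))
True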