SChunk.__init__#

SChunk.__init__(chunksize=None, data=None, **kwargs)#

Create a new super-chunk, or open an existing one.

Parameters:
  • chunksize (int, optional) – The size, in bytes, of the chunks from the super-chunk. If not provided, it is set automatically to a reasonable value.

  • data (bytes-like object, optional) – The data to be split into different chunks of size chunksize. If None, the SChunk instance will be empty initially.

  • kwargs (dict, optional) –

    Keyword arguments supported:

    contiguous: bool, optional

    Whether the chunks are stored contiguously or not. Default is True when urlpath is not None; False otherwise.

    urlpath: str | pathlib.Path, optional

    If the storage is persistent, the name of the file (when contiguous = True) or the directory (if contiguous = False). If the storage is in-memory, then this field is None.

    mode: str, optional

    Persistence mode: ‘r’ means read only (must exist); ‘a’ means read/write (create if it doesn’t exist); ‘w’ means create (overwrite if it exists).
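    The persistence modes can be sketched as follows; the file name below is illustrative, and blosc2.open() is used to reopen the persisted file:

    ```python
    import numpy as np
    import blosc2

    # Illustrative file name; remove any leftover file so "w" starts clean.
    urlpath = "schunk_modes.b2frame"
    blosc2.remove_urlpath(urlpath)

    # "w": create the file, overwriting it if it already exists.
    data = np.arange(100, dtype=np.int64)
    schunk = blosc2.SChunk(chunksize=data.nbytes, data=data.tobytes(),
                           mode="w", urlpath=urlpath)

    # "r": reopen the same file read-only; it must already exist.
    schunk_ro = blosc2.open(urlpath, mode="r")
    out = np.frombuffer(schunk_ro.decompress_chunk(0), dtype=np.int64)
    ```
    
    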

    mmap_mode: str, optional

    If set, the file will be memory-mapped instead of using the default I/O functions, and the mode argument will be ignored. The memory-mapping modes are similar to those used by the numpy.memmap function, but it is also possible to extend the file:

    'r': Open an existing file for reading only.

    'r+': Open an existing file for reading and writing. Use this mode if you want to append data to an existing schunk file.

    'w+': Create a new file or overwrite an existing one for reading and writing. Use this mode if you want to create a new schunk.

    'c': Open an existing file in copy-on-write mode: all changes affect the data in memory, but they are not saved to disk. The file on disk is read-only. On Windows, the size of the mapping cannot change.

    Only contiguous storage can be memory-mapped. Hence, urlpath must point to a file (and not a directory).

    Note

    Memory-mapped files are opened once and the file contents remain in (virtual) memory for the lifetime of the schunk. Using memory-mapped I/O can be faster than the default I/O functions, depending on the use case: reading performance is generally better, but writing performance may be slower in some cases on certain systems. In any case, memory-mapped files can be especially beneficial when operating with network file systems (like NFS).

    This is currently a beta feature (especially write operations) and we recommend trying it out and reporting any issues you may encounter.

    initial_mapping_size: int, optional

    The initial size of the mapping for the memory-mapped file when writes are allowed ('r+', 'w+', or 'c' mode). Once a file is memory-mapped and extended beyond the initial mapping size, the file must be remapped, which may be expensive. This parameter allows you to decouple the mapping size from the actual file size, reserving memory early for future writes and avoiding remappings. The memory is only reserved virtually and does not occupy physical memory unless actual writes happen. Since the virtual address space is large enough, it is fine to be generous with this parameter (with special consideration on Windows; see the note below). For best performance, set this to the maximum expected size of the compressed data (see the example below). The size is in bytes.

    Default: 1 GiB.

    Note

    On Windows, the size of the mapping is directly coupled to the file size. When the schunk gets destroyed, the file size will be truncated to the actual size of the schunk.

    cparams: dict

    A dictionary with the compression parameters, which are the same as those that can be used in the compress2() function.

    dparams: dict

    A dictionary with the decompression parameters, which are the same as those that can be used in the decompress2() function.
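    A minimal sketch of passing compression and decompression parameters; the specific codec, level, and filter chosen here are illustrative:

    ```python
    import numpy as np
    import blosc2

    # Illustrative compression parameters (same keys as compress2()).
    cparams = {
        "codec": blosc2.Codec.ZSTD,          # compression codec
        "clevel": 5,                         # compression level
        "typesize": 8,                       # item size in bytes (int64)
        "filters": [blosc2.Filter.SHUFFLE],  # pre-compression filter
    }
    dparams = {"nthreads": 1}                # decompression parameters

    data = np.arange(1_000, dtype=np.int64)
    schunk = blosc2.SChunk(chunksize=data.nbytes, data=data.tobytes(),
                           cparams=cparams, dparams=dparams)
    roundtrip = np.frombuffer(schunk.decompress_chunk(0), dtype=np.int64)
    ```
    
    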

    meta: dict or None

    A dictionary with different metalayers. One entry per metalayer:

    key: bytes or str

    The name of the metalayer.

    value: object

    The metalayer object that will be serialized using msgpack.
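    A sketch of attaching metalayers at creation time; the metalayer names and values below are made up. Each value is serialized with msgpack and can be read back through the schunk's meta accessor:

    ```python
    import blosc2

    # Illustrative metalayers; one entry per metalayer.
    meta = {"dtype": "int64", "author": "example"}
    schunk = blosc2.SChunk(chunksize=1024, meta=meta)

    # Read the metalayers back; values are deserialized from msgpack.
    names = schunk.meta.keys()
    value = schunk.meta["dtype"]
    ```
    
    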

Examples

>>> import blosc2
>>> storage = {"contiguous": True, "cparams": {}, "dparams": {}}
>>> schunk = blosc2.SChunk(**storage)

In the following, we will write and read a super-chunk to and from disk via memory-mapped files.

>>> import numpy as np
>>> a = np.arange(3, dtype=np.int64)
>>> chunksize = a.size * a.itemsize
>>> n_chunks = 2
>>> urlpath = getfixture('tmp_path') / "schunk.b2frame"

Optional: we intend to write 2 chunks of 24 bytes each, and we expect the compressed size to be smaller than the original size. Hence, we (generously) set the initial size of the mapping to 48 bytes, effectively avoiding remappings.

>>> initial_mapping_size = chunksize * n_chunks
>>> schunk_mmap = blosc2.SChunk(
...     chunksize=chunksize,
...     mmap_mode="w+",
...     initial_mapping_size=initial_mapping_size,
...     urlpath=urlpath,
... )
>>> schunk_mmap.append_data(a)
1
>>> schunk_mmap.append_data(a * 2)
2

Optional: explicitly close the file and free the mapping.

>>> del schunk_mmap

Reading the data back again via memory-mapped files:

>>> schunk_mmap = blosc2.open(urlpath, mmap_mode="r")
>>> np.frombuffer(schunk_mmap.decompress_chunk(0), dtype=np.int64).tolist()
[0, 1, 2]
>>> np.frombuffer(schunk_mmap.decompress_chunk(1), dtype=np.int64).tolist()
[0, 2, 4]