Compression, decompression and storage parameters
Dataclasses for setting the compression, decompression and storage parameters. All their parameters are optional.
- CParams: Dataclass for hosting the different compression parameters.
- DParams: Dataclass for hosting the different decompression parameters.
- Storage: Dataclass for hosting the different storage parameters.
CParams
- class blosc2.CParams(codec: ~blosc2.Codec | int = Codec.ZSTD, codec_meta: int = 0, clevel: int = 1, use_dict: bool = False, typesize: int = 8, nthreads: int = <factory>, blocksize: int = 0, splitmode: ~blosc2.SplitMode = SplitMode.AUTO_SPLIT, filters: list[~blosc2.Filter | int] = <factory>, filters_meta: list[int] = <factory>, tuner: ~blosc2.Tuner = Tuner.STUNE)
Dataclass for hosting the different compression parameters.
- Parameters:
  - codec (Codec or int) – The compressor code. Default is Codec.ZSTD.
  - codec_meta (int) – The metadata for the compressor code. Default is 0.
  - clevel (int) – The compression level, from 0 (no compression) to 9 (maximum compression). Default is 1.
  - use_dict (bool) – Whether to use dictionaries when compressing (only for blosc2.Codec.ZSTD). Default is False.
  - typesize (int) – The data type size, ranging from 1 to 255. Default is 8.
  - nthreads (int) – The number of threads to use internally. By default, the value of blosc2.nthreads is used. If that has not been set with blosc2.set_nthreads(), blosc2 computes a good guess for it.
  - blocksize (int) – The requested size of the compressed blocks. If set to 0 (the default), blosc2 will choose the size automatically.
  - splitmode (SplitMode) – The split mode for the blocks. The default value is SplitMode.AUTO_SPLIT.
  - filters (list of Filter or int, or None) – The sequence of filters. Default: [Filter.NOFILTER, Filter.NOFILTER, Filter.NOFILTER, Filter.NOFILTER, Filter.NOFILTER, Filter.SHUFFLE].
  - filters_meta (list) – The metadata for the filters. Default: [0, 0, 0, 0, 0, 0].
  - tuner (Tuner) – The tuner to use. Default: Tuner.STUNE.
DParams
- class blosc2.DParams(nthreads: int = <factory>)
Dataclass for hosting the different decompression parameters.
- Parameters:
  - nthreads (int) – The number of threads to use internally. By default, the value of blosc2.nthreads is used. If that has not been set with blosc2.set_nthreads(), blosc2 computes a good guess for it.
Storage
- class blosc2.Storage(contiguous: bool = None, urlpath: str = None, mode: str = 'a', mmap_mode: str = None, initial_mapping_size: int = None, meta: dict = None)
Dataclass for hosting the different storage parameters.
- Parameters:
  - contiguous (bool) – Indicates whether the chunks are stored contiguously. Default is True when urlpath is not None; False otherwise.
  - urlpath (str or pathlib.Path, optional) – If the storage is persistent, the name of the file (when contiguous = True) or the directory (when contiguous = False). If the storage is in-memory, this field is None.
  - mode (str, optional) – Persistence mode: ‘r’ means read only (must exist); ‘a’ means read/write (create if it doesn’t exist); ‘w’ means create (overwrite if it exists). Default is ‘a’.
  - mmap_mode (str, optional) – If set, the file will be memory-mapped instead of using the default I/O functions, and the mode argument will be ignored. The memory-mapping modes are similar to those used by the numpy.memmap function, but it is possible to extend the file:
    - ‘r’: Open an existing file for reading only.
    - ‘r+’: Open an existing file for reading and writing. Use this mode if you want to append data to an existing schunk file.
    - ‘w+’: Create or overwrite an existing file for reading and writing. Use this mode if you want to create a new schunk.
    - ‘c’: Open an existing file in copy-on-write mode: all changes affect the data in memory, but changes are not saved to disk. The file on disk is read-only. On Windows, the size of the mapping cannot change.
Only contiguous storage can be memory-mapped. Hence, urlpath must point to a file (and not a directory).
Note
Memory-mapped files are opened once, and their contents remain in (virtual) memory for the lifetime of the schunk. Using memory-mapped I/O can be faster than the default I/O functions, depending on the use case. While reading performance is generally better, writing performance may be slower in some cases on certain systems. Memory-mapped files can be especially beneficial when operating with network file systems (like NFS).
This is currently a beta feature (especially for write operations) and we recommend trying it out and reporting any issues you may encounter.
  - initial_mapping_size (int, optional) – The initial size of the mapping for the memory-mapped file when writes are allowed (‘r+’, ‘w+’, or ‘c’ mode). Once a file is memory-mapped and extended beyond the initial mapping size, the file must be remapped, which may be expensive. This parameter allows decoupling the mapping size from the actual file size, reserving memory early for future writes and avoiding remappings. The memory is only reserved virtually and does not occupy physical memory unless actual writes occur. Since the virtual address space is large enough, it is fine to be generous with this parameter (with special consideration on Windows; see the note below). For best performance, set this to the maximum expected size of the compressed data (see the example in SChunk.__init__). The size is in bytes. Default: 1 GiB.
Note
On Windows, the size of the mapping is directly coupled to the file size. When the schunk is destroyed, the file size will be truncated to the actual size of the schunk.
  - meta (dict or None) – A dictionary with different metalayers. Each entry represents a metalayer:
    - key (bytes or str): The name of the metalayer.
    - value (object): The metalayer object, which will be serialized using msgpack.
- Attributes:
- contiguous
- initial_mapping_size
- meta
- mmap_mode
- urlpath