blosc2.open
- blosc2.open(urlpath, mode='a', offset=0, **kwargs)
Open a persistent SChunk or NDArray, a remote C2Array, or a Proxy (see the Notes section for more on the Proxy case).
- Parameters:
urlpath (str | pathlib.Path | URLPath class) – The path where the SChunk (or NDArray) is stored. For a remote array, a URLPath instance must be passed.
mode (str, optional) – The open mode. Defaults to 'a'.
offset (int, optional) – An offset in the file where the super-chunk or array data is located (e.g. in a file containing several such objects).
kwargs (dict, optional) –
- Keyword arguments supported:
- mmap_mode: str, optional
If set, the file will be memory-mapped instead of using the default I/O functions, and the mode argument will be ignored. The memory-mapping modes are similar to those used by the numpy.memmap function, except that the file can be extended:
mode – description
'r' – Open an existing file for reading only.
'r+' – Open an existing file for reading and writing. Use this mode if you want to append data to an existing schunk file.
'c' – Open an existing file in copy-on-write mode: all changes affect the data in memory but are not saved to disk. The file on disk is read-only. On Windows, the size of the mapping cannot change.
Only contiguous storage can be memory-mapped. Hence, urlpath must point to a file (and not a directory).
Note
Memory-mapped files are opened once and the file contents remain in (virtual) memory for the lifetime of the schunk. Using memory-mapped I/O can be faster than using the default I/O functions, depending on the use case. While reading performance is generally better, writing performance may be slower in some cases on certain systems. In any case, memory-mapped files can be especially beneficial when operating with network file systems (like NFS).
This is currently a beta feature (especially write operations) and we recommend trying it out and reporting any issues you may encounter.
- initial_mapping_size: int, optional
The initial size of the mapping for the memory-mapped file when writes are allowed (r+ or c mode). Once a file is memory-mapped and extended beyond the initial mapping size, the file must be remapped, which may be expensive. This parameter decouples the mapping size from the actual file size, so that memory can be reserved early for future writes and remappings avoided. The memory is only reserved virtually and does not occupy physical memory unless actual writes happen. Since the virtual address space is large enough, it is fine to be generous with this parameter (with special consideration on Windows, see the note below). For best performance, set this to the maximum expected size of the compressed data (see the example in SChunk.__init__). The size is in bytes. Default: 1 GiB.
Note
On Windows, the size of the mapping is directly coupled to the file size. When the schunk gets destroyed, the file size will be truncated to the actual size of the schunk.
- cparams: dict
A dictionary with the compression parameters, which are the same as those accepted by the compress2() function. Typesize and blocksize cannot be changed.
- dparams: dict
A dictionary with the decompression parameters, which are the same as those accepted by the decompress2() function.
Notes
This is just a ‘logical’ open, so there is no close() counterpart; currently there is no need for one.
If urlpath is a URLPath class instance, mode must be 'r', offset must be 0, and kwargs cannot be passed.
If the original object saved in urlpath was a Proxy, this function will only return a Proxy if its source is a local SChunk or NDArray, or a remote C2Array. Otherwise, it will return the Python-Blosc2 container used to cache the data, which can be a SChunk or an NDArray and may not have all the data initialized (e.g. if the user has not accessed it yet).
When opening a LazyExpr, keep in mind the later note regarding the operands.
- Returns:
out – The SChunk or NDArray (if there is a "b2nd" metalayer), or the C2Array if urlpath is a blosc2.URLPath instance.
- Return type:
Examples
>>> import blosc2
>>> import numpy as np
>>> # getfixture is provided by pytest's doctest integration
>>> storage = {"contiguous": True, "urlpath": getfixture('tmp_path') / "b2frame", "mode": "w"}
>>> nelem = 20 * 1000
>>> nchunks = 5
>>> chunksize = nelem * 4 // nchunks
>>> data = np.arange(nelem, dtype="int32")
>>> # Create SChunk and append data
>>> schunk = blosc2.SChunk(chunksize=chunksize, data=data.tobytes(), **storage)
>>> # Open SChunk
>>> sc_open = blosc2.open(urlpath=storage["urlpath"])
>>> for i in range(nchunks):
...     dest = np.empty(nelem // nchunks, dtype=data.dtype)
...     schunk.decompress_chunk(i, dest)
...     dest1 = np.empty(nelem // nchunks, dtype=data.dtype)
...     sc_open.decompress_chunk(i, dest1)
...     np.array_equal(dest, dest1)
True
True
True
True
True
To open the same schunk memory-mapped, we simply need to pass the mmap_mode parameter:
>>> sc_open_mmap = blosc2.open(urlpath=storage["urlpath"], mmap_mode="r")
>>> sc_open.nchunks == sc_open_mmap.nchunks
True
>>> # Compare the decompressed bytes of every chunk
>>> all(sc_open.decompress_chunk(i) == sc_open_mmap.decompress_chunk(i) for i in range(nchunks))
True