Save and load¶

- blosc2.save – Save an array to a file.
- blosc2.open – Open a persistent SChunk, NDArray, a remote C2Array or a Proxy.
- blosc2.save_array – Save a serialized NumPy array to a specified file path.
- blosc2.load_array – Load a serialized NumPy array from a file.
- blosc2.save_tensor – Save a serialized PyTorch or TensorFlow tensor or NumPy array to a specified file path.
- blosc2.load_tensor – Load a serialized PyTorch or TensorFlow tensor or NumPy array from a file.
- blosc2.save(array: NDArray, urlpath: str, contiguous=True, **kwargs: Any) None [source]¶
Save an array to a file.
- Parameters:
array¶ (NDArray) – The array to be saved.
urlpath¶ (str) – The path for the file where the array will be saved.
contiguous¶ (bool, optional) – Whether to store the array contiguously (frame format). Default is True.
kwargs¶ (dict, optional) – Additional keyword arguments, such as mode (see the example below).
Examples
>>> import blosc2
>>> import numpy as np
>>> # Create an array
>>> array = blosc2.arange(0, 100, dtype=np.int64, shape=(10, 10))
>>> # Save the array to a file
>>> blosc2.save(array, "array.b2", mode="w")
- blosc2.open(urlpath: str | Path | URLPath, mode: str = 'a', offset: int = 0, **kwargs: dict) SChunk | NDArray | C2Array | LazyArray | Proxy [source]¶
Open a persistent SChunk, NDArray, a remote C2Array or a Proxy.
See the Notes section for more info on opening Proxy objects.
- Parameters:
urlpath¶ (str | pathlib.Path | URLPath class) – The path where the SChunk (or NDArray) is stored. If it is a remote array, a URLPath class must be passed.
mode¶ (str, optional) – Persistence mode: ‘r’ means read only (must exist); ‘a’ means read/write (create if it doesn’t exist); ‘w’ means create (overwrite if it exists). Default is ‘a’.
offset¶ (int, optional) – An offset in the file where super-chunk or array data is located (e.g. in a file containing several such objects).
kwargs¶ (dict, optional) –
- mmap_mode: str, optional
If set, the file will be memory-mapped instead of using the default I/O functions, and the mode argument will be ignored. For more info, see blosc2.Storage. Please note that the w+ mode, which can be used to create new files, is not supported here since only existing files can be opened. You can use SChunk.__init__ to create new files.
- initial_mapping_size: int, optional
The initial size of the memory mapping. For more info, see blosc2.Storage.
- cparams: dict
A dictionary with the compression parameters, which are the same ones that can be used in the compress2() function. Typesize and blocksize cannot be changed.
- dparams: dict
A dictionary with the decompression parameters, which are the same ones that can be used in the decompress2() function.
- Returns:
out – The SChunk or NDArray (if there is a "b2nd" metalayer), or the C2Array if urlpath is a blosc2.URLPath instance.
- Return type:
SChunk | NDArray | C2Array | LazyArray | Proxy
Notes
This is just a 'logical' open, so there is no close() counterpart; currently, there is no need for one.
If urlpath is a URLPath class instance, mode must be 'r', offset must be 0, and kwargs cannot be passed.
If the original object saved in urlpath is a Proxy, this function will only return a Proxy if its source is a local SChunk, NDArray or a remote C2Array. Otherwise, it will return the Python-Blosc2 container used to cache the data, which can be a SChunk or an NDArray and may not have all the data initialized (e.g. if the user has not accessed it yet).
When opening a LazyExpr, keep in mind the note above regarding operands.
Examples
>>> import blosc2
>>> import numpy as np
>>> import os
>>> import tempfile
>>> tmpdirname = tempfile.mkdtemp()
>>> urlpath = os.path.join(tmpdirname, 'b2frame')
>>> storage = blosc2.Storage(contiguous=True, urlpath=urlpath, mode="w")
>>> nelem = 20 * 1000
>>> nchunks = 5
>>> chunksize = nelem * 4 // nchunks
>>> data = np.arange(nelem, dtype="int32")
>>> # Create SChunk and append data
>>> schunk = blosc2.SChunk(chunksize=chunksize, data=data.tobytes(), storage=storage)
>>> # Open SChunk
>>> sc_open = blosc2.open(urlpath=urlpath)
>>> for i in range(nchunks):
...     dest = np.empty(nelem // nchunks, dtype=data.dtype)
...     schunk.decompress_chunk(i, dest)
...     dest1 = np.empty(nelem // nchunks, dtype=data.dtype)
...     sc_open.decompress_chunk(i, dest1)
...     np.array_equal(dest, dest1)
True
True
True
True
True
To open the same schunk memory-mapped, we simply need to pass the mmap_mode parameter:
>>> sc_open_mmap = blosc2.open(urlpath=urlpath, mmap_mode="r")
>>> sc_open.nchunks == sc_open_mmap.nchunks
True
>>> all(sc_open.decompress_chunk(i, dest1) == sc_open_mmap.decompress_chunk(i, dest1)
...     for i in range(nchunks))
True
- blosc2.save_array(arr: ndarray, urlpath: str, chunksize: int | None = None, **kwargs: dict) int [source]¶
Save a serialized NumPy array to a specified file path.
- Parameters:
arr¶ (np.ndarray) – The NumPy array to be saved.
urlpath¶ (str) – The path for the file where the array will be saved.
chunksize¶ (int) – The size (in bytes) for the chunks during compression. If not provided, it is computed automatically.
kwargs¶ (dict, optional) – These are the same as the kwargs in SChunk.__init__.
- Returns:
out – The number of bytes of the saved array.
- Return type:
int
Examples
>>> import blosc2
>>> import numpy as np
>>> a = np.arange(1e6)
>>> serial_size = blosc2.save_array(a, "test.bl2", mode="w")
>>> serial_size < a.size * a.itemsize
True
- blosc2.load_array(urlpath: str, dparams: dict | None = None) ndarray [source]¶
Load a serialized NumPy array from a file.
- Parameters:
urlpath¶ (str) – The path to the file containing the serialized array.
dparams¶ (dict, optional) – A dictionary with the decompression parameters, which can be used in the decompress2() function.
- Returns:
out – The deserialized NumPy array.
- Return type:
np.ndarray
- Raises:
TypeError – If urlpath is not in cframe format.
RuntimeError – If any other error is detected.
Examples
>>> import blosc2
>>> import numpy as np
>>> a = np.arange(1e6)
>>> serial_size = blosc2.save_array(a, "test.bl2", mode="w")
>>> serial_size < a.size * a.itemsize
True
>>> a2 = blosc2.load_array("test.bl2")
>>> np.array_equal(a, a2)
True
- blosc2.save_tensor(tensor: tensorflow.Tensor | torch.Tensor | np.ndarray, urlpath: str, chunksize: int | None = None, **kwargs: dict) int [source]¶
Save a serialized PyTorch or TensorFlow tensor or NumPy array to a specified file path.
- Parameters:
tensor¶ (tensorflow.Tensor, torch.Tensor, or np.ndarray) – The tensor or array to be saved.
urlpath¶ (str) – The file path where the tensor or array will be saved.
chunksize¶ (int) – The size (in bytes) for the chunks during compression. If not provided, it is computed automatically.
kwargs¶ (dict, optional) – These are the same as the kwargs in SChunk.__init__.
- Returns:
out – The number of bytes of the saved tensor or array.
- Return type:
int
Examples
>>> import blosc2
>>> import numpy as np
>>> import os
>>> th = np.arange(1e6, dtype=np.float32)
>>> serial_size = blosc2.save_tensor(th, "test.bl2", mode="w")
>>> if not os.getenv("BTUNE_TRADEOFF"):
...     assert serial_size < th.size * th.itemsize
...
- blosc2.load_tensor(urlpath: str, dparams: dict | None = None) tensorflow.Tensor | torch.Tensor | np.ndarray [source]¶
Load a serialized PyTorch or TensorFlow tensor or NumPy array from a file.
- Parameters:
urlpath¶ (str) – The path to the file where the tensor or array is stored.
dparams¶ (dict, optional) – A dictionary with the decompression parameters, which are the same as those used in the decompress2() function.
- Returns:
out – The unpacked PyTorch or TensorFlow tensor or NumPy array.
- Return type:
tensor or ndarray
- Raises:
TypeError – If urlpath is not in cframe format.
RuntimeError – If some other problem is detected.
Examples
>>> import blosc2
>>> import numpy as np
>>> import os
>>> th = np.arange(1e6, dtype=np.float32)
>>> size = blosc2.save_tensor(th, "test.bl2", mode="w")
>>> if not os.getenv("BTUNE_TRADEOFF"):
...     assert size < th.size * th.itemsize
...
>>> th2 = blosc2.load_tensor("test.bl2")
>>> np.array_equal(th, th2)
True