DataFileChunk
- class openmsistream.data_file_io.entity.data_file_chunk.DataFileChunk(filepath, filename, file_hash, chunk_hash, chunk_offset_read, chunk_offset_write, chunk_size, chunk_i, n_total_chunks, rootdir=None, filename_append='', data=None)
Bases: Producible
Class representing a single chunk of an uploaded or downloaded file. DataFileChunk objects are automatically serialized/deserialized when they are produced/consumed to topics using OpenMSIStream.
- Parameters:
filepath (pathlib.Path) – path to this chunk’s file (fully resolved if being produced, may be relative if it was consumed)
filename (str) – the name of the file
file_hash (str) – hash of the full contents of this chunk’s file
chunk_hash (str) – hash of this chunk’s data
chunk_offset_read (int) – offset (in bytes) of this chunk within the original file
chunk_offset_write (int) – offset (in bytes) of this chunk within the reconstructed file (may differ from chunk_offset_read if some bytes were excluded during upload)
chunk_size (int) – size of this chunk (in bytes)
chunk_i (int) – index of this chunk within the larger file
n_total_chunks (int) – the total number of chunks to expect from the original file
rootdir (pathlib.Path, optional) – path to the “root” directory; anything beyond it in the filepath is considered a subdirectory (can also be set later)
filename_append (str, optional) – string to append to the stem of the filename when the file is reconstructed
data (bytes, optional) – the actual binary data of this chunk of the file (can be set later if this chunk is being produced and not consumed)
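The relationship between the offset, size, index, and hash parameters can be illustrated with a minimal sketch. It does not use OpenMSIStream itself: `ChunkInfo` and `describe_chunks` are hypothetical names, SHA-512 is an assumed hash algorithm, the 1-based `chunk_i` is an assumption, and no bytes are excluded, so the read and write offsets coincide.

```python
import hashlib
from dataclasses import dataclass
from typing import List


@dataclass
class ChunkInfo:
    """Hypothetical stand-in holding the per-chunk metadata described above."""

    chunk_hash: str
    chunk_offset_read: int
    chunk_offset_write: int
    chunk_size: int
    chunk_i: int
    n_total_chunks: int


def describe_chunks(file_bytes: bytes, chunk_size: int) -> List[ChunkInfo]:
    """Split file_bytes into consecutive chunks of at most chunk_size bytes."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    # Ceiling division: the last chunk may be shorter than chunk_size
    n_total = (len(file_bytes) + chunk_size - 1) // chunk_size
    chunks = []
    for i in range(n_total):
        offset = i * chunk_size
        data = file_bytes[offset : offset + chunk_size]
        chunks.append(
            ChunkInfo(
                # Assumption: SHA-512 hex digest; the real algorithm is not stated here
                chunk_hash=hashlib.sha512(data).hexdigest(),
                chunk_offset_read=offset,
                # Assumption: nothing is excluded, so the write offset is the same
                chunk_offset_write=offset,
                chunk_size=len(data),
                # Assumption: indices start at 1
                chunk_i=i + 1,
                n_total_chunks=n_total,
            )
        )
    return chunks
```

For a 10-byte input and `chunk_size=4`, this yields three chunks at offsets 0, 4, and 8, the last of which holds only 2 bytes.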
- property filepath
The path to the file
- property relative_filepath
The path to the file, relative to its root directory (if it has one)
- property rootdir
The path to the file’s root directory (already set if chunk is to be produced, but must be set later if chunk is a consumed message)
- property subdir_str
A string representation of the path to the file, relative to its root directory
- property msg_key
A string representing the key of the message this chunk will be produced as
- property msg_value
The value of the message this chunk will be produced as (just the object itself, since a DataFileChunkSerializer is used)
- property callback_kwargs
The keyword arguments that should be sent to the producer callback function when the chunk is produced
- get_log_msg(print_every=None)
If the chunk’s index mod print_every is 0, or if the chunk is the last one for the file, returns a message to log. Otherwise returns None.
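The logging rule can be sketched as a standalone function. This is an illustration of the rule as described, not the library's implementation: `sketch_log_msg` is a hypothetical name, the message text is invented, and a 1-based chunk index is assumed so that the last chunk satisfies `chunk_i == n_total_chunks`.

```python
from typing import Optional


def sketch_log_msg(
    chunk_i: int,
    n_total_chunks: int,
    filename: str,
    print_every: Optional[int] = None,
) -> Optional[str]:
    """Return a log message for selected chunks, or None for all others."""
    # Assumption: chunk_i is 1-based, so the last chunk has index n_total_chunks
    is_last = chunk_i == n_total_chunks
    # Guard against None and non-positive values before taking the modulus
    is_multiple = (
        print_every is not None and print_every > 0 and chunk_i % print_every == 0
    )
    if is_last or is_multiple:
        # Hypothetical message format
        return f"{filename}: chunk {chunk_i} of {n_total_chunks}"
    return None
```

With `print_every=None`, only the final chunk of a file produces a message under this sketch.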
- populate_with_file_data(logger=None)
Populate this chunk with the actual data from the file. Called only when this chunk is being produced.
- Parameters:
logger (OpenMSIToolbox.logging.OpenMSILogger, optional) – a logger object to use to log errors that may arise in populating the file chunk
- Raises:
FileNotFoundError – if the file doesn’t exist on disk at self.filepath
ValueError – if the data read from the file is not of the expected size, or if its hash does not match the hash that was originally calculated.
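The checks behind these two exceptions can be sketched without OpenMSIStream. The function below is a hypothetical illustration, not the library's code: `read_and_verify_chunk` and its parameters are invented names, and SHA-512 hex digests are an assumed hash format.

```python
import hashlib
from pathlib import Path


def read_and_verify_chunk(
    filepath: Path, offset: int, size: int, expected_hash: str
) -> bytes:
    """Read size bytes at offset from filepath and check them against a hash."""
    if not filepath.is_file():
        raise FileNotFoundError(f"{filepath} does not exist")
    with open(filepath, "rb") as fp:
        fp.seek(offset)
        data = fp.read(size)
    # A short read means the file is smaller than when the chunk was defined
    if len(data) != size:
        raise ValueError(f"expected {size} bytes but read {len(data)}")
    # Assumption: chunk hashes are SHA-512 hex digests
    if hashlib.sha512(data).hexdigest() != expected_hash:
        raise ValueError("chunk hash does not match the expected hash")
    return data
```

Checking the size before the hash gives a more specific error when a file has been truncated since its chunks were first described.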