DataFileChunk

class openmsistream.data_file_io.entity.data_file_chunk.DataFileChunk(filepath, filename, file_hash, chunk_hash, chunk_offset_read, chunk_offset_write, chunk_size, chunk_i, n_total_chunks, rootdir=None, filename_append='', data=None)

Bases: Producible

Class representing a single chunk of an uploaded or downloaded file. DataFileChunk objects are automatically serialized/deserialized when they are produced/consumed to topics using OpenMSIStream.

Parameters:
  • filepath (pathlib.Path) – path to this chunk’s file (fully resolved if being produced, may be relative if it was consumed)

  • filename (str) – the name of the file

  • file_hash (str) – hash of the full contents of the file this chunk belongs to

  • chunk_hash (str) – hash of this chunk’s data

  • chunk_offset_read (int) – offset (in bytes) of this chunk within the original file

  • chunk_offset_write (int) – offset (in bytes) of this chunk within the reconstructed file (may differ from chunk_offset_read if some bytes are excluded during upload)

  • chunk_size (int) – size of this chunk (in bytes)

  • chunk_i (int) – index of this chunk within the larger file

  • n_total_chunks (int) – the total number of chunks to expect from the original file

  • rootdir (pathlib.Path, optional) – path to the “root” directory; anything beyond it in the filepath is considered a subdirectory (can also be set later)

  • filename_append (str, optional) – string to append to the stem of the filename when the file is reconstructed

  • data (bytes, optional) – the actual binary data of this chunk of the file (can be set later if this chunk is being produced and not consumed)
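
The relationship between the hash, offset, size, and index parameters can be illustrated with a minimal sketch. It does not use OpenMSIStream itself: ChunkInfo and describe_chunks are hypothetical names, SHA-512 hex digests are an assumed hash format, chunk indices are assumed to start at 1, and the read and write offsets are assumed equal (no bytes excluded during upload).

```python
import hashlib
from dataclasses import dataclass
from typing import List


@dataclass
class ChunkInfo:
    """Hypothetical stand-in holding a subset of the DataFileChunk parameters."""

    file_hash: str
    chunk_hash: str
    chunk_offset_read: int
    chunk_offset_write: int
    chunk_size: int
    chunk_i: int
    n_total_chunks: int


def describe_chunks(data: bytes, chunk_size: int) -> List[ChunkInfo]:
    """Split data into pieces of at most chunk_size bytes and describe each one."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    # Assumption: SHA-512 hex digests; the hash algorithm is not stated in these docs
    file_hash = hashlib.sha512(data).hexdigest()
    # Ceiling division gives the number of chunks needed to cover the data
    n_total_chunks = (len(data) + chunk_size - 1) // chunk_size
    chunks = []
    for index, offset in enumerate(range(0, len(data), chunk_size), start=1):
        piece = data[offset:offset + chunk_size]
        chunks.append(
            ChunkInfo(
                file_hash=file_hash,
                chunk_hash=hashlib.sha512(piece).hexdigest(),
                chunk_offset_read=offset,
                # Assumption: nothing is excluded, so both offsets agree
                chunk_offset_write=offset,
                chunk_size=len(piece),
                chunk_i=index,  # assumed 1-based
                n_total_chunks=n_total_chunks,
            )
        )
    return chunks


example_chunks = describe_chunks(b"abcdefghij", 4)
```

Here a 10-byte payload split with chunk_size=4 yields three descriptions; the last one is shorter than the others because the payload length is not a multiple of the chunk size.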

property filepath

The path to the file

property relative_filepath

The path to the file, relative to its root directory (if it has one)

property rootdir

The path to the file’s root directory (already set if chunk is to be produced, but must be set later if chunk is a consumed message)
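
How filepath, rootdir, and relative_filepath relate can be sketched with pathlib alone. The helper relative_to_root and the example paths are hypothetical, and returning the path unchanged when no root directory is set is an assumption about the "if it has one" case, not confirmed library behavior.

```python
from pathlib import PurePosixPath
from typing import Optional


def relative_to_root(
    filepath: PurePosixPath, rootdir: Optional[PurePosixPath] = None
) -> PurePosixPath:
    """Return filepath relative to rootdir, or filepath itself if no rootdir is set."""
    if rootdir is None:
        # Assumption: without a root directory the path is returned unchanged
        return filepath
    # relative_to raises ValueError if filepath is not located under rootdir
    return filepath.relative_to(rootdir)


# Hypothetical paths, chosen only for illustration
example_root = PurePosixPath("/data/uploads")
example_file = example_root / "run_01" / "scan.dat"
example_relative = relative_to_root(example_file, example_root)
```

PurePosixPath is used so the sketch behaves identically on every platform; code working with real files would normally use pathlib.Path.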

property subdir_str

A string representation of the path to the file, relative to its root directory

property msg_key

A string representing the key of the message this chunk will be produced as

property msg_value

The value of the message this chunk will be produced as (just the object itself, since a DataFileChunkSerializer is used)

property callback_kwargs

Keyword arguments that should be sent to the producer callback function when the chunk is produced

get_log_msg(print_every=None)

If the chunk’s index mod print_every is 0, or if the chunk is the last one for the file, returns a message to log. Otherwise returns None.

Parameters:

print_every (int, optional) – number of chunks that should be skipped between logging messages

Returns:

message to log stating which chunk from which file is being uploaded

Return type:

str, or None
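
The logging condition described above can be written out as a small stand-alone predicate. This is a sketch rather than the library's code: should_log is a hypothetical name, chunk indices are assumed to be 1-based (so the last chunk has chunk_i equal to n_total_chunks), and treating print_every=None as "log only the last chunk" is an assumption.

```python
from typing import Optional


def should_log(chunk_i: int, n_total_chunks: int, print_every: Optional[int] = None) -> bool:
    """Return True if a chunk with this index would produce a log message."""
    # The last chunk of a file always produces a message
    if chunk_i == n_total_chunks:
        return True
    # Assumption: with no print_every given, only the last chunk is logged
    if print_every is None:
        return False
    # Otherwise log whenever the index is a multiple of print_every
    return chunk_i % print_every == 0


# Indices (assumed 1-based) that would log for a 10-chunk file with print_every=4
logged_indices = [i for i in range(1, 11) if should_log(i, 10, print_every=4)]
```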

populate_with_file_data(logger=None)

Populate this chunk with the actual data from the file. Called only when this chunk is being produced.

Parameters:

logger (OpenMSIToolbox.logging.OpenMSILogger, optional) – a logger object to use to log errors that may arise in populating the file chunk

Raises:
  • FileNotFoundError – if the file doesn’t exist on disk at self.filepath

  • ValueError – if the data read from the file is not of the expected size, or if its hash does not match the value originally calculated
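
The checks listed under Raises can be sketched as a stand-alone function that reads one chunk from disk and validates it. It is an illustration under stated assumptions, not the method's actual implementation: read_chunk is a hypothetical name, and SHA-512 hex digests are an assumed hash format.

```python
import hashlib
import tempfile
from pathlib import Path


def read_chunk(filepath, offset: int, size: int, expected_hash: str) -> bytes:
    """Read size bytes at offset from filepath and check their size and hash."""
    filepath = Path(filepath)
    if not filepath.is_file():
        raise FileNotFoundError(f"{filepath} does not exist")
    with open(filepath, "rb") as fp:
        fp.seek(offset)
        data = fp.read(size)
    # A short read means the file is smaller than the chunk description expects
    if len(data) != size:
        raise ValueError(f"expected {size} bytes but read {len(data)}")
    # Assumption: SHA-512 hex digest; the hash algorithm is not stated in these docs
    if hashlib.sha512(data).hexdigest() != expected_hash:
        raise ValueError("chunk hash does not match the expected value")
    return data


# Usage with a throwaway file: read bytes 4..7 of a 10-byte payload
with tempfile.TemporaryDirectory() as tmpdir:
    example_path = Path(tmpdir) / "example.bin"
    example_path.write_bytes(b"0123456789")
    chunk_data = read_chunk(example_path, 4, 4, hashlib.sha512(b"4567").hexdigest())
```

Checking the size before the hash keeps the two failure modes distinguishable: a truncated file fails on size, while a file whose contents changed after the hashes were computed fails on the hash comparison.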