Installing OpenMSIStream

Official Docker image

The quickest way to deploy OpenMSIStream programs is to use the official public Docker image: openmsi/openmsistream. Version tags there are synchronized with published releases, and “latest” is regularly updated.

The image is built on the python:3.9-slim-bullseye (Debian Linux) base image and contains a complete install of OpenMSIStream. Running the Docker image as-is will drop you into a bash terminal as the “openmsi” user (who has sudo privileges) in that user’s home directory. By default, the timezone is set to “America/New_York”, but you can change this by setting the value of the “TZ” environment variable inside the container.
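As an illustration of overriding the timezone when starting a container, the sketch below assembles a `docker run` command line for the image named above. The helper name `docker_run_command`, the `--rm` flag (remove the container on exit), and the example timezone are assumptions made for this example and are not part of OpenMSIStream; the sketch only builds and prints the command rather than running Docker itself.

```python
import shlex

# Image name taken from the text above.
IMAGE = "openmsi/openmsistream"


def docker_run_command(tag="latest", timezone=None):
    """Build an interactive `docker run` command for the OpenMSIStream image.

    Hypothetical helper: `tag` selects the image version and `timezone`,
    if given, is passed to the container as the TZ environment variable.
    """
    command = ["docker", "run", "--rm", "-it"]  # --rm is an assumption, not required
    if timezone is not None:
        command += ["-e", f"TZ={timezone}"]
    command.append(f"{IMAGE}:{tag}")
    return command


# Print a copy-pasteable command that starts the container in another timezone.
print(shlex.join(docker_run_command(timezone="Europe/Berlin")))
```

The returned list can be passed directly to `subprocess.run` on a machine where Docker is installed, or the printed line can be pasted into a terminal.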

If you would rather install OpenMSIStream on your own system than run a Docker container, we recommend using a minimal installation of conda, the open source package and environment management system. The instructions below start with the installation of conda and outline all the steps necessary to run OpenMSIStream programs.

Quick start with miniconda3

We recommend using miniconda3 for the lightest installation. miniconda3 installers and the accompanying installation instructions are available from the conda website.

Finishing installation

The pages below list specific installation instructions based on the operating system you’re running:

External requirements

Working with OpenMSIStream requires sending data through topics served by a broker. In practice, that means you will need access to a Kafka broker running on a server or in the cloud, and you will need to create and manage topics on the broker to hold the data streams. If these concepts are new to you, we suggest contacting us for assistance and/or using a simple, managed cloud solution, such as Confluent Cloud, as your broker.
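To make "access to a Kafka broker" more concrete, the sketch below collects the connection details that a managed broker such as Confluent Cloud typically hands out (a bootstrap server address plus an API key and secret) into standard librdkafka-style client properties. The environment variable names and the `broker_properties` helper are hypothetical and chosen only for this example; OpenMSIStream's own configuration file format is not described in this section and may differ.

```python
import os


def broker_properties(env=os.environ):
    """Assemble Kafka client properties for a SASL_SSL-secured broker.

    The environment variable names below are assumptions for this sketch,
    not names defined by OpenMSIStream or by Kafka.
    """
    required = ("KAFKA_BOOTSTRAP_SERVERS", "KAFKA_API_KEY", "KAFKA_API_SECRET")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise KeyError(f"missing broker settings: {', '.join(missing)}")
    return {
        "bootstrap.servers": env["KAFKA_BOOTSTRAP_SERVERS"],
        "security.protocol": "SASL_SSL",  # TLS-encrypted, authenticated connection
        "sasl.mechanisms": "PLAIN",       # API key/secret sent as username/password
        "sasl.username": env["KAFKA_API_KEY"],
        "sasl.password": env["KAFKA_API_SECRET"],
    }
```

Reading the secret from the environment keeps it out of files that might be committed to version control. A missing value raises an error before any connection attempt is made.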

Consuming data files for transfer to S3 buckets requires that users have the API keys and other information necessary to authenticate and write files to at least one external S3 bucket. Please see the page on running the S3TransferStreamProcessor program for more information. First-time users may find it easiest to use a bucket hosted on AWS. Storing data in S3 bucket object stores is completely optional.

For more information on the full set of requirements for running the automated CI tests (which test all functionality of the package), please see the page on CI testing.