Containers

This page is the API surface for container execution. For architecture, portability patterns, and practical examples, see the Container Integration Guide.

Portable volume mapping quick example

Use tracker mount roots as host-side volumes and keep stable container paths:

from pathlib import Path
from consist import Tracker
from consist.integrations.containers import run_container

# Named mount roots: host directories that volume paths are validated against.
tracker = Tracker(
    run_dir="/scratch/consist_runs",
    db_path="./provenance.duckdb",
    mounts={"inputs": "/shared/inputs", "runs": "/scratch/consist_runs"},
)

run_container(
    tracker=tracker,
    run_id="container_step",
    image="my-org/tool:v1",
    command=["python", "tool.py", "--in", "/inputs", "--out", "/outputs"],
    # Host side comes from the tracker mounts; the container side stays fixed.
    volumes={
        str(Path(tracker.mounts["inputs"]).resolve()): "/inputs",
        str(Path(tracker.mounts["runs"]).resolve()): "/outputs",
    },
    # Inputs and outputs are host paths located under the mount roots.
    inputs=[Path(tracker.mounts["inputs"]) / "data.csv"],
    outputs=[Path(tracker.mounts["runs"]) / "container_step" / "result.csv"],
)

Mount validation and defaults

run_container(...) validates host volume paths against Tracker(mounts=...) when strict_mounts=True (default).
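The check can be pictured as a containment test between a resolved host path and the configured mount roots. The sketch below is not Consist's implementation: `validate_host_path` is a hypothetical helper, written under the assumption that validation means "the resolved path lies at or under some mount root".

```python
from pathlib import Path
from typing import Dict


def validate_host_path(
    host_path: str, mounts: Dict[str, str], strict_mounts: bool = True
) -> Path:
    """Hypothetical helper: resolve host_path and check it against mount roots."""
    resolved = Path(host_path).resolve()
    if not strict_mounts:
        # Non-strict mode: the path is accepted without a mount check.
        return resolved
    if not mounts:
        raise ValueError("strict_mounts=True requires at least one tracker mount")
    for root in mounts.values():
        root_path = Path(root).resolve()
        # Accept the root itself or anything nested below it.
        if resolved == root_path or root_path in resolved.parents:
            return resolved
    raise ValueError(f"{resolved} is outside the configured mount roots")
```

With `mounts={"inputs": "/shared/inputs"}`, a path such as `/shared/inputs/data.csv` passes, while `/etc/passwd` is rejected unless `strict_mounts=False`.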

On cache hits, run_container(...) materializes requested output paths on the host (copy-only) and skips container execution.
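As a rough illustration of copy-only materialization, the sketch below copies previously cached files to the requested host paths and never moves or links them. `materialize_outputs` and its two dictionary arguments are assumptions made for this example, not part of the Consist API.

```python
import shutil
from pathlib import Path
from typing import Dict


def materialize_outputs(cached: Dict[str, Path], requested: Dict[str, Path]) -> None:
    """Hypothetical helper: copy cached artifacts to the requested host paths."""
    for key, target in requested.items():
        source = cached[key]
        # Create the destination directory if the run has not made it yet.
        target.parent.mkdir(parents=True, exist_ok=True)
        # Copy-only: the cached file stays where it is.
        shutil.copy2(source, target)
```

Because nothing is executed in this path, the container image does not need to be available on the host for a cache hit to succeed in this sketch.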

For an end-to-end example and portability guidance, see the Container Integration Guide.

Consist Container API Module

This module provides a high-level API for executing containerized steps (e.g., Docker, Singularity/Apptainer) with automatic provenance tracking and caching through Consist. It wraps the container runtime and uses Consist's Tracker to log container execution details, input dependencies, and output artifacts.

Key functionalities include:

  • Container Execution with Provenance: Wraps container execution within a Consist start_run context, ensuring that container image identity, commands, environment hash, and file I/O are tracked.
  • Backend Agnosticism: Supports different container runtimes (Docker, Singularity/Apptainer) via a unified interface.
  • Automated Input/Output Logging: Automatically logs host-side files as inputs and scans specified paths for outputs, linking them to the container run.
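One way to picture the unified backend interface is a small protocol with one implementation per runtime, selected by name. Everything in this sketch (`ContainerBackend`, `DockerBackend`, `SingularityBackend`, `get_backend`, and the choice to build CLI argument lists) is hypothetical; Consist's actual backend classes may be organized differently.

```python
from typing import Dict, List, Protocol


class ContainerBackend(Protocol):
    """Hypothetical interface shared by all runtimes."""

    def build_argv(
        self, image: str, command: List[str], volumes: Dict[str, str]
    ) -> List[str]: ...


class DockerBackend:
    def build_argv(self, image, command, volumes):
        argv = ["docker", "run", "--rm"]
        for host, container in volumes.items():
            argv += ["-v", f"{host}:{container}"]
        return argv + [image, *command]


class SingularityBackend:
    def build_argv(self, image, command, volumes):
        argv = ["singularity", "exec"]
        for host, container in volumes.items():
            argv += ["--bind", f"{host}:{container}"]
        # Pull the image from a Docker registry via the docker:// URI scheme.
        return argv + [f"docker://{image}", *command]


_BACKENDS = {"docker": DockerBackend, "singularity": SingularityBackend}


def get_backend(backend_type: str) -> ContainerBackend:
    """Look up a backend by name; unknown names raise ValueError."""
    try:
        return _BACKENDS[backend_type]()
    except KeyError:
        raise ValueError(f"unknown backend_type: {backend_type!r}") from None
```

Callers only see `get_backend(...)` and `build_argv(...)`, so adding a runtime means adding one class and one dictionary entry.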

ContainerResult dataclass

Return value of run_container: the logged output artifacts together with cache metadata.

output property

Convenience: return the first (or only) output artifact if present.
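The convenience accessor can be sketched with a plain dataclass. The class name `ContainerResultSketch`, the `artifacts` and `cache_hit` fields, and the choice to return `None` when there are no artifacts are all assumptions for illustration; the real `ContainerResult` fields are not listed on this page.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class ContainerResultSketch:
    """Illustrative stand-in; field names are assumptions, not Consist's."""

    artifacts: Dict[str, Any] = field(default_factory=dict)
    cache_hit: bool = False

    @property
    def output(self) -> Optional[Any]:
        # First artifact in insertion order, or None when there are none.
        return next(iter(self.artifacts.values()), None)
```

A property like this saves single-output steps from indexing into the artifact collection, while multi-output steps can still read the full mapping.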

run_container(tracker, run_id, image, command, volumes, inputs, outputs, environment=None, working_dir=None, backend_type='docker', pull_latest=False, lineage_mode='full', strict_mounts=True)

Executes a containerized step with optional provenance tracking and caching via Consist.

This function acts as a high-level wrapper that integrates container execution with Consist's Tracker. In lineage mode "full" it initiates a Consist run (or attaches to an active run), uses the container's image and command as part of the run's identity (code/config), and tracks host-side files as inputs and outputs. In lineage mode "none" it only executes the container and returns a stable manifest/hash for callers to incorporate into an enclosing step's identity.

Parameters:

  • tracker (Tracker, required): The active Consist Tracker instance to use for provenance logging.
  • run_id (str, required): A unique identifier for this container execution run within Consist.
  • image (str, required): The container image to use (e.g., "ubuntu:latest", "my_repo/my_image:tag").
  • command (Union[str, List[str]], required): The command to execute inside the container. Can be a string or a list of strings (for exec form). Commands are validated for non-empty tokens and a maximum length.
  • volumes (Dict[str, str], required): A dictionary mapping host paths to container paths for volume mounts. Example: {"/host/path": "/container/path"}. Host paths are resolved and validated against tracker mounts. When strict_mounts is False, any absolute host path is permitted. Relative paths are resolved against the first mount root.
  • inputs (List[ArtifactRef], required): A list of paths (str/Path) or Artifact objects on the host machine that serve as inputs to the containerized process. These are logged as Consist inputs.
  • outputs (List[str] or Dict[str, str], required): A list of paths on the host machine that are expected to be generated or modified by the containerized process. These paths are scanned and logged as Consist output artifacts. Alternatively, pass a mapping of logical output keys to host paths; each artifact is then logged with the provided key instead of the filename. In both forms, host paths are validated against tracker mounts and must remain within run_dir unless allow_external_paths is enabled.
  • environment (Optional[Dict[str, str]], default None): A dictionary of environment variables to set inside the container. Defaults to empty.
  • working_dir (Optional[str], default None): The working directory inside the container where the command will be executed. If None, the default working directory of the container image is used.
  • backend_type (str, default "docker"): The container runtime backend to use. Currently supports "docker" and "singularity".
  • pull_latest (bool, default False): If True, the Docker backend attempts to pull the latest image before execution. Applies only to the "docker" backend.
  • lineage_mode (Literal['full', 'none'], default "full"): "full" performs Consist provenance tracking, caching, and output scanning. "none" skips Consist logging/caching and does not scan outputs.
  • strict_mounts (bool, default True): If True, require tracker mounts to be configured and constrain container host paths to those roots. Set to False to allow any absolute host path.
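The command rules (string or list, non-empty tokens, bounded length) can be illustrated with a small normalizer. This is a sketch under stated assumptions: `normalize_command` is a hypothetical helper, splitting string commands with `shlex` is a guess at reasonable behavior, and `MAX_COMMAND_CHARS` is an invented limit because the actual maximum is not given here.

```python
import shlex
from typing import List, Union

# Hypothetical limit; the real maximum length is not documented on this page.
MAX_COMMAND_CHARS = 32_768


def normalize_command(command: Union[str, List[str]]) -> List[str]:
    """Return the command in exec form (a list of tokens) after basic checks."""
    # Assumption: string commands are split with shell-like quoting rules.
    tokens = shlex.split(command) if isinstance(command, str) else list(command)
    if not tokens or any(not token.strip() for token in tokens):
        raise ValueError("command must contain only non-empty tokens")
    if sum(len(token) for token in tokens) > MAX_COMMAND_CHARS:
        raise ValueError("command exceeds the maximum length")
    return tokens
```

Normalizing to a list up front means a string command and its equivalent exec-form list produce the same tokens, which matters when the command contributes to a cache key.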

Returns:

  • ContainerResult: Structured result containing logged output artifacts and cache metadata.

Raises:

  • ValueError: If an unknown backend_type is specified.
  • RuntimeError: If the container execution itself fails (e.g., non-zero exit code), or if the underlying backend fails to resolve the image digest or run the container.

Bases: BaseModel

Represents the 'Configuration' of a container run for hashing purposes.

This model captures all relevant parameters that define a containerized execution, allowing Consist to compute a canonical hash for the container's configuration. This hash is critical for determining cache hits and ensuring reproducibility.

Attributes:

  • image (str): The name or reference of the container image (e.g., "ubuntu:latest").
  • image_digest (Optional[str]): A content-addressable SHA digest of the container image, used for precise reproducibility. If None, the image tag is used.
  • command (List[str]): The command and its arguments to execute inside the container, represented as a list of strings (exec form).
  • environment (Dict[str, str]): A dictionary of environment variables passed to the container. Values are not persisted in run metadata; only a deterministic hash is stored for caching.
  • backend (str): The container backend used to execute this container (e.g., "docker", "singularity").
  • extra_args (Dict[str, Any]): Additional arguments or configuration specific to the container backend that might influence the execution but are not part of the core identity (e.g., resource limits, specific volume options).
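Storing only a deterministic hash of the environment can be sketched as follows. `environment_hash` is a hypothetical helper, and the use of sorted-key JSON with SHA-256 is an assumption; the hashing scheme Consist actually uses is not specified here.

```python
import hashlib
import json
from typing import Dict


def environment_hash(environment: Dict[str, str]) -> str:
    """Hypothetical helper: hash environment variables without storing values."""
    # sort_keys makes the serialization independent of insertion order.
    canonical = json.dumps(environment, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Two dictionaries with the same variables in a different order hash identically, while any changed value yields a different digest, so a change in configuration still invalidates the cache without the values appearing in run metadata.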

backend instance-attribute

command instance-attribute

declared_outputs = None class-attribute instance-attribute

environment instance-attribute

extra_args = {} class-attribute instance-attribute

image instance-attribute

image_digest = None class-attribute instance-attribute

volumes = {} class-attribute instance-attribute

working_dir = None class-attribute instance-attribute

to_hashable_config()

Returns a clean dictionary representation of the container configuration suitable for hashing.

This method generates a dictionary that excludes None values, ensuring a canonical representation of the configuration for consistent hash computation. This is crucial for Consist's caching mechanism.

Returns:

  • Dict[str, Any]: A dictionary containing the essential configuration parameters of the container, stripped of any None values, ready for hashing.
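The effect of stripping None values can be shown with a standard-library stand-in. `ContainerConfigSketch` is a hypothetical dataclass that mirrors only some of the attributes listed above; the real class derives from a pydantic BaseModel, and its exact output may differ.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, List, Optional


@dataclass
class ContainerConfigSketch:
    """Illustrative stand-in for the container configuration model."""

    image: str
    command: List[str]
    environment: Dict[str, str]
    backend: str
    image_digest: Optional[str] = None
    working_dir: Optional[str] = None

    def to_hashable_config(self) -> Dict[str, Any]:
        # Drop None-valued fields so unset options never appear in the output.
        return {key: value for key, value in asdict(self).items() if value is not None}
```

In this sketch, a configuration that leaves `working_dir` unset and one that sets it to None explicitly produce the same dictionary, which is what makes the representation canonical for hashing.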