Containers
This page is the API surface for container execution. For architecture, portability patterns, and practical examples, see the Container Integration Guide.
Portable volume mapping quick example
Use tracker mount roots as host-side volumes and keep stable container paths:

    from pathlib import Path

    from consist import Tracker
    from consist.integrations.containers import run_container

    tracker = Tracker(
        run_dir="/scratch/consist_runs",
        db_path="./provenance.duckdb",
        mounts={"inputs": "/shared/inputs", "runs": "/scratch/consist_runs"},
    )

    run_container(
        tracker=tracker,
        run_id="container_step",
        image="my-org/tool:v1",
        command=["python", "tool.py", "--in", "/inputs", "--out", "/outputs"],
        volumes={
            str(Path(tracker.mounts["inputs"]).resolve()): "/inputs",
            str(Path(tracker.mounts["runs"]).resolve()): "/outputs",
        },
        inputs=[Path(tracker.mounts["inputs"]) / "data.csv"],
        outputs=[Path(tracker.mounts["runs"]) / "container_step" / "result.csv"],
    )

Mount validation and defaults
run_container(...) validates host volume paths against
Tracker(mounts=...) when strict_mounts=True (default).
On cache hits, run_container(...) materializes requested output paths on the
host (copy-only) and skips container execution.
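
To make the mount check concrete, the following is a minimal sketch of what validating host volume paths against mount roots could look like. It is not Consist's implementation: `find_mount_root` and `check_volumes` are hypothetical helpers written for illustration, and the mount names simply mirror the quick example above.

```python
from pathlib import Path


def find_mount_root(host_path, mounts):
    """Return the name of the first mount whose root contains host_path, else None.

    Hypothetical helper for illustration; not part of the Consist API.
    """
    resolved = Path(host_path).resolve()
    for name, root in mounts.items():
        try:
            # relative_to raises ValueError when resolved is not under root.
            resolved.relative_to(Path(root).resolve())
        except ValueError:
            continue
        return name
    return None


def check_volumes(volumes, mounts):
    """Raise ValueError if any host-side volume path lies outside every mount root."""
    for host_path in volumes:
        if find_mount_root(host_path, mounts) is None:
            raise ValueError(f"{host_path!r} is not under any configured mount root")


mounts = {"inputs": "/shared/inputs", "runs": "/scratch/consist_runs"}
check_volumes({"/shared/inputs": "/inputs"}, mounts)  # passes silently
```

Resolving both sides before comparing means a path such as `/shared/inputs/../other` is rejected instead of slipping through a plain string-prefix check.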
For an end-to-end example and portability guidance, see the Container Integration Guide.
Consist Container API Module
This module provides a high-level API for executing containerized steps
(e.g., Docker, Singularity/Apptainer) with automatic provenance tracking
and caching through Consist. It abstracts away the complexities of
interacting directly with container runtimes and integrates seamlessly
with Consist's Tracker to log container execution details, input
dependencies, and output artifacts.
Key functionalities include:
- Container Execution with Provenance: Wraps container execution within a Consist start_run context, ensuring that container image identity, commands, environment hash, and file I/O are tracked.
- Backend Agnosticism: Supports different container runtimes (Docker, Singularity/Apptainer) via a unified interface.
- Automated Input/Output Logging: Automatically logs host-side files as inputs and scans specified paths for outputs, linking them to the container run.
ContainerResult (dataclass)
Return value for run_container with cached output artifacts.
output (property)
Convenience: return the first (or only) output artifact if present.
run_container(tracker, run_id, image, command, volumes, inputs, outputs, environment=None, working_dir=None, backend_type='docker', pull_latest=False, lineage_mode='full', strict_mounts=True)
Executes a containerized step with optional provenance tracking and caching via Consist.
This function acts as a high-level wrapper that integrates container execution
with Consist's Tracker. In lineage mode "full" it initiates a Consist run
(or attaches to an active run), uses the container's image and command as part
of the run's identity (code/config), and tracks host-side files as inputs and
outputs. In lineage mode "none" it only executes the container and returns a
stable manifest/hash for callers to incorporate into an enclosing step's identity.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| tracker | Tracker | The active Consist Tracker. | required |
| run_id | str | A unique identifier for this container execution run within Consist. | required |
| image | str | The container image to use (e.g., "ubuntu:latest", "my_repo/my_image:tag"). | required |
| command | Union[str, List[str]] | The command to execute inside the container. Can be a string or a list of strings (for exec form). Commands are validated for non-empty tokens and a maximum length. | required |
| volumes | Dict[str, str] | A dictionary mapping host paths to container paths for volume mounts. Example: | required |
| inputs | List[ArtifactRef] | A list of paths (str/Path) or | required |
| outputs | List[str] | A list of paths on the host machine that are expected to be generated or modified by the containerized process. These paths will be scanned and logged as Consist output artifacts. Host paths are validated against tracker mounts and must remain within run_dir unless allow_external_paths is enabled. | required |
| outputs | Dict[str, str] | Alternatively, pass a mapping of logical output keys to host paths. The artifact will be logged with the provided key instead of the filename. Host paths are validated against tracker mounts and must remain within run_dir unless allow_external_paths is enabled. | required |
| environment | Optional[Dict[str, str]] | A dictionary of environment variables to set inside the container. Defaults to empty. | None |
| working_dir | Optional[str] | The working directory inside the container where the command will be executed. If None, the default working directory of the container image will be used. | None |
| backend_type | str | The container runtime backend to use. Currently supports "docker" and "singularity". | "docker" |
| pull_latest | bool | If True, the Docker backend will attempt to pull the latest image before execution. Applies only to the "docker" backend. | False |
| lineage_mode | Literal['full', 'none'] | "full" performs Consist provenance tracking, caching, and output scanning. "none" skips Consist logging/caching and does not scan outputs. | "full" |
| strict_mounts | bool | If True, require tracker mounts to be configured and constrain container host paths to those roots. Set to False to allow any absolute host path. | True |
Returns:
| Type | Description |
|---|---|
| ContainerResult | Structured result containing logged output artifacts and cache metadata. |
Raises:
| Type | Description |
|---|---|
| ValueError | If an unknown |
| RuntimeError | If the container execution itself fails (e.g., non-zero exit code), or if the underlying backend fails to resolve the image digest or run the container. |
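
The command parameter accepts either a string or a list and is validated for non-empty tokens and a maximum length. The sketch below shows one way such normalization could work. It is an illustration only: `normalize_command` and `MAX_COMMAND_CHARS` are hypothetical names, the limit value is invented for the example, and splitting strings with `shlex` is an assumption, not a documented behavior of run_container.

```python
import shlex

# Hypothetical limit chosen for illustration; the real maximum is not documented here.
MAX_COMMAND_CHARS = 4096


def normalize_command(command):
    """Return the command as a list of tokens (exec form), or raise ValueError."""
    # Assumption: a string command is split with POSIX shell rules.
    tokens = shlex.split(command) if isinstance(command, str) else list(command)
    if not tokens:
        raise ValueError("command must contain at least one token")
    for token in tokens:
        if not isinstance(token, str) or not token.strip():
            raise ValueError("command tokens must be non-empty strings")
    if sum(len(token) for token in tokens) > MAX_COMMAND_CHARS:
        raise ValueError("command is too long")
    return tokens


print(normalize_command("python tool.py --in /inputs --out /outputs"))
```

Normalizing to a single list form before hashing has the side benefit that a string command and its equivalent list produce the same run identity.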
Bases: BaseModel
Represents the 'Configuration' of a container run for hashing purposes.
This model captures all relevant parameters that define a containerized execution, allowing Consist to compute a canonical hash for the container's configuration. This hash is critical for determining cache hits and ensuring reproducibility.
Attributes:
| Name | Type | Description |
|---|---|---|
| image | str | The name or reference of the container image (e.g., "ubuntu:latest"). |
| image_digest | Optional[str] | A content-addressable SHA digest of the container image, used for precise reproducibility. If None, the image tag is used. |
| command | List[str] | The command and its arguments to execute inside the container, represented as a list of strings (exec form). |
| environment | Dict[str, str] | A dictionary of environment variables passed to the container. Values are not persisted in run metadata; only a deterministic hash is stored for caching. |
| backend | str | The container backend used to execute this container (e.g., "docker", "singularity"). |
| extra_args | Dict[str, Any] | Additional arguments or configuration specific to the container backend that might influence the execution but are not part of the core identity (e.g., resource limits, specific volume options). |
- backend (instance attribute)
- command (instance attribute)
- declared_outputs = None (class attribute, instance attribute)
- environment (instance attribute)
- extra_args = {} (class attribute, instance attribute)
- image (instance attribute)
- image_digest = None (class attribute, instance attribute)
- volumes = {} (class attribute, instance attribute)
- working_dir = None (class attribute, instance attribute)
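
The environment attribute is described as being stored only as a deterministic hash, never as raw values. A rough sketch of how such a digest could be computed follows. The function name `environment_digest` and the choice of canonical JSON with SHA-256 are assumptions made for illustration; the actual hashing scheme used by Consist is not specified on this page.

```python
import hashlib
import json


def environment_digest(environment):
    """Return a SHA-256 hex digest of the variables that ignores key order.

    Hypothetical helper for illustration; not part of the Consist API.
    """
    # sort_keys yields one canonical serialization for equal dictionaries.
    canonical = json.dumps(environment or {}, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


digest = environment_digest({"API_TOKEN": "secret", "MODE": "batch"})
print(len(digest))  # length of a SHA-256 hex digest
```

Storing only the digest lets a cache detect that an environment changed without writing secrets such as tokens into run metadata.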
to_hashable_config()
Returns a clean dictionary representation of the container configuration suitable for hashing.
This method generates a dictionary that excludes None values, ensuring a
canonical representation of the configuration for consistent hash computation.
This is crucial for Consist's caching mechanism.
Returns:
| Type | Description |
|---|---|
| Dict[str, Any] | A dictionary containing the essential configuration parameters of the container, stripped of any None values. |
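
As an illustration of the idea behind to_hashable_config(), the sketch below uses a standard-library dataclass as a stand-in for the real model. `ContainerConfigSketch` and `config_hash` are hypothetical names, only a subset of the documented attributes is included, and hashing canonical JSON with SHA-256 is an assumption, not a description of Consist's internals.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class ContainerConfigSketch:
    """Stand-in for the container configuration model (not the Consist class)."""

    image: str
    command: List[str]
    backend: str = "docker"
    image_digest: Optional[str] = None
    working_dir: Optional[str] = None
    environment: Dict[str, str] = field(default_factory=dict)

    def to_hashable_config(self) -> Dict[str, Any]:
        # Drop None values so unset optional fields are absent from the payload.
        return {key: value for key, value in asdict(self).items() if value is not None}


def config_hash(config: ContainerConfigSketch) -> str:
    """Hash the canonical JSON form of the configuration (assumed scheme)."""
    payload = json.dumps(config.to_hashable_config(), sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


config = ContainerConfigSketch(image="ubuntu:latest", command=["echo", "hi"])
print(sorted(config.to_hashable_config()))
```

Two configurations that differ only in a resolved image digest hash differently in this sketch, which matches the documented intent that the digest, when available, pins the run more precisely than the tag.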