Mounts & Portability¶
Consist stores portable URIs instead of absolute filesystem paths so runs can move between machines without breaking lineage. This page explains how mounts, workspace URIs, and historical path resolution work.
Recommended path
For most workflow code, the recommended path is consist.run(...),
consist.trace(...), or consist.scenario(...). This page includes
tracker.start_run(...)/consist.log_artifact(...) snippets as low-level
examples for path and URI mechanics.
Mounts at a glance¶
Mounts map a short scheme name to a real path on disk:
from consist import Tracker

tracker = Tracker(
    run_dir="./runs",
    db_path="./provenance.duckdb",
    mounts={
        "inputs": "/shared/inputs",
        "scratch": "/scratch/users/MY_USERNAME",
    },
)
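When you log a path under a mount, Consist stores a URI that starts with the mount's scheme (for example, inputs://) followed by the path relative to the mount root, instead of the absolute path.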
This keeps provenance portable and lets each user remap mounts on their machine.
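To make the mapping concrete, here is a minimal standard-library sketch of how an absolute path could be turned into a mount URI. The helper to_mount_uri and the file land_use/zones.csv are hypothetical and exist only for illustration; this is not Consist's internal implementation, and the exact URI layout is an assumption based on the scheme names used on this page.

```python
from pathlib import PurePosixPath
from typing import Optional


def to_mount_uri(path: str, mounts: dict[str, str]) -> Optional[str]:
    """Return a scheme://relative URI if `path` lies under a mount root, else None.

    Hypothetical helper for illustration only; not part of the Consist API.
    """
    target = PurePosixPath(path)
    # Check longer roots first so a nested mount wins over its parent.
    by_specificity = sorted(mounts.items(), key=lambda kv: len(kv[1]), reverse=True)
    for name, root in by_specificity:
        try:
            relative = target.relative_to(PurePosixPath(root))
        except ValueError:
            continue  # path is not under this mount root
        return f"{name}://{relative.as_posix()}"
    return None


mounts = {"inputs": "/shared/inputs", "scratch": "/scratch/users/MY_USERNAME"}
uri = to_mount_uri("/shared/inputs/land_use/zones.csv", mounts)
outside = to_mount_uri("/etc/hosts", mounts)  # not under any mount
```

Under these assumptions, uri carries no machine-specific prefix, which is what lets another user point the same scheme at a different root.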
Example: Shared Research Team¶
Suppose your team shares a simulation project across machines with different filesystem layouts. Agree on mount names and let each person map them locally.
# Shared setup (agreed upon by the team)
tracker = Tracker(
    run_dir="./runs",
    db_path="./provenance.duckdb",
    mounts={
        "inputs": "/shared/data/activitysim_inputs",  # Shared NFS mount
        "outputs": "/local/activitysim_outputs",      # Local SSD for speed
        "scratch": "/scratch/users/YOUR_USERNAME",    # Temporary workspace
    },
)
On each team member's machine, the paths differ but mount names stay the same:
# Alice's setup
tracker = Tracker(
    run_dir="./runs",
    db_path="./provenance.duckdb",
    mounts={
        "inputs": "/mnt/nfs/activitysim_inputs",
        "outputs": "/home/alice/activitysim_outputs",
        "scratch": "/scratch/alice",
    },
)

# Bob's setup
tracker = Tracker(
    run_dir="./runs",
    db_path="./provenance.duckdb",
    mounts={
        "inputs": "/data/nfs/inputs",
        "outputs": "/var/cache/bob/outputs",
        "scratch": "/tmp/bob_scratch",
    },
)
When Alice logs an output (inside a run context):
from pathlib import Path

import consist

# Uses Alice's tracker from the setup above.
with tracker.start_run("asim_baseline", model="activitysim"):
    consist.log_artifact(
        Path("/home/alice/activitysim_outputs/results.parquet"),
        key="results",
        direction="output",
    )
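Consist detects the mount and stores a portable URI in the outputs:// scheme rather than Alice's absolute path.
When Bob retrieves this artifact, Consist resolves the URI against his mount configuration, so it points under /var/cache/bob/outputs.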
This is URI portability, not automatic byte replication. Bob only gets a
usable local file if his outputs mount points at the same underlying dataset,
or if the artifact can be rehydrated another way.
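The difference between a portable URI and portable bytes can be sketched with the standard library alone. The helper resolve_mount_uri is hypothetical, and the literal URI outputs://results.parquet is an assumed shape for what would be stored in Alice's example; neither is taken from the Consist API.

```python
from pathlib import Path


def resolve_mount_uri(uri: str, mounts: dict[str, str]) -> Path:
    """Map scheme://relative onto the local root configured for that scheme.

    Hypothetical helper for illustration only; not part of the Consist API.
    """
    scheme, sep, relative = uri.partition("://")
    if not sep or scheme not in mounts:
        raise KeyError(f"no mount configured for {uri!r}")
    return Path(mounts[scheme]) / relative


alice_mounts = {"outputs": "/home/alice/activitysim_outputs"}
bob_mounts = {"outputs": "/var/cache/bob/outputs"}

uri = "outputs://results.parquet"  # assumed shape of the stored URI

alice_path = resolve_mount_uri(uri, alice_mounts)
bob_path = resolve_mount_uri(uri, bob_mounts)

# The URI always resolves, but the bytes are only there if Bob's mount
# actually contains the same dataset.
bytes_available_for_bob = bob_path.exists()
```

Resolution succeeds on both machines, while bytes_available_for_bob depends entirely on what sits behind Bob's outputs mount.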
Container volumes aligned with mounts¶
When using run_container(...), derive the host-side volume roots from the
mounts passed to Tracker(mounts=...) so paths remain portable across machines.
from pathlib import Path

from consist import Tracker

tracker = Tracker(
    run_dir="/shared/team_scratch/consist_runs",
    db_path="./provenance.duckdb",
    mounts={
        "inputs": "/shared/team_inputs",
        "runs": "/shared/team_scratch/consist_runs",
    },
)

inputs_root = Path(tracker.mounts["inputs"]).resolve()
runs_root = Path(tracker.mounts["runs"]).resolve()

volumes = {
    str(inputs_root): "/inputs",
    str(runs_root): "/outputs",
}
outputs = [runs_root / "beam_step" / "summary.csv"]  # under tracker.run_dir
Guidelines:
- Keep container paths stable (/inputs, /outputs) across environments.
- Keep host paths machine-specific via tracker mounts.
- Keep host volume roots stable too if you want cross-machine cache reuse.
- Keep strict_mounts=True unless you intentionally allow external paths.
Current caveat: container cache identity includes the resolved host volumes
mapping, not just the in-container mount points. If one machine uses
/shared/team_inputs and another uses /mnt/nfs/team_inputs, those runs will
not currently share the same container cache signature.
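The caveat can be illustrated with a toy signature. The function volumes_signature below is a hypothetical stand-in, not Consist's actual hashing scheme. It only shows why any signature that includes host paths will differ when the host roots differ, even though the in-container mount points are identical.

```python
import hashlib
import json


def volumes_signature(volumes: dict[str, str]) -> str:
    """Hash a host-path -> container-path mapping (illustrative only)."""
    payload = json.dumps(volumes, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


machine_a = {
    "/shared/team_inputs": "/inputs",
    "/shared/team_scratch/consist_runs": "/outputs",
}
machine_b = {
    "/mnt/nfs/team_inputs": "/inputs",  # same data, different host root
    "/shared/team_scratch/consist_runs": "/outputs",
}

same_container_paths = sorted(machine_a.values()) == sorted(machine_b.values())
same_signature = volumes_signature(machine_a) == volumes_signature(machine_b)
```

Here same_container_paths is true while same_signature is false, which mirrors the cache miss described above.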
For a complete runnable example using this mapping pattern, see Container Integration Guide.
Run-local outputs¶
Paths under the run directory are usually stored relative to the active run.
Current runs typically store these as ./... paths. Historical resolution also
accepts workspace://... as an alias and uses the run's _physical_run_dir
metadata field, which records the absolute run directory at execution time.
| Scenario | Behavior |
|---|---|
| Current run directory matches original | Files accessible |
| Run directory moved | Metadata-only cache hits work; file access fails |
| Run directory deleted | Metadata-only cache hits work; file access fails |
_physical_run_dir
Stored in run.meta["_physical_run_dir"]. Used for historical path resolution when hydrating artifacts from prior runs.
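The "moved run directory" row of the table can be reproduced with a small standard-library sketch. The helper run_local_path is hypothetical; it assumes that ./... and workspace://... URIs are both joined onto the recorded _physical_run_dir, as described above.

```python
import tempfile
from pathlib import Path


def run_local_path(uri: str, run_meta: dict) -> Path:
    """Join a ./ or workspace:// URI onto the recorded run directory.

    Hypothetical helper for illustration only; not part of the Consist API.
    """
    relative = uri.removeprefix("workspace://").removeprefix("./")
    return Path(run_meta["_physical_run_dir"]) / relative


with tempfile.TemporaryDirectory() as tmp:
    run_dir = Path(tmp) / "runs"
    (run_dir / "step").mkdir(parents=True)
    (run_dir / "step" / "summary.csv").write_text("a,b\n1,2\n")

    # Recorded once, at execution time.
    run_meta = {"_physical_run_dir": str(run_dir)}

    before_move = run_local_path("./step/summary.csv", run_meta).exists()

    # Moving the directory does not update the recorded metadata.
    run_dir.rename(Path(tmp) / "runs_moved")
    after_move = run_local_path("./step/summary.csv", run_meta).exists()
```

The metadata still describes the run after the move, but the recorded directory no longer contains the file, so byte access fails.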
Historical path resolution¶
When Consist needs bytes from a historical run (for example, during cache
hydration or an inputs-missing backfill), it resolves paths in this order:
1) If the URI uses workspace:// or ./, resolve relative to the original run's
_physical_run_dir.
2) If the URI uses a mount scheme (e.g., inputs://), resolve using the current
tracker mounts.
3) Otherwise, treat the URI as an absolute path.
If a mount is missing or points somewhere else, materialization will warn and skip missing files rather than crashing (unless explicitly set to raise).
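The resolution order above can be sketched as a single dispatch function. resolve_historical is a hypothetical name, and returning None for an unconfigured mount is an assumption standing in for the warn-and-skip behavior; the real logic lives inside Consist.

```python
from pathlib import PurePosixPath
from typing import Optional


def resolve_historical(
    uri: str, physical_run_dir: str, mounts: dict[str, str]
) -> Optional[PurePosixPath]:
    """Illustrative three-step resolution; not part of the Consist API."""
    # 1) Run-local URIs resolve against the original run directory.
    for prefix in ("workspace://", "./"):
        if uri.startswith(prefix):
            return PurePosixPath(physical_run_dir) / uri[len(prefix):]
    # 2) Mount-scheme URIs resolve against the *current* tracker mounts.
    scheme, sep, relative = uri.partition("://")
    if sep:
        root = mounts.get(scheme)
        # Assumption: a missing mount yields None so the caller can warn and skip.
        return PurePosixPath(root) / relative if root else None
    # 3) Anything else is treated as an absolute path.
    return PurePosixPath(uri)


current_mounts = {"inputs": "/data/nfs/inputs"}
run_local = resolve_historical("./step/out.csv", "/old/runs", current_mounts)
mounted = resolve_historical("inputs://zones.csv", "/old/runs", current_mounts)
unmounted = resolve_historical("scratch://tmp.bin", "/old/runs", current_mounts)
absolute = resolve_historical("/abs/file.csv", "/old/runs", current_mounts)
```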
For the newer tracker.materialize_run_outputs(...) recovery path, output layout
resolution is more conservative and history-aware:
1) workspace://... and ./... are re-rooted under the producing run's
_physical_run_dir.
2) Mount-backed output URIs such as outputs://... prefer the historical mount
snapshot stored in run.meta["mounts"].
3) If that snapshot is unavailable, Consist falls back to
artifact.meta["mount_root"] when present.
4) Absolute paths and file://... URIs are treated as unmapped rather than
being silently reinterpreted.
This is what allows run-scoped output recovery to preserve historical relative layout even when current mounts differ from the machine that produced the run.
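As a rough illustration of that fallback chain, the sketch below picks the historical root for an output URI. The function historical_root is hypothetical. The metadata keys are the ones named above, but treating "snapshot unavailable" as "scheme absent from run.meta["mounts"]" is an assumption.

```python
from typing import Optional


def historical_root(uri: str, run_meta: dict, artifact_meta: dict) -> Optional[str]:
    """Pick the root an output was originally written under, or None if unmapped.

    Illustrative only; not part of the Consist API.
    """
    # 1) Run-local outputs are rooted at the producing run's directory.
    if uri.startswith(("workspace://", "./")):
        return run_meta.get("_physical_run_dir")
    scheme, sep, _ = uri.partition("://")
    if sep and scheme != "file":
        # 2) Prefer the mount snapshot recorded with the run.
        snapshot = run_meta.get("mounts") or {}
        if scheme in snapshot:
            return snapshot[scheme]
        # 3) Otherwise fall back to the per-artifact root, if recorded.
        return artifact_meta.get("mount_root")
    # 4) Absolute paths and file:// URIs stay unmapped.
    return None


run_meta = {
    "_physical_run_dir": "/home/alice/project/runs",  # hypothetical path
    "mounts": {"outputs": "/home/alice/activitysim_outputs"},
}
from_snapshot = historical_root("outputs://results.parquet", run_meta, {})
from_artifact = historical_root(
    "outputs://results.parquet", {"mounts": None}, {"mount_root": "/archive/outputs"}
)
unmapped = historical_root("file:///tmp/results.parquet", run_meta, {})
```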
Sharing a database across machines¶
Sharing a DuckDB provenance file across a team is supported, but you must keep mounts consistent in intent even if the physical paths differ.
Recommended practice:
- Agree on mount names (inputs, outputs, scratch, shared).
- Each user maps those names to their local filesystem.
- Store the DB in a shared location with write access controls.
If a user gets a cache hit but cannot access the source filesystem, Consist logs a warning and proceeds without materializing the files.
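One way a team might enforce the naming agreement is a small check run before constructing the tracker. The helper missing_mounts and the constant AGREED_MOUNTS are hypothetical; they are not provided by Consist, and the sketch validates names only, not the paths behind them.

```python
# Mount names the team agreed on (taken from the list above).
AGREED_MOUNTS = ("inputs", "outputs", "scratch", "shared")


def missing_mounts(mounts: dict[str, str], required=AGREED_MOUNTS) -> list[str]:
    """Return the agreed mount names that this user's mapping does not define."""
    return [name for name in required if name not in mounts]


bob_mounts = {
    "inputs": "/data/nfs/inputs",
    "outputs": "/var/cache/bob/outputs",
    "scratch": "/tmp/bob_scratch",
}

missing = missing_mounts(bob_mounts)
if missing:
    print(f"mounts not configured on this machine: {', '.join(missing)}")
```

A check like this surfaces a missing name before any run hits an unresolved URI.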
Hydration implications for container workflows¶
Container runs and function runs differ on cache-hit file behavior:
- run_container(...): cache hits copy cached outputs to requested host output paths (materialized bytes expected on disk).
- consist.run(...) / tracker.run(...): default cache hits hydrate metadata only (cache_hydration="metadata"), and bytes are loaded/copied on demand.
Portability implications:
- If mounts are mapped correctly, container cache-hit materialization can succeed on each machine's local host paths.
- If mounts are missing/misaligned, cache metadata can still exist but output file materialization may warn/skip.
- If host volume roots differ across machines, container cache reuse may be missed entirely because those host paths are part of the container signature.
See Container Integration Guide for container cache details and Caching & Hydration for non-container run policies.
Best practices¶
- Prefer mounts for shared data directories; avoid absolute paths in artifacts.
- Keep run directories local and disposable; treat cached outputs as rehydratable.
- Distinguish portable URIs from portable bytes: a remapped URI still needs an accessible underlying file or a recovery path.
- Use cache_hydration="outputs-requested" for only the outputs you need.
- Use cache_hydration="inputs-missing" to backfill inputs when a run moves across machines or directories.
- For archive-mirror cache hydration, use cache_options=CacheOptions(materialize_cached_outputs_source_root=...) on run(...) / scenario steps, or pass materialize_cached_outputs_source_root=... on low-level tracker.start_run(...) / tracker.begin_run(...) workflows.
- Use tracker.materialize_run_outputs(..., source_root=...) when you need to restore historical outputs from an archive mirror into a new root.
- tracker.materialize_run_outputs(...) accepts target_root under either the tracker run_dir or a configured mount root. Other destinations still require allow_external_paths=True.
Troubleshooting¶
- Missing file on cache hit: Check that mounts map to the correct root and that the original run directory still exists for workspace URIs.
- Moved run directory: Cache metadata is still valid, but byte materialization will warn because _physical_run_dir still records the original location, where the files no longer exist.
- Permission denied: Consist warns and continues; adjust mount permissions or use a shared accessible path for cached outputs you need to materialize.
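When debugging these cases, a quick filesystem check of each mount root can narrow down the cause. The helper diagnose_mounts is a hypothetical standard-library sketch, not a Consist command. It reports only whether each root exists and is readable by the current user.

```python
import os
import tempfile
from pathlib import Path


def diagnose_mounts(mounts: dict[str, str]) -> dict[str, str]:
    """Classify each mount root as 'ok', 'missing', or 'permission denied'."""
    report = {}
    for name, root in mounts.items():
        path = Path(root)
        if not path.is_dir():
            report[name] = "missing"
        elif not os.access(path, os.R_OK | os.X_OK):
            report[name] = "permission denied"
        else:
            report[name] = "ok"
    return report


# Demonstration against a temporary directory instead of real mounts.
with tempfile.TemporaryDirectory() as tmp:
    report = diagnose_mounts(
        {"inputs": tmp, "scratch": str(Path(tmp) / "does_not_exist")}
    )
```

In practice you would pass the same mapping you give to Tracker(mounts=...).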