Mounts & Portability

Consist stores portable URIs instead of absolute filesystem paths so runs can move between machines without breaking lineage. This page explains how mounts, workspace URIs, and historical path resolution work.

Recommended path

For most workflow code, use consist.run(...), consist.trace(...), or consist.scenario(...). The tracker.start_run(...) and consist.log_artifact(...) snippets on this page are low-level examples chosen to show path and URI mechanics.


Mounts at a glance

Mounts map a short scheme name to a real path on disk:

from consist import Tracker

tracker = Tracker(
    run_dir="./runs",
    db_path="./provenance.duckdb",
    mounts={
        "inputs": "/shared/inputs",
        "scratch": "/scratch/users/MY_USERNAME",
    },
)

When you log a path under a mount, Consist stores a URI such as:

inputs://land_use.csv
scratch://temp/output.parquet

This keeps provenance portable and lets each user remap mounts on their machine.
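To make the path-to-URI step concrete, here is a minimal sketch of longest-prefix mount matching. The function name to_portable_uri is hypothetical and this is not Consist's internal implementation; it only assumes that a path under a mount root is rewritten as scheme://relative/path and that paths outside every mount stay absolute.

```python
from pathlib import PurePosixPath


def to_portable_uri(path: str, mounts: dict) -> str:
    """Hypothetical helper: rewrite an absolute path as 'scheme://relative/path'.

    Uses the longest matching mount root; paths outside every mount are
    returned unchanged (and are therefore not portable). Requires Python 3.9+.
    """
    target = PurePosixPath(path)
    best_name, best_root = None, None
    for name, root in mounts.items():
        root_path = PurePosixPath(root)
        # is_relative_to compares whole path components, so /shared/inputs_old
        # is NOT treated as living under /shared/inputs.
        if target.is_relative_to(root_path):
            if best_root is None or len(root_path.parts) > len(best_root.parts):
                best_name, best_root = name, root_path
    if best_name is None:
        return str(target)
    return f"{best_name}://{target.relative_to(best_root).as_posix()}"


mounts = {
    "inputs": "/shared/inputs",
    "scratch": "/scratch/users/MY_USERNAME",
}
print(to_portable_uri("/shared/inputs/land_use.csv", mounts))  # inputs://land_use.csv
print(to_portable_uri("/scratch/users/MY_USERNAME/temp/output.parquet", mounts))  # scratch://temp/output.parquet
```

Picking the longest matching root matters when mounts nest (for example, one mount at /shared and another at /shared/inputs): the more specific mount wins.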


Example: Shared Research Team

Suppose your team shares a simulation project across machines with different filesystem layouts. Agree on mount names and let each person map them locally.

# Shared setup (agreed upon by the team)
tracker = Tracker(
    run_dir="./runs",
    db_path="./provenance.duckdb",
    mounts={
        "inputs": "/shared/data/activitysim_inputs",      # Shared NFS mount
        "outputs": "/local/activitysim_outputs",          # Local SSD for speed
        "scratch": "/scratch/users/YOUR_USERNAME",        # Temporary workspace
    },
)

On each team member's machine, the paths differ but mount names stay the same:

# Alice's setup
tracker = Tracker(
    run_dir="./runs",
    db_path="./provenance.duckdb",
    mounts={
        "inputs": "/mnt/nfs/activitysim_inputs",
        "outputs": "/home/alice/activitysim_outputs",
        "scratch": "/scratch/alice",
    },
)

# Bob's setup
tracker = Tracker(
    run_dir="./runs",
    db_path="./provenance.duckdb",
    mounts={
        "inputs": "/data/nfs/inputs",
        "outputs": "/var/cache/bob/outputs",
        "scratch": "/tmp/bob_scratch",
    },
)

When Alice logs an output (inside a run context):

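import consist
from pathlib import Path

# Alice's tracker (above) maps "outputs" to /home/alice/activitysim_outputs
with tracker.start_run("asim_baseline", model="activitysim"):
    consist.log_artifact(
        Path("/home/alice/activitysim_outputs/results.parquet"),
        key="results",
        direction="output",
    )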

Consist detects the mount and stores a portable URI:

outputs://results.parquet

When Bob retrieves this artifact, Consist resolves it using his mount configuration:

outputs:// → /var/cache/bob/outputs/ → /var/cache/bob/outputs/results.parquet

This is URI portability, not automatic byte replication. Bob only gets a usable local file if his outputs mount points at the same underlying dataset, or if the artifact can be rehydrated another way.
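The reverse direction can be sketched the same way. Under the same assumptions (the helpers resolve_uri and has_local_bytes are hypothetical illustrations, not Consist API), resolving a URI only builds a local path from the reader's own mounts; whether bytes exist at that path is a separate check.

```python
from pathlib import Path, PurePosixPath


def resolve_uri(uri: str, mounts: dict) -> PurePosixPath:
    """Hypothetical helper: turn 'scheme://rest' into a path using local mounts."""
    scheme, sep, rest = uri.partition("://")
    if not sep:
        return PurePosixPath(uri)  # plain path: nothing to remap
    if scheme not in mounts:
        raise KeyError(f"no local mount configured for {scheme!r}")
    return PurePosixPath(mounts[scheme]) / rest


def has_local_bytes(uri: str, mounts: dict) -> bool:
    """A resolved path is only useful if a file actually exists there."""
    return Path(str(resolve_uri(uri, mounts))).exists()


alice_mounts = {"outputs": "/home/alice/activitysim_outputs"}
bob_mounts = {"outputs": "/var/cache/bob/outputs"}

stored_uri = "outputs://results.parquet"  # what was recorded for Alice's file
print(resolve_uri(stored_uri, alice_mounts))  # /home/alice/activitysim_outputs/results.parquet
print(resolve_uri(stored_uri, bob_mounts))    # /var/cache/bob/outputs/results.parquet
```

Both machines get a well-formed path from the same stored URI; has_local_bytes is what tells Bob whether his outputs mount actually contains the file.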


Container volumes aligned with mounts

When using run_container(...), derive the host-side volume roots from Tracker(mounts=...) so that each machine supplies its own host paths while the container-side paths stay fixed.

from pathlib import Path
from consist import Tracker

tracker = Tracker(
    run_dir="/shared/team_scratch/consist_runs",
    db_path="./provenance.duckdb",
    mounts={
        "inputs": "/shared/team_inputs",
        "runs": "/shared/team_scratch/consist_runs",
    },
)

inputs_root = Path(tracker.mounts["inputs"]).resolve()
runs_root = Path(tracker.mounts["runs"]).resolve()

volumes = {
    str(inputs_root): "/inputs",
    str(runs_root): "/outputs",
}
outputs = [runs_root / "beam_step" / "summary.csv"]  # under tracker.run_dir

Guidelines:

  • Keep container paths stable (/inputs, /outputs) across environments.
  • Keep host paths machine-specific via tracker mounts.
  • If you want container cache reuse across machines, also keep the host volume roots identical on every machine (see the caveat below).
  • Keep strict_mounts=True unless you intentionally allow external paths.

Current caveat: container cache identity includes the resolved host volumes mapping, not just the in-container mount points. If one machine uses /shared/team_inputs and another uses /mnt/nfs/team_inputs, those runs will not currently share the same container cache signature.
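A toy model shows why this caveat matters. The sketch below assumes a signature built by hashing the image, the command, and the host-to-container volume mapping; the actual set of fields Consist hashes is not documented on this page, and container_signature, the image name, and the command are all hypothetical.

```python
import hashlib
import json


def container_signature(image: str, command: list, volumes: dict) -> str:
    """Hypothetical cache key over image, command, and host->container volumes."""
    payload = {
        "image": image,
        "command": command,
        # Sort the mapping so dict insertion order cannot change the hash.
        "volumes": sorted(volumes.items()),
    }
    blob = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()


IMAGE = "example/beam:latest"                               # hypothetical image
COMMAND = ["run-model", "--config", "/inputs/config.yaml"]  # hypothetical command

machine_a = {"/shared/team_inputs": "/inputs",
             "/shared/team_scratch/consist_runs": "/outputs"}
machine_b = {"/mnt/nfs/team_inputs": "/inputs",
             "/shared/team_scratch/consist_runs": "/outputs"}

sig_a = container_signature(IMAGE, COMMAND, machine_a)
sig_b = container_signature(IMAGE, COMMAND, machine_b)
print(sig_a == sig_b)  # False: same container paths, different host roots
```

The container-side targets (/inputs, /outputs) are identical on both machines, yet the keys differ because the host-side keys of the mapping differ.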

For a complete runnable example using this mapping pattern, see Container Integration Guide.


Run-local outputs

Paths under the run directory are usually stored relative to the active run:

./outputs/<run_id>/model.csv

Current runs typically store these as ./... paths. Historical resolution also accepts workspace://... as an alias and uses the run's _physical_run_dir metadata field, which records the absolute run directory at execution time.

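| Scenario                               | Behavior                                          |
|----------------------------------------|---------------------------------------------------|
| Current run directory matches original | Files accessible                                  |
| Run directory moved                    | Metadata-only cache hits work; file access fails  |
| Run directory deleted                  | Metadata-only cache hits work; file access fails  |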

_physical_run_dir

Stored in run.meta["_physical_run_dir"]. Used for historical path resolution when hydrating artifacts from prior runs.
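The moved-directory scenario from the table above can be reproduced with plain files. This sketch assumes run-local URIs are re-rooted by simple prefix replacement against a run metadata dict with a _physical_run_dir key; resolve_run_local and the directory names are hypothetical, not Consist's API.

```python
import shutil
import tempfile
from pathlib import Path


def resolve_run_local(uri: str, run_meta: dict) -> Path:
    """Hypothetical helper: re-root './...' or 'workspace://...' under the recorded run dir."""
    for prefix in ("workspace://", "./"):
        if uri.startswith(prefix):
            return Path(run_meta["_physical_run_dir"]) / uri[len(prefix):]
    raise ValueError(f"not a run-local URI: {uri!r}")


# Demo: a run writes a file, then its directory is moved.
base = Path(tempfile.mkdtemp())
run_dir = base / "runs" / "asim_baseline"            # hypothetical run directory
(run_dir / "outputs").mkdir(parents=True)
(run_dir / "outputs" / "model.csv").write_text("zone,households\n1,250\n")
run_meta = {"_physical_run_dir": str(run_dir)}      # recorded at execution time

accessible_before_move = resolve_run_local("./outputs/model.csv", run_meta).exists()

shutil.move(str(run_dir), str(base / "relocated"))  # simulate moving the run directory
accessible_after_move = resolve_run_local("workspace://outputs/model.csv", run_meta).exists()

print(accessible_before_move, accessible_after_move)  # True False
shutil.rmtree(base, ignore_errors=True)             # clean up the demo files
```

The metadata dict is never modified, which is why metadata-only cache hits keep working; only the bytes are no longer where _physical_run_dir says they are.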


Historical path resolution

When Consist needs bytes from a historical run (for example, during cache hydration or when cache_hydration="inputs-missing" backfills inputs), it resolves paths in this order:

1) If the URI uses workspace:// or ./, resolve it relative to the original run's _physical_run_dir.
2) If the URI uses a mount scheme (e.g., inputs://), resolve it using the current tracker mounts.
3) Otherwise, treat the URI as an absolute path.

If a mount is missing or points somewhere else, materialization will warn and skip missing files rather than crashing (unless explicitly set to raise).
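The three-step order, plus the warn-and-skip behavior, might look like the following. Both functions are hypothetical sketches; in particular, the raise_on_missing flag stands in for whatever option Consist uses to raise instead of warn, which this page does not name, and treating an unmapped scheme as unresolvable is an assumption of the sketch.

```python
import warnings
from pathlib import Path
from typing import Optional


def resolve_historical(uri: str, physical_run_dir: str, mounts: dict) -> Optional[Path]:
    """Hypothetical resolver following the three-step order described above."""
    # 1) Run-local URIs are re-rooted under the original run directory.
    for prefix in ("workspace://", "./"):
        if uri.startswith(prefix):
            return Path(physical_run_dir) / uri[len(prefix):]
    scheme, sep, rest = uri.partition("://")
    if sep:
        # 2) Mount-scheme URIs use the *current* tracker mounts.
        #    In this sketch an unmapped scheme resolves to None.
        return Path(mounts[scheme]) / rest if scheme in mounts else None
    # 3) Anything else is treated as an absolute path.
    return Path(uri)


def materialize(uris, physical_run_dir, mounts, raise_on_missing=False):
    """Collect resolvable files; warn and skip the rest unless asked to raise."""
    found = []
    for uri in uris:
        path = resolve_historical(uri, physical_run_dir, mounts)
        if path is not None and path.exists():
            found.append(path)
            continue
        message = f"cannot materialize {uri!r} (resolved to {path})"
        if raise_on_missing:
            raise FileNotFoundError(message)
        warnings.warn(message)
    return found
```

Returning the list of files that were found, rather than stopping at the first failure, mirrors the "warn and skip" behavior described above.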

For the newer tracker.materialize_run_outputs(...) recovery path, output layout resolution is more conservative and history-aware:

1) workspace://... and ./... URIs are re-rooted under the producing run's _physical_run_dir.
2) Mount-backed output URIs such as outputs://... prefer the historical mount snapshot stored in run.meta["mounts"].
3) If that snapshot is unavailable, Consist falls back to artifact.meta["mount_root"] when present.
4) Absolute paths and file://... URIs are treated as unmapped rather than being silently reinterpreted.

This is what allows run-scoped output recovery to preserve historical relative layout even when current mounts differ from the machine that produced the run.
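For comparison, the more conservative lookup used by tracker.materialize_run_outputs(...) can be sketched as a fallback chain. This assumes run.meta["mounts"] is a plain name-to-root dict and artifact.meta["mount_root"] is a root path string; resolve_output_layout is hypothetical, and the sketch only computes the historical location, not how Consist then places files under source_root or target_root.

```python
from pathlib import PurePosixPath
from typing import Optional


def resolve_output_layout(uri: str, run_meta: dict, artifact_meta: dict) -> Optional[PurePosixPath]:
    """Hypothetical history-aware lookup; returns None for unmapped URIs."""
    # 1) Run-local URIs are re-rooted under the producing run's directory.
    for prefix in ("workspace://", "./"):
        if uri.startswith(prefix):
            return PurePosixPath(run_meta["_physical_run_dir"]) / uri[len(prefix):]
    scheme, sep, rest = uri.partition("://")
    # 4) Absolute paths and file:// URIs are left unmapped, not reinterpreted.
    if not sep or scheme == "file":
        return None
    # 2) Prefer the mount snapshot recorded when the run executed.
    snapshot = run_meta.get("mounts") or {}
    if scheme in snapshot:
        return PurePosixPath(snapshot[scheme]) / rest
    # 3) Fall back to the mount root recorded on the artifact itself.
    mount_root = artifact_meta.get("mount_root")
    if mount_root:
        return PurePosixPath(mount_root) / rest
    return None
```

Note that the current tracker mounts never appear in this chain, which is the point: the historical layout is reconstructed from what was recorded at execution time.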


Sharing a database across machines

Sharing a DuckDB provenance file across a team is supported, but each mount name must refer to the same logical data on every machine, even if the physical paths differ.

Recommended practice:

  • Agree on mount names (inputs, outputs, scratch, shared).
  • Each user maps those names to their local filesystem.
  • Store the DB in a shared location with write access controls.

If a user gets a cache hit but cannot access the source filesystem, Consist logs a warning and proceeds without materializing the files.
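Before pointing a new machine at a shared database, it can help to check that every mount scheme referenced by stored URIs is mapped locally. The helper below is a hypothetical sketch; it assumes you have already collected artifact URIs as strings by whatever means you query the database, and it treats workspace:// and file:// as non-mount schemes.

```python
def unmapped_schemes(uris, local_mounts: dict) -> list:
    """Hypothetical check: mount schemes used by stored URIs but missing locally."""
    not_mounts = {"workspace", "file"}  # resolved without tracker mounts
    missing = set()
    for uri in uris:
        scheme, sep, _ = uri.partition("://")
        if sep and scheme not in not_mounts and scheme not in local_mounts:
            missing.add(scheme)
    return sorted(missing)


stored_uris = [
    "inputs://land_use.csv",
    "outputs://results.parquet",
    "shared://lookup.csv",           # hypothetical URI under a 'shared' mount
    "./outputs/model.csv",
    "workspace://outputs/model.csv",
]
alice_mounts = {
    "inputs": "/mnt/nfs/activitysim_inputs",
    "outputs": "/home/alice/activitysim_outputs",
    "scratch": "/scratch/alice",
}
print(unmapped_schemes(stored_uris, alice_mounts))  # ['shared']
```

An empty result means every stored mount URI can at least be resolved to a local path; it says nothing about whether the files exist there.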


Hydration implications for container workflows

Container runs and function runs differ on cache-hit file behavior:

  • run_container(...): cache hits copy cached outputs to requested host output paths (materialized bytes expected on disk).
  • consist.run(...)/tracker.run(...): default cache hits hydrate metadata only (cache_hydration="metadata"), and bytes are loaded/copied on demand.

Portability implications:

  • If mounts are mapped correctly, container cache-hit materialization can succeed on each machine's local host paths.
  • If mounts are missing/misaligned, cache metadata can still exist but output file materialization may warn/skip.
  • If host volume roots differ across machines, container cache reuse may be missed entirely because those host paths are part of the container signature.

See Container Integration Guide for container cache details and Caching & Hydration for non-container run policies.


Best practices

  • Prefer mounts for shared data directories; avoid absolute paths in artifacts.
  • Keep run directories local and disposable; treat cached outputs as rehydratable.
  • Distinguish portable URIs from portable bytes: a remapped URI still needs an accessible underlying file or a recovery path.
  • Use cache_hydration="outputs-requested" to hydrate only the outputs you need.
  • Use cache_hydration="inputs-missing" to backfill inputs when a run moves across machines or directories.
  • For archive-mirror cache hydration, use cache_options=CacheOptions(materialize_cached_outputs_source_root=...) on run(...) / scenario steps, or pass materialize_cached_outputs_source_root=... on low-level tracker.start_run(...) / tracker.begin_run(...) workflows.
  • Use tracker.materialize_run_outputs(..., source_root=...) when you need to restore historical outputs from an archive mirror into a new root.
  • tracker.materialize_run_outputs(...) accepts target_root under either the tracker run_dir or a configured mount root. Other destinations still require allow_external_paths=True.
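The target_root rule in the last bullet can be expressed as a small predicate. This is a sketch of the rule as stated, not Consist's implementation; target_root_allowed and the example paths are hypothetical, and the real check may treat symlinks or relative paths differently.

```python
from pathlib import Path


def target_root_allowed(target_root, run_dir, mount_roots, allow_external_paths=False) -> bool:
    """Hypothetical check: is target_root under run_dir or a configured mount root?"""
    target = Path(target_root).resolve()
    allowed = [Path(run_dir).resolve()] + [Path(r).resolve() for r in mount_roots]
    # is_relative_to (Python 3.9+) is True for a root itself and anything below it.
    if any(target.is_relative_to(root) for root in allowed):
        return True
    # Any other destination needs an explicit opt-in.
    return allow_external_paths


print(target_root_allowed("/srv/runs/restored", "/srv/runs", ["/shared/team_inputs"]))
print(target_root_allowed("/opt/elsewhere", "/srv/runs", ["/shared/team_inputs"]))
```

Resolving both sides before comparing keeps relative paths and ".." segments from slipping a destination past the check.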

Troubleshooting

  • Missing file on cache hit: Check that mounts map to the correct root and the original run directory still exists for workspace URIs.
  • Moved run directory: Cache metadata is still valid, but byte materialization will warn because _physical_run_dir still records the original location, where the files no longer exist.
  • Permission denied: Consist warns and continues; adjust mount permissions or use a shared accessible path for cached outputs you need to materialize.

See also