Troubleshooting Guide¶
This guide organizes issues by symptom. For concept definitions, see Core Concepts. For topic-specific guides, see:
Recommended path
For normal workflow code, the recommended path is consist.run(...),
consist.trace(...), or consist.scenario(...). Some troubleshooting sections
intentionally use low-level lifecycle APIs (for example tracker.start_run(...)
and manual materialization helpers) to isolate specific failure modes.
Database Maintenance Runbook (Snapshot -> Diagnose -> Action)¶
When troubleshooting provenance DB health, prefer this safe sequence:
- Snapshot first (rollback safety):
  consist db snapshot --out ./snapshots/provenance.pre-maintenance.duckdb --db-path ./provenance.duckdb
- Diagnose before mutating.
- Take one action at a time (preview first).
- Purge with preview.
- Optional unscoped cache pruning. --prune-cache behavior:
  - only applies when --delete-ingested-data is enabled
  - only applies when references are derivable (for example, run_link tables with run_id + content_hash)
  - assumes content_hash has equivalent semantics across derivable run_link and unscoped_cache tables
  - becomes a skip/no-op when references are not derivable
- Merge shards with explicit conflict policy:
  consist db merge shard.duckdb --conflict error --db-path ./provenance.duckdb
  consist db merge shard.duckdb --conflict skip --db-path ./provenance.duckdb
  --conflict error aborts on incompatible global-table schema checks. --conflict skip merges compatible data and skips incompatible tables with warnings.
- Rebuild from JSON snapshots:
  consist db rebuild --json-dir ./runs/consist_runs --mode minimal --db-path ./provenance.duckdb
  consist db rebuild --json-dir ./runs/consist_runs --mode full --db-path ./provenance.duckdb
  minimal restores the run/artifact/link baseline. full additionally attempts facet/schema/index restoration where snapshot content and DB schema compatibility allow. stage and phase are restored into canonical run columns when present in the snapshot metadata, with legacy run.meta values preserved for compatibility.
- Compact after bulk changes.
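As an extra rollback safeguard before maintenance, you can also take a plain filesystem copy of the DB file alongside consist db snapshot. A stdlib-only sketch (paths are illustrative, and a raw copy is only safe while no process is writing to the DB):

```python
import shutil
from pathlib import Path

db = Path("provenance.duckdb")
backup = Path("snapshots/provenance.pre-maintenance.copy.duckdb")
backup.parent.mkdir(parents=True, exist_ok=True)

if db.exists():
    # Byte-for-byte copy, preserving timestamps; take it while no writer holds the DB.
    shutil.copy2(db, backup)
```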
Run Invocation Diagnostics (Recommended Path)¶
New run/trace/scenario validation errors follow a consistent structure:
- Problem: what failed
- Cause: why Consist rejected the invocation
- Fix: the concrete remediation
Common messages and fixes:
"unexpected keyword argument 'hash_inputs'"¶
Cause: hash_inputs is no longer accepted on run(...), trace(...), and step-level scenario.run(...) / scenario.trace(...) surfaces.
Fix: on those surfaces, use identity_inputs=[...]. Scenario header contexts still route through begin_run(...), where hash_inputs remains a legacy low-level option.
"unexpected keyword argument 'config_plan'"¶
Cause: config_plan is no longer accepted on run(...), trace(...), and step-level scenario.run(...) / scenario.trace(...) surfaces.
Fix: on those surfaces, pass adapter=... (and optional identity_inputs=[...]) instead. Scenario headers do not currently support header-level adapter=....
"identity_inputs/hash_inputs must be a list of paths."¶
Cause: a single string/path or an invalid shape was passed instead of a list.
Fix: pass identity_inputs=[Path(...)] or identity_inputs=[("label", Path(...))].
"Failed to compute identity input digests ..."¶
Cause: one or more identity input paths are missing or unreadable.
Fix: verify that every identity input path exists and is readable before running.
"load_inputs=True requires inputs to be a dict." / "input_binding=... requires inputs to be a dict."¶
Cause: automatic input binding requires named inputs so Consist can match function parameters.
Fix: pass inputs={"param_name": path_or_artifact}. To disable binding, use ExecutionOptions(input_binding="none") (or legacy ExecutionOptions(load_inputs=False)).
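For intuition, automatic input binding simply matches dict keys against the function's parameter names. A plain-Python sketch of that idea (illustrative only, not Consist's actual implementation):

```python
import inspect
from pathlib import Path

def bind_inputs(fn, inputs):
    """Match named inputs to fn's parameters, roughly as input binding does."""
    if not isinstance(inputs, dict):
        raise TypeError("load_inputs=True requires inputs to be a dict.")
    params = inspect.signature(fn).parameters
    # Only keys that correspond to parameter names are bound.
    return {name: value for name, value in inputs.items() if name in params}

def my_step(persons, households=None):
    return persons

bound = bind_inputs(my_step, {"persons": Path("persons.csv"), "unused": 1})
```

This is why a bare string or path cannot be bound: without a name, there is nothing to match against the signature.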
"cache_hydration='outputs-requested' requires output_paths."¶
Cause: requested-output hydration needs explicit destination mappings.
Fix: declare output_paths={...} whenever using cache_hydration='outputs-requested'.
"Tracker.run supports executor='python' or 'container'."¶
Cause: an unsupported executor value was provided.
Fix: set ExecutionOptions(executor='python') or ExecutionOptions(executor='container', container={...}).
"cache_options.code_identity callable modes require executor='python'."¶
Cause: callable code identity modes require Python callable execution.
Fix: either switch to executor='python' or use code_identity='repo_git' for container runs.
"executor='container' requires output_paths."¶
Cause: container runs cannot infer outputs from Python return values.
Fix: provide explicit output mappings with output_paths={key: path}.
"Scenario input string did not resolve ..."¶
Cause: the scenario input string matched neither a Coupler key nor an existing filesystem path.
Fix: pass a real path, or on the recommended path use consist.refs(...) between steps.
Cache & Provenance Issues¶
"Relation leak warnings"¶
Symptom: Warning: Consist has N active DuckDB relations...
Root Cause: Relations returned by consist.load(...) (tabular artifacts) keep a
DuckDB connection open until you close them.
Solution:
- Prefer consist.load_df(...) if you only need a pandas DataFrame.
- Use consist.load_relation(...) as a context manager to ensure connections are closed.
- If you're intentionally holding many Relations, increase the warning threshold: CONSIST_RELATION_WARN_THRESHOLD=500.
"Old DBs no longer load after the Relation-first refactor"¶
Symptom: Errors when reading artifacts or querying the DB after upgrading.
Root Cause: The artifact schema changed:
- Artifact.uri → Artifact.container_uri
- Artifact.table_path added (nullable) for container formats (HDF5 tables)
- Artifact.array_path added (nullable) for array formats
- meta["table_path"] is no longer used
Solution:
Reset your Consist database(s) and re-run workflows:
Then update any code that referenced artifact.uri or artifact.meta["table_path"]:
# Before
artifact.uri
artifact.meta.get("table_path")
# After
artifact.container_uri
artifact.table_path
"Cache hit but output files are missing"¶
Symptom: cache_hit=True but artifact.path doesn't exist on disk.
Root Cause: Consist returned a cache hit but didn't materialize the files to disk.
Why this happens: Consist defaults to metadata-only cache hits to keep cache checks fast and avoid duplicating large files. You explicitly opt in to file copying via hydration/materialization when you need bytes on disk.
Solution:
Use cache hydration to copy files:
from consist import CacheOptions
result = consist.run(
fn=my_function,
inputs={...},
cache_options=CacheOptions(cache_hydration="outputs-all"), # Copy all cached outputs
...
)
Or use the explicit run-scoped recovery API when you need to rebuild a prior run's outputs into a new directory or recover from an archive mirror:
from pathlib import Path
restored = tracker.materialize_run_outputs(
"prior_run_id",
target_root=Path("rehydrated"),
source_root=Path("/archive/outputs_mirror"), # optional
)
This preserves historical relative layout under target_root. If the original
cold files are missing but the outputs were ingested, Consist can reconstruct
CSV/Parquet outputs from DuckDB.
For archive-mirror cache-hit hydration, you can either pass
cache_options=CacheOptions(materialize_cached_outputs_source_root=Path(...))
on run(...) / scenario steps, or use the same
materialize_cached_outputs_source_root=Path(...) override on low-level
tracker.start_run(...) flows.
tracker.materialize_run_outputs(...) can also restore into a configured mount
root without enabling allow_external_paths=True.
"Same inputs/config but cache not found"¶
Symptom: Code hasn't changed, inputs haven't changed, but run re-executes instead of hitting cache.
Root Cause: Signature mismatch. Something in the cache key changed.
Solution:
Debug the signature:
from pathlib import Path
identity = tracker.identity
code_hash = identity.get_code_version()
# If you want to match Consist's exact run hash, include model/year/iteration:
# config_hash = identity.compute_run_config_hash(config={"param": value}, model="my_model", year=2030)
config_hash = identity.compute_config_hash({"param": value})
input_hash = identity.compute_file_checksum(Path("input.csv"))
print(f"Code: {code_hash}")
print(f"Config: {config_hash}")
print(f"Inputs: {input_hash}")
# Check if these match a prior run
prior_runs = tracker.find_runs()
for run in prior_runs:
print(f"Run {run.id}: signature={run.signature}")
Common causes:
- Code changed: Check git status, function definitions
- Config changed: Check parameter types (0 vs 0.0, "0" vs 0)
- Input file changed: Check file modification time, content hash
- Run fields changed: model, year, or iteration are folded into the config hash
- Dependencies changed: Installed package versions can affect behavior
"How do I clear/reset cache?"¶
Solution:
Delete the database file:
This clears all run history and cache. Next run will re-execute everything.
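A minimal sketch of the delete step, assuming the provenance.duckdb path used elsewhere in this guide (substitute your actual --db-path):

```shell
# Clearing the cache means deleting the DuckDB file.
touch ./provenance.duckdb   # stand-in so this sketch is self-contained
rm ./provenance.duckdb      # removes all run history and cache state
```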
To keep history but force re-execution:
from consist import CacheOptions
result = consist.run(
fn=your_fn,
inputs={...},
outputs=[...],
cache_options=CacheOptions(cache_mode="overwrite"),
)
"Database locked" error¶
Symptom: database is locked or similar error when running multiple Consist processes.
Root Cause: DuckDB locks the database during writes. Concurrent write attempts fail.
Solution:
- Run sequentially (recommended).
- Use separate databases per process.
- Tune Consist retry/backoff settings (best for shared HPC DB files):
# dlt ingest lock retries
export CONSIST_DLT_LOCK_RETRIES=40
export CONSIST_DLT_LOCK_BASE_SLEEP_SECONDS=0.2
export CONSIST_DLT_LOCK_MAX_SLEEP_SECONDS=5.0
# run/artifact/config sync lock retries
export CONSIST_DB_LOCK_RETRIES=40
export CONSIST_DB_LOCK_BASE_SLEEP_SECONDS=0.2
export CONSIST_DB_LOCK_MAX_SLEEP_SECONDS=5.0
These settings apply process-wide for each Tracker instance.
Defaults:
- CONSIST_DLT_LOCK_RETRIES=20
- CONSIST_DLT_LOCK_BASE_SLEEP_SECONDS=0.1
- CONSIST_DLT_LOCK_MAX_SLEEP_SECONDS=2.0
- CONSIST_DB_LOCK_RETRIES=20
- CONSIST_DB_LOCK_BASE_SLEEP_SECONDS=0.1
- CONSIST_DB_LOCK_MAX_SLEEP_SECONDS=2.0
- HPC starting profile (multiple concurrent writers):
  - Start with retries at 40.
  - Start with base sleep at 0.2 seconds.
  - Start with max sleep at 5.0 seconds.
  - If lock failures persist, increase retries first, then max sleep.
  - If runs feel too slow to fail when a lock is permanent, lower retries.
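The separate-databases-per-process option can be as simple as deriving a unique DB path per worker; shards can later be combined with consist db merge. A sketch (the commented Tracker wiring line is illustrative):

```python
import os
from pathlib import Path

# One DuckDB file per process sidesteps write-lock contention entirely.
db_path = Path(f"provenance.worker-{os.getpid()}.duckdb")
# tracker = Tracker(db_path=db_path)  # illustrative: pass the per-process path to your Tracker
```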
Mount & Path Issues¶
"Mount not resolving" (Container integration)¶
Symptom: Container runs but /inputs is empty or doesn't exist.
Root Cause: Volume mount paths don't exist or are incorrect.
Solution:
- Check that paths exist on the host.
- Use absolute paths.
- Check permissions.
- Debug the mount.
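The first two checks can be done from the host shell before debugging inside the container (directory name illustrative):

```shell
# Verify the host side of the mount first.
mkdir -p ./inputs
ls -la ./inputs     # the directory must exist and actually contain your files
realpath ./inputs   # container mounts generally need absolute host paths
```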
If Consist errors about host paths not living under configured mounts, either add the
mount in your Tracker or pass strict_mounts=False to run_container().
"URI resolution failed"¶
Symptom: Error like Cannot resolve URI: outputs://key/file.csv
Root Cause: URI scheme not recognized or mount not registered.
Solution:
Use absolute paths instead of URI schemes for file operations:
# DON'T:
artifact_uri = "outputs://key/result.csv"
df = pd.read_csv(artifact_uri) # Fails
# DO:
with tracker.start_run("resolve_uri", model="example"):
artifact = tracker.log_artifact(result_path, key="key", direction="output")
df = pd.read_csv(artifact.path) # Use .path property
Or resolve URI explicitly:
"Working directory changed between runs"¶
Symptom: File paths work in first run but fail in second run (re-run from different directory).
Root Cause: Relative paths depend on current working directory.
Solution:
Use absolute paths everywhere:
# DON'T:
output_file = "results.csv" # Relative to cwd
# DO:
output_file = Path(tracker.run_dir) / "results.csv" # Absolute
Or use artifact URIs:
with tracker.start_run("log_output", model="example"):
tracker.log_artifact(result, key="output", direction="output")
# Later, access via:
artifact = tracker.get_artifacts_for_run("run_id").outputs["output"]
print(artifact.path) # Absolute path
Data & Schema Issues¶
"Schema mismatch during ingestion"¶
Symptom: Error like Column 'age' expected int, got str
Root Cause: DataFrame column type doesn't match schema definition.
Solution:
Convert DataFrame types before ingestion:
from your_pkg.models import MySchema
# Check types
print(df.dtypes)
# Convert if needed
df = df.astype({
"age": "int64",
"income": "float64",
"name": "object",
})
with tracker.start_run("ingest_data", model="example"):
tracker.log_dataframe(df, key="data", schema=MySchema)
Or use Pandas casting:
"Null in non-optional field"¶
Symptom: Warning like Null value in non-optional field 'age'
Root Cause: DataFrame has NaN/None in a field that schema requires non-null.
Solution:
- Drop nulls.
- Fill nulls.
- Make the field optional.
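The first two options sketched in pandas (column names illustrative):

```python
import pandas as pd

df = pd.DataFrame({"id": [1, 2, 3], "age": [30.0, None, 45.0]})

dropped = df.dropna(subset=["age"])               # option 1: drop rows with nulls
filled = df.fillna({"age": df["age"].median()})   # option 2: fill with a statistic
```

Making the field optional instead is a schema change on your side, appropriate when nulls are genuinely meaningful for that column.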
"Duplicate primary keys"¶
Symptom: Error like Primary key violation: duplicate ID
Root Cause: DataFrame has duplicate values in the primary key column.
Solution:
Deduplicate before ingestion:
# Keep last occurrence (or "first")
df = df.drop_duplicates(subset=["id"], keep="last")
# Or remove all duplicates
df = df[~df.duplicated(subset=["id"], keep=False)]
with tracker.start_run("ingest_deduped", model="example"):
tracker.log_dataframe(df, key="data", schema=MySchema)
"Can't query across runs"¶
Symptom: tracker.views.MySchema doesn't exist or returns empty results.
Root Cause: Schema not registered or data not ingested with schema.
Solution:
- Register the schema on Tracker creation.
- Ingest with the schema:
  with tracker.start_run("ingest_persons", model="example"):
      tracker.log_dataframe(df, key="persons", schema=Person)
- Verify the schema exists.
Container Execution Issues¶
"Container execution failed"¶
Symptom: Error: RuntimeError: Container execution failed
Root Cause: Container exited with non-zero code.
Solution:
- Test the container manually.
- Check the logs.
- Add verbose output.
- Verify input paths.
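Manually re-running the container is often the fastest way to see the real error. A sketch using standard docker flags (image name and mounts are illustrative; guarded so it is copy-paste safe on machines without docker):

```shell
if command -v docker >/dev/null 2>&1; then
  # Rerun the step by hand and inspect its stdout/stderr directly.
  docker run --rm -v "$PWD/inputs:/inputs" my-model:latest ls -la /inputs \
    || echo "container step failed; inspect the error above"
else
  echo "docker not available on this machine"
fi
```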
"Output files not found after container"¶
Symptom: Warning: Expected output not found: ./outputs/result.csv
Root Cause: Container didn't create output at expected location.
Solution:
- Verify that the container creates outputs.
- Check output paths inside the container.
- Use correct host paths.
"Image pull failed"¶
Symptom: Error: Error pulling image: authentication required
Root Cause: Docker can't access the image registry.
Solution:
- Authenticate with the registry.
- Use public images.
- Check whether the image exists locally.
- Disable the pull.
"Permission denied in container"¶
Symptom: Permission denied when container writes to mounted volume.
Root Cause: Container user doesn't have write permission on host mount.
Solution:
- Make the directory writable.
- Run the container as the current user.
- Create the output directory with correct permissions.
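These steps sketched with standard shell and docker flags (directory and image names illustrative):

```shell
# Create the host-side mount target with write permission up front.
mkdir -p ./outputs
chmod u+rwx ./outputs
# Then run the container as your own UID/GID so files it writes stay yours:
# docker run --rm --user "$(id -u):$(id -g)" -v "$PWD/outputs:/outputs" my-model:latest
```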
Performance Issues¶
"Runs are very slow"¶
Symptom: Each run takes much longer than expected.
Root Cause: Several possibilities:
- No cache hits: Check if signature is changing unexpectedly.
- File I/O bottleneck: Large artifact materialization.
- Database queries slow: Too many cross-run queries.
- Container startup overhead: Each container run adds 1-2 seconds.
Solution:
- Profile execution.
- Avoid unnecessary materialization.
- Use Parquet instead of CSV (faster parsing).
- Batch container work to reduce startup overhead.
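A minimal stdlib timing harness for isolating the slow step; wrap whichever call you suspect (the consist.run call in the comment is illustrative, and the stand-in workload keeps the sketch runnable):

```python
import time

start = time.perf_counter()
# result = consist.run(...)   # wrap the call you suspect is slow
sum(range(1_000_000))         # stand-in workload so this sketch executes
elapsed = time.perf_counter() - start
print(f"step took {elapsed:.3f}s")
```

Comparing this number across a cold run and an expected cache hit quickly tells you whether the time goes to execution, materialization, or cache checks.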
"Database is huge and slow"¶
Symptom: Queries are slow, database file is large.
Root Cause: Too much data ingested or too many runs.
Solution:
- Vacuum the database.
- Archive old runs.
- Use selective ingestion:
  # Don't ingest everything, just what you need
  with tracker.start_run("sample_ingest", model="example"):
      tracker.log_dataframe(df.head(1000), key="sample")  # Sample instead of all
Debugging Tools¶
Enable Logging¶
import logging
logging.basicConfig(level=logging.DEBUG)
logger = logging.getLogger("consist")
logger.setLevel(logging.DEBUG)
This prints detailed provenance tracking, signature computation, and cache decisions.
Inspect Run Metadata¶
run = tracker.get_run("run_id")
print(f"Signature: {run.signature}")
print(f"Code hash: {run.git_hash}")
print(f"Meta: {run.meta}")
Inspect Database¶
import duckdb
conn = duckdb.connect("provenance.duckdb")
print(conn.query("SELECT * FROM run LIMIT 5").df())
print(conn.query("SELECT * FROM artifact LIMIT 5").df())
Check File Hashes¶
from pathlib import Path
with tracker.start_run("hash_input", model="example"):
artifact = tracker.log_artifact(Path("input.csv"), key="input", direction="input")
print(f"Path: {artifact.path}")
print(f"Hash: {artifact.hash}")
print(f"Size: {artifact.path.stat().st_size}")
Getting Help¶
If you hit an issue not covered here:
- Check the logs.
- Inspect the database.
- File an issue on GitHub with:
  - Error message and traceback
  - Minimal reproducible example
  - Output of consist runs (recent run history)
  - Output of logging (with DEBUG enabled)
See Also¶
- Container Integration
- DLT Loader
- Architecture (for implementation details)
- CLI Reference (for debugging commands)