Troubleshooting Guide¶
This guide organizes issues by symptom. For concept definitions, see Core Concepts. For topic-specific guides, see:
Recommended path
For normal workflow code, the recommended path is consist.run(...),
consist.trace(...), or consist.scenario(...). Some troubleshooting sections
intentionally use low-level lifecycle APIs (for example tracker.start_run(...)
and manual materialization helpers) to isolate specific failure modes.
Database Maintenance Runbook (Snapshot -> Diagnose -> Action)¶
When troubleshooting provenance DB health, prefer this safe sequence:
- Snapshot first (rollback safety):
  consist db snapshot --out ./snapshots/provenance.pre-maintenance.duckdb --db-path ./provenance.duckdb
- Diagnose before mutating.
- Take one action at a time (preview first).
- Purge with preview.
- Optional unscoped cache pruning. --prune-cache behavior:
  - only applies when --delete-ingested-data is enabled
  - only applies when references are derivable (for example, run_link tables with run_id + content_hash)
  - assumes content_hash has equivalent semantics across derivable run_link and unscoped_cache tables
  - becomes a skip/no-op when references are not derivable
- Merge shards with explicit conflict policy:
  consist db merge shard.duckdb --conflict error --db-path ./provenance.duckdb
  consist db merge shard.duckdb --conflict skip --db-path ./provenance.duckdb
  --conflict error aborts on incompatible global-table schema checks. --conflict skip merges compatible data and skips incompatible tables with warnings.
- Rebuild from JSON snapshots:
  consist db rebuild --json-dir ./runs/consist_runs --mode minimal --db-path ./provenance.duckdb
  consist db rebuild --json-dir ./runs/consist_runs --mode full --db-path ./provenance.duckdb
  minimal restores the run/artifact/link baseline. full additionally attempts facet/schema/index restoration where snapshot content and DB schema compatibility allow. stage and phase are restored into canonical run columns when present in the snapshot metadata, with legacy run.meta values preserved for compatibility.
- Compact after bulk changes.
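As an extra rollback safeguard before maintenance, you can also take a plain filesystem copy of the DB file alongside consist db snapshot. A stdlib-only sketch (paths are illustrative, and a raw copy is only safe while no process is writing to the DB):

```python
import shutil
from pathlib import Path

db = Path("provenance.duckdb")
backup = Path("snapshots/provenance.pre-maintenance.copy.duckdb")
backup.parent.mkdir(parents=True, exist_ok=True)

if db.exists():
    # Byte-for-byte copy, preserving timestamps; take it while no writer holds the DB.
    shutil.copy2(db, backup)
```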
Run Invocation Diagnostics (Recommended Path)¶
New run/trace/scenario validation errors follow a consistent structure:
- Problem: what failed
- Cause: why Consist rejected the invocation
- Fix: the concrete remediation
Common messages and fixes:
"unexpected keyword argument 'hash_inputs'"¶
Cause: hash_inputs is no longer accepted on run(...), trace(...), and step-level scenario.run(...) / scenario.trace(...) surfaces.
Fix: on those surfaces, use identity_inputs=[...]. Scenario header contexts still route through begin_run(...), where hash_inputs remains a legacy low-level option.
"unexpected keyword argument 'config_plan'"¶
Cause: config_plan is no longer accepted on run(...), trace(...), and step-level scenario.run(...) / scenario.trace(...) surfaces.
Fix: on those surfaces, pass adapter=... (and optional identity_inputs=[...]) instead. Scenario headers do not currently support header-level adapter=....
"identity_inputs/hash_inputs must be a list of paths."¶
Cause: a single string/path or an invalid shape was passed instead of a list.
Fix: pass identity_inputs=[Path(...)] or identity_inputs=[("label", Path(...))].
"Failed to compute identity input digests ..."¶
Cause: one or more identity input paths are missing or unreadable.
Fix: verify that every identity input path exists and is readable before running.
"load_inputs=True requires inputs to be a dict." / "input_binding=... requires inputs to be a dict."¶
Cause: automatic input binding requires named inputs so Consist can match function parameters.
Fix: pass inputs={"param_name": path_or_artifact}. To disable binding, use ExecutionOptions(input_binding="none") (or legacy ExecutionOptions(load_inputs=False)).
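For intuition, automatic input binding simply matches dict keys against the function's parameter names. A plain-Python sketch of that idea (illustrative only, not Consist's actual implementation):

```python
import inspect
from pathlib import Path

def bind_inputs(fn, inputs):
    """Match named inputs to fn's parameters, roughly as input binding does."""
    if not isinstance(inputs, dict):
        raise TypeError("load_inputs=True requires inputs to be a dict.")
    params = inspect.signature(fn).parameters
    # Only keys that correspond to parameter names are bound.
    return {name: value for name, value in inputs.items() if name in params}

def my_step(persons, households=None):
    return persons

bound = bind_inputs(my_step, {"persons": Path("persons.csv"), "unused": 1})
```

This is why a bare string or path cannot be bound: without a name, there is nothing to match against the signature.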
"cache_hydration='outputs-requested' requires output_paths."¶
Cause: requested-output hydration needs explicit destination mappings.
Fix: declare output_paths={...} whenever using cache_hydration='outputs-requested'.
"Tracker.run supports executor='python' or 'container'."¶
Cause: an unsupported executor value was provided.
Fix: set ExecutionOptions(executor='python') or ExecutionOptions(executor='container', container={...}).
"cache_options.code_identity callable modes require executor='python'."¶
Cause: callable code identity modes require Python callable execution.
Fix: either switch to executor='python' or use code_identity='repo_git' for container runs.
"executor='container' requires output_paths."¶
Cause: container runs cannot infer outputs from Python return values.
Fix: provide explicit output mappings with output_paths={key: path}.
"Scenario input string did not resolve ..."¶
Cause: the scenario input string matched neither a Coupler key nor an existing filesystem path.
Fix: pass a real path, or on the recommended path use consist.refs(...) between steps.
Cache & Provenance Issues¶
"Relation leak warnings"¶
Symptom: Warning: Consist has N active DuckDB relations...
Root Cause: Relations returned by consist.load(...) (tabular artifacts) keep a
DuckDB connection open until you close them.
Solution:
- Prefer consist.load_df(...) if you only need a pandas DataFrame.
- Use consist.load_relation(...) as a context manager to ensure connections are closed.
- If you're intentionally holding many Relations, increase the warning threshold: CONSIST_RELATION_WARN_THRESHOLD=500.
"Old DBs no longer load after the Relation-first refactor"¶
Symptom: Errors when reading artifacts or querying the DB after upgrading.
Root Cause: The artifact schema changed:
- Artifact.uri → Artifact.container_uri
- Artifact.table_path added (nullable) for container formats (HDF5 tables)
- Artifact.array_path added (nullable) for array formats
- meta["table_path"] is no longer used
Solution:
Reset your Consist database(s) and re-run workflows:
Then update any code that referenced artifact.uri or artifact.meta["table_path"]:
# Before
artifact.uri
artifact.meta.get("table_path")
# After
artifact.container_uri
artifact.table_path
"Cache hit but output files are missing"¶
Symptom: cache_hit=True but artifact.path doesn't exist on disk.
Root Cause: Consist returned a cache hit but didn't materialize the files to disk.
Why this happens: Consist defaults to metadata-only cache hits to keep cache checks fast and avoid duplicating large files. You explicitly opt in to file copying via hydration/materialization when you need bytes on disk.
Solution:
Use cache hydration to copy files:
from consist import CacheOptions
result = consist.run(
fn=my_function,
inputs={...},
cache_options=CacheOptions(cache_hydration="outputs-all"), # Copy all cached outputs
...
)
Or use the explicit run-scoped recovery API when you need to rebuild a prior run's outputs into a new directory or recover from an archive mirror:
from pathlib import Path
restored = tracker.materialize_run_outputs(
"prior_run_id",
target_root=Path("rehydrated"),
source_root=Path("/archive/outputs_mirror"), # optional
)
This preserves historical relative layout under target_root. If the original
cold files are missing but the outputs were ingested, Consist can reconstruct
CSV/Parquet outputs from DuckDB.
For archive-mirror cache-hit hydration, you can either pass
cache_options=CacheOptions(materialize_cached_outputs_source_root=Path(...))
on run(...) / scenario steps, or use the same
materialize_cached_outputs_source_root=Path(...) override on low-level
tracker.start_run(...) flows.
tracker.materialize_run_outputs(...) can also restore into a configured mount
root without enabling allow_external_paths=True.
"Same inputs/config but cache not found"¶
Symptom: Code hasn't changed, inputs haven't changed, but run re-executes instead of hitting cache.
Root Cause: Signature mismatch. Something in the cache key changed.
Solution:
Debug the signature:
from pathlib import Path
identity = tracker.identity
code_hash = identity.get_code_version()
# If you want to match Consist's exact run hash, include model/year/iteration:
# config_hash = identity.compute_run_config_hash(config={"param": value}, model="my_model", year=2030)
config_hash = identity.compute_config_hash({"param": value})
input_hash = identity.compute_file_checksum(Path("input.csv"))
print(f"Code: {code_hash}")
print(f"Config: {config_hash}")
print(f"Inputs: {input_hash}")
# Check if these match a prior run
prior_runs = tracker.find_runs()
for run in prior_runs:
print(f"Run {run.id}: signature={run.signature}")
Common causes:
- Code changed: Check git status, function definitions
- Config changed: Check parameter types (0 vs 0.0, "0" vs 0)
- Input file changed: Check file modification time, content hash
- Run fields changed: model, year, or iteration are folded into the config hash
- Dependencies changed: Installed package versions can affect behavior
"How do I clear/reset cache?"¶
Solution:
Delete the database file:
This clears all run history and cache. Next run will re-execute everything.
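A minimal sketch of the delete step, assuming the provenance.duckdb path used elsewhere in this guide (substitute your actual --db-path):

```shell
# Clearing the cache means deleting the DuckDB file.
touch ./provenance.duckdb   # stand-in so this sketch is self-contained
rm ./provenance.duckdb      # removes all run history and cache state
```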
To keep history but force re-execution:
from consist import CacheOptions
result = consist.run(
fn=your_fn,
inputs={...},
outputs=[...],
cache_options=CacheOptions(cache_mode="overwrite"),
)
"Database locked" error¶
Symptom: database is locked or similar error when running multiple Consist processes.
Root Cause: DuckDB locks the database during writes. Concurrent write attempts fail.
Solution:
- Run sequentially (recommended).
- Use separate databases per process.
- Tune Consist retry/backoff settings (best for shared HPC DB files):
# dlt ingest lock retries
export CONSIST_DLT_LOCK_RETRIES=40
export CONSIST_DLT_LOCK_BASE_SLEEP_SECONDS=0.2
export CONSIST_DLT_LOCK_MAX_SLEEP_SECONDS=5.0
# run/artifact/config sync lock retries
export CONSIST_DB_LOCK_RETRIES=40
export CONSIST_DB_LOCK_BASE_SLEEP_SECONDS=0.2
export CONSIST_DB_LOCK_MAX_SLEEP_SECONDS=5.0
These settings apply process-wide for each Tracker instance.
Defaults:
- CONSIST_DLT_LOCK_RETRIES=20
- CONSIST_DLT_LOCK_BASE_SLEEP_SECONDS=0.1
- CONSIST_DLT_LOCK_MAX_SLEEP_SECONDS=2.0
- CONSIST_DB_LOCK_RETRIES=20
- CONSIST_DB_LOCK_BASE_SLEEP_SECONDS=0.1
- CONSIST_DB_LOCK_MAX_SLEEP_SECONDS=2.0
- HPC starting profile (multiple concurrent writers):
  - Start with retries at 40.
  - Start with base sleep at 0.2 seconds.
  - Start with max sleep at 5.0 seconds.
  - If lock failures persist, increase retries first, then max sleep.
  - If runs feel too slow to fail when a lock is permanent, lower retries.
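The separate-databases-per-process option can be as simple as deriving a unique DB path per worker; shards can later be combined with consist db merge. A sketch (the commented Tracker wiring line is illustrative):

```python
import os
from pathlib import Path

# One DuckDB file per process sidesteps write-lock contention entirely.
db_path = Path(f"provenance.worker-{os.getpid()}.duckdb")
# tracker = Tracker(db_path=db_path)  # illustrative: pass the per-process path to your Tracker
```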
Mount & Path Issues¶
"Mount not resolving" (Container integration)¶
Symptom: Container runs but /inputs is empty or doesn't exist.
Root Cause: Volume mount paths don't exist or are incorrect.
Solution:
- Check that paths exist on the host.
- Use absolute paths.
- Check permissions.
- Debug the mount.
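The first two checks can be done from the host shell before debugging inside the container (directory name illustrative):

```shell
# Verify the host side of the mount first.
mkdir -p ./inputs
ls -la ./inputs     # the directory must exist and actually contain your files
realpath ./inputs   # container mounts generally need absolute host paths
```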
If Consist errors about host paths not living under configured mounts, either add the
mount in your Tracker or pass strict_mounts=False to run_container().
"URI resolution failed"¶
Symptom: Error like Cannot resolve URI: outputs://key/file.csv
Root Cause: URI scheme not recognized or mount not registered.
Solution:
Use absolute paths instead of URI schemes for file operations:
# DON'T:
artifact_uri = "outputs://key/result.csv"
df = pd.read_csv(artifact_uri) # Fails
# DO:
with tracker.start_run("resolve_uri", model="example"):
artifact = tracker.log_artifact(result_path, key="key", direction="output")
df = pd.read_csv(artifact.path) # Use .path property
Or resolve URI explicitly:
"Working directory changed between runs"¶
Symptom: File paths work in first run but fail in second run (re-run from different directory).
Root Cause: Relative paths depend on current working directory.
Solution:
Use absolute paths everywhere:
# DON'T:
output_file = "results.csv" # Relative to cwd
# DO:
output_file = Path(tracker.run_dir) / "results.csv" # Absolute
Or use artifact URIs:
with tracker.start_run("log_output", model="example"):
tracker.log_artifact(result, key="output", direction="output")
# Later, access via:
artifact = tracker.get_artifacts_for_run("run_id").outputs["output"]
print(artifact.path) # Absolute path
Data & Schema Issues¶
"Schema mismatch during ingestion"¶
Symptom: Error like Column 'age' expected int, got str
Root Cause: DataFrame column type doesn't match schema definition.
Solution:
Convert DataFrame types before ingestion:
from your_pkg.models import MySchema
# Check types
print(df.dtypes)
# Convert if needed
df = df.astype({
"age": "int64",
"income": "float64",
"name": "object",
})
with tracker.start_run("ingest_data", model="example"):
tracker.log_dataframe(df, key="data", schema=MySchema)
Or use Pandas casting:
"Null in non-optional field"¶
Symptom: Warning like Null value in non-optional field 'age'
Root Cause: DataFrame has NaN/None in a field that schema requires non-null.
Solution:
- Drop nulls.
- Fill nulls.
- Make the field optional.
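The first two options sketched in pandas (column names illustrative):

```python
import pandas as pd

df = pd.DataFrame({"id": [1, 2, 3], "age": [30.0, None, 45.0]})

dropped = df.dropna(subset=["age"])               # option 1: drop rows with nulls
filled = df.fillna({"age": df["age"].median()})   # option 2: fill with a statistic
```

Making the field optional instead is a schema change on your side, appropriate when nulls are genuinely meaningful for that column.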
"Duplicate primary keys"¶
Symptom: Error like Primary key violation: duplicate ID
Root Cause: DataFrame has duplicate values in the primary key column.
Solution:
Deduplicate before ingestion:
# Keep last occurrence (or "first")
df = df.drop_duplicates(subset=["id"], keep="last")
# Or remove all duplicates
df = df[~df.duplicated(subset=["id"], keep=False)]
with tracker.start_run("ingest_deduped", model="example"):
tracker.log_dataframe(df, key="data", schema=MySchema)
"Can't query across runs"¶
Symptom: tracker.views.MySchema doesn't exist or returns empty results.
Root Cause: Schema not registered or data not ingested with schema.
Solution:
- Register the schema on Tracker creation.
- Ingest with the schema:
  with tracker.start_run("ingest_persons", model="example"):
      tracker.log_dataframe(df, key="persons", schema=Person)
- Verify the schema exists.
Container Execution Issues¶
"Container execution failed"¶
Symptom: Error: RuntimeError: Container execution failed
Root Cause: Container exited with non-zero code.
Solution:
- Test the container manually.
- Check the logs.
- Add verbose output.
- Verify input paths.
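Manually re-running the container is often the fastest way to see the real error. A sketch using standard docker flags (image name and mounts are illustrative; guarded so it is copy-paste safe on machines without docker):

```shell
if command -v docker >/dev/null 2>&1; then
  # Rerun the step by hand and inspect its stdout/stderr directly.
  docker run --rm -v "$PWD/inputs:/inputs" my-model:latest ls -la /inputs \
    || echo "container step failed; inspect the error above"
else
  echo "docker not available on this machine"
fi
```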
"Output files not found after container"¶
Symptom: Warning: Expected output not found: ./outputs/result.csv
Root Cause: Container didn't create output at expected location.
Solution:
- Verify that the container creates outputs.
- Check output paths inside the container.
- Use correct host paths.
"Image pull failed"¶
Symptom: Error: Error pulling image: authentication required
Root Cause: Docker can't access the image registry.
Solution:
- Authenticate with the registry.
- Use public images.
- Check whether the image exists locally.
- Disable the pull.
"Permission denied in container"¶
Symptom: Permission denied when container writes to mounted volume.
Root Cause: Container user doesn't have write permission on host mount.
Solution:
- Make the directory writable.
- Run the container as the current user.
- Create the output directory with correct permissions.
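These steps sketched with standard shell and docker flags (directory and image names illustrative):

```shell
# Create the host-side mount target with write permission up front.
mkdir -p ./outputs
chmod u+rwx ./outputs
# Then run the container as your own UID/GID so files it writes stay yours:
# docker run --rm --user "$(id -u):$(id -g)" -v "$PWD/outputs:/outputs" my-model:latest
```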
Performance Issues¶
"Runs are very slow"¶
Symptom: Each run takes much longer than expected.
Root Cause: Several possibilities:
- No cache hits: Check if signature is changing unexpectedly.
- File I/O bottleneck: Large artifact materialization.
- Database queries slow: Too many cross-run queries.
- Container startup overhead: Each container run adds 1-2 seconds.
Solution:
- Profile execution.
- Avoid unnecessary materialization.
- Use Parquet instead of CSV (faster parsing).
- Batch container work to reduce startup overhead.
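A minimal stdlib timing harness for isolating the slow step; wrap whichever call you suspect (the consist.run call in the comment is illustrative, and the stand-in workload keeps the sketch runnable):

```python
import time

start = time.perf_counter()
# result = consist.run(...)   # wrap the call you suspect is slow
sum(range(1_000_000))         # stand-in workload so this sketch executes
elapsed = time.perf_counter() - start
print(f"step took {elapsed:.3f}s")
```

Comparing this number across a cold run and an expected cache hit quickly tells you whether the time goes to execution, materialization, or cache checks.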
"Database is huge and slow"¶
Symptom: Queries are slow, database file is large.
Root Cause: Too much data ingested or too many runs.
Solution:
- Vacuum the database.
- Archive old runs.
- Use selective ingestion:
  # Don't ingest everything, just what you need
  with tracker.start_run("sample_ingest", model="example"):
      tracker.log_dataframe(df.head(1000), key="sample")  # Sample instead of all
Debugging Tools¶
Enable Logging¶
import logging
logging.basicConfig(level=logging.DEBUG)
logger = logging.getLogger("consist")
logger.setLevel(logging.DEBUG)
This prints detailed provenance tracking, signature computation, and cache decisions.
Inspect Run Metadata¶
run = tracker.get_run("run_id")
print(f"Signature: {run.signature}")
print(f"Code hash: {run.git_hash}")
print(f"Meta: {run.meta}")
Inspect Database¶
import duckdb
conn = duckdb.connect("provenance.duckdb")
print(conn.query("SELECT * FROM run LIMIT 5").df())
print(conn.query("SELECT * FROM artifact LIMIT 5").df())
Check File Hashes¶
from pathlib import Path
with tracker.start_run("hash_input", model="example"):
artifact = tracker.log_artifact(Path("input.csv"), key="input", direction="input")
print(f"Path: {artifact.path}")
print(f"Hash: {artifact.hash}")
print(f"Size: {artifact.path.stat().st_size}")
Getting Help¶
If you hit an issue not covered here:
- Check the logs.
- Inspect the database.
- File an issue on GitHub with:
  - Error message and traceback
  - Minimal reproducible example
  - Output of consist runs (recent run history)
  - Output of logging (with DEBUG enabled)
See Also¶
- Container Integration
- DLT Loader
- Architecture (for implementation details)
- CLI Reference (for debugging commands)