Quickstart¶
Consist has one recommended onboarding path:
run(...)for cacheable function stepstrace(...)for always-execute blocks with provenancescenario(...)to compose multi-step workflows from those two patterns
Pattern 1: Cacheable Step (run)¶
The same step can be expressed three ways. The recommended onboarding path is
input_binding="paths" because it keeps the function boundary explicit.
Save this as quickstart_run.py:
from pathlib import Path
import pandas as pd
def summarize_values(data_path: Path) -> Path:
df = pd.read_csv(data_path)
out = Path("./summary.txt")
out.write_text(f"mean={df['value'].mean():.1f}\n", encoding="utf-8")
print(f"executed summarize_values on {data_path}")
return out
summary_path = summarize_values(Path("input.csv"))
print(summary_path.read_text().strip())
Save this as quickstart_run.py:
from pathlib import Path
import pandas as pd
from consist import ExecutionOptions, Tracker
tracker = Tracker(run_dir="./runs", db_path="./provenance.duckdb")
def summarize_values(data_path: Path) -> dict[str, Path]:
df = pd.read_csv(data_path)
out = Path("./summary.txt")
out.write_text(f"mean={df['value'].mean():.1f}\n", encoding="utf-8")
print(f"executed summarize_values on {data_path}")
return {"summary": out}
first = tracker.run(
fn=summarize_values,
inputs={"data_path": Path("input.csv")},
outputs=["summary"],
execution_options=ExecutionOptions(input_binding="paths"),
)
second = tracker.run(
fn=summarize_values,
inputs={"data_path": Path("input.csv")},
outputs=["summary"],
execution_options=ExecutionOptions(input_binding="paths"),
)
print(first.cache_hit, second.cache_hit) # False, True
Save this as quickstart_run.py:
from pathlib import Path
import pandas as pd
from consist import ExecutionOptions, Tracker
tracker = Tracker(run_dir="./runs", db_path="./provenance.duckdb")
def summarize_values(data: pd.DataFrame) -> dict[str, Path]:
out = Path("./summary.txt")
out.write_text(f"mean={data['value'].mean():.1f}\n", encoding="utf-8")
print("executed summarize_values on loaded DataFrame")
return {"summary": out}
first = tracker.run(
fn=summarize_values,
inputs={"data": Path("input.csv")},
outputs=["summary"],
execution_options=ExecutionOptions(input_binding="loaded"),
)
second = tracker.run(
fn=summarize_values,
inputs={"data": Path("input.csv")},
outputs=["summary"],
execution_options=ExecutionOptions(input_binding="loaded"),
)
print(first.cache_hit, second.cache_hit) # False, True
Create input.csv containing:
On the second call (and on the first call's second tracker.run), run(...)
returns a cache hit and skips execution.
input_binding="paths" means your function receives local Path objects.
input_binding="loaded" means Consist loads the artifact for you before calling
the function. Use "loaded" when you want hydrated tables or objects; use
"paths" when you want the callable to own its own I/O boundary.
The main Consist-specific onboarding cost in these examples is small but real:
return dict[str, Path] instead of a bare Path, and declare
outputs=[...] so the tracker knows which artifact key to log.
Pattern 2: Always-Execute Step (trace)¶
Use trace(...) for diagnostics or side effects that should always run:
with tracker.trace("inspect_summary", inputs={"data": Path("input.csv")}) as t:
audit_path = Path("./audit.txt")
audit_path.write_text("inspection complete\n", encoding="utf-8")
t.log_output(audit_path, key="audit")
trace(...) still records identity/config/inputs, but the block always executes.
Pattern 3: Compose with scenario¶
Combine run(...) and trace(...) in one workflow:
import consist
with tracker.scenario("demo_pipeline") as sc:
step = sc.run(
fn=summarize_values,
inputs={"data_path": Path("input.csv")},
outputs=["summary"],
execution_options=ExecutionOptions(input_binding="paths"),
)
with sc.trace(
"quality_check",
inputs={"summary": consist.ref(step, key="summary")},
) as t:
qc_path = Path("./qc.txt")
qc_path.write_text("qc ok\n", encoding="utf-8")
t.log_output(qc_path, key="qc")
Next Steps¶
- First Workflow for a larger two-step pipeline
- API Essentials for the recommended API surface
- Building a Domain Tracker for reusable wrapper design