
Start Here

If you are new to Consist, follow this path in order. It takes you from a fresh environment to a working multi-step cached pipeline, then into deeper usage patterns.

Prerequisites

Note

Python 3.11+
Base install: `pip install consist`
For the first workflow tutorial (Parquet writes): `pip install "consist[parquet]"`
See Installation for complete options, including source installs and optional extras.

What is Consist?

Consist is a Python library for provenance tracking and intelligent caching in scientific simulation workflows. Tasks are ordinary Python functions; Consist records lineage without restructuring your code or introducing implicit dependencies.

It helps you:

Answer "what exactly produced this result?"—code version, config, and inputs, all queryable after the fact
Skip redundant computation: cache hits fire automatically when code, config, and inputs are unchanged
Wire multi-step pipelines explicitly via artifact references, not name-based injection or global state
Query and compare results across runs using DuckDB-backed SQL
Keep pipelines portable across machines via URI + mount resolution
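
To make the caching rule concrete, here is a minimal sketch of the general idea: a run is identified by a hash of its code version, config, and inputs, and a step executes again only when that hash changes. This is not Consist's API. The names `run_signature`, `run_cached`, and `scale`, the in-memory dictionary, and the sample values are all hypothetical stand-ins chosen for illustration; see the Caching & Hydration guide for how Consist actually computes identity.

```python
import hashlib
import json

# Hypothetical illustration of signature-based caching; not Consist's API.


def run_signature(code_version: str, config: dict, input_hashes: dict) -> str:
    """Hash the three things that identify a run: code, config, and inputs."""
    payload = json.dumps(
        {"code": code_version, "config": config, "inputs": input_hashes},
        sort_keys=True,  # key order must not change the signature
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


_cache: dict = {}     # signature -> stored result (stand-in for a provenance DB)
calls = {"count": 0}  # how many times a step really executed


def run_cached(step, code_version: str, config: dict, input_hashes: dict):
    """Execute step(config) only when no earlier run has the same signature."""
    sig = run_signature(code_version, config, input_hashes)
    if sig not in _cache:  # cache miss: something changed, so recompute
        calls["count"] += 1
        _cache[sig] = step(config)
    return _cache[sig]     # cache hit: reuse the stored result


def scale(config: dict) -> int:
    """A toy task: an ordinary Python function driven by its config."""
    return config["factor"] * 10


inputs = {"trips.csv": "abc123"}  # made-up content hash for one input file
first = run_cached(scale, "v1", {"factor": 2}, inputs)   # miss: executes
second = run_cached(scale, "v1", {"factor": 2}, inputs)  # hit: skipped
third = run_cached(scale, "v1", {"factor": 3}, inputs)   # config changed: miss
```

Because the signature covers code version, config, and inputs together, changing any one of them forces a recompute, while an identical rerun returns the stored result without executing the step.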

After completing the onboarding path above, use these role/topic guides for deeper work.

By Role

Simulation developers: Architecture, Config Adapters, Container Integration
Pipeline operators: CLI Reference, DB Maintenance Guide, Troubleshooting
Researchers: Data Materialization, Mounts & Portability, Glossary

By Topic

Caching and reuse: Caching & Hydration
Configuration and identity: Config Management
SQL analytics and ingestion: Data Materialization, DLT Loader Guide, Schema Export
Workflow patterns: Usage Guide, Workflow Contexts API
Programmatic API: API Reference

Common follow-up tasks

I want to...	Go to
Speed up my pipeline	Caching & Hydration
Debug a cache miss	Troubleshooting
Operate or repair the provenance DB	DB Maintenance Guide
Find which config produced a result	`consist lineage`
Compare results across scenarios	Data Materialization
Ingest data for SQL analysis	Data Materialization
Understand config vs. facets	Config Management
Share a reproducible study	Mounts & Portability
Integrate with ActivitySim/BEAM/MATSim	Config Adapters or Containers

Built on Open Standards

Consist relies on modern, high-performance data engineering tools:

DuckDB: An embedded analytical database (the "SQLite for analytics") that powers provenance queries and data virtualization.
SQLModel: Combines SQLAlchemy and Pydantic for type-safe data modeling and schema validation.
DLT (Data Load Tool): Handles schema-aware data ingestion from diverse sources into your provenance database.
Apache Parquet & Zarr: Industry-standard formats for efficient, compressed storage of tabular and multi-dimensional scientific data.

Learn More

See Core Concepts for a complete mental model, or Glossary for quick term definitions.
