Start Here¶
If you are new to Consist, follow this path in order. It takes you from a fresh environment to a working multi-step cached pipeline, then into deeper usage patterns.
Prerequisites¶
Note
- Python 3.11+
- Base install:
  pip install consist
- For the first workflow tutorial (Parquet writes), run:
  pip install "consist[parquet]"
- See Installation for complete options, including source installs and optional extras.
What is Consist?¶
Consist is a Python library for provenance tracking and intelligent caching in scientific simulation workflows. Tasks are ordinary Python functions; Consist records lineage without restructuring your code or introducing implicit dependencies.
It helps you:
- Answer "what exactly produced this result?": code version, config, and inputs are all queryable after the fact
- Skip redundant computation: cache hits fire automatically when code, config, and inputs are unchanged
- Wire multi-step pipelines explicitly via artifact references, not name-based injection or global state
- Query and compare results across runs using DuckDB-backed SQL
- Keep pipelines portable across machines via URI + mount resolution
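The caching rule in the list above (a hit only when code, config, and inputs are all unchanged) can be illustrated with a small plain-Python sketch. This is not Consist's API: `run_signature`, `cached_call`, and the in-memory `cache` dict are hypothetical names invented for illustration, and hashing a function's bytecode is only a rough stand-in for however Consist actually identifies a code version.

```python
import hashlib
import json


def run_signature(func, config, inputs):
    """Return a hex digest that changes when code, config, or inputs change."""
    payload = {
        # Assumption: bytecode is a rough stand-in for a "code version".
        "code": func.__code__.co_code.hex(),
        "config": config,
        "inputs": inputs,
    }
    # sort_keys makes the digest independent of dict key order.
    blob = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()


def cached_call(cache, func, config, inputs):
    """Run func only if this (code, config, inputs) combination is new."""
    key = run_signature(func, config, inputs)
    if key in cache:
        return cache[key], True  # cache hit: computation skipped
    result = func(config, inputs)
    cache[key] = result
    return result, False  # cache miss: computed and stored


def scale(config, inputs):
    """An ordinary Python function standing in for a pipeline task."""
    return [x * config["factor"] for x in inputs]


cache = {}  # hypothetical in-memory store, keyed by signature
first = cached_call(cache, scale, {"factor": 2}, [1, 2, 3])
second = cached_call(cache, scale, {"factor": 2}, [1, 2, 3])
third = cached_call(cache, scale, {"factor": 3}, [1, 2, 3])
```

Here the second call repeats the first exactly and is served from the cache, while the third changes the config and is recomputed. A real provenance tool would also persist the signature alongside the result so the "what produced this?" question can be answered later.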
Secondary Navigation¶
After completing the onboarding path above, use these role/topic guides for deeper work.
- Simulation developers: Architecture, Config Adapters, Container Integration
- Pipeline operators: CLI Reference, DB Maintenance Guide, Troubleshooting
- Researchers: Data Materialization, Mounts & Portability, Glossary
- Caching and reuse: Caching & Hydration
- Configuration and identity: Config Management
- SQL analytics and ingestion: Data Materialization, DLT Loader Guide, Schema Export
- Workflow patterns: Usage Guide, Workflow Contexts API
- Programmatic API: API Reference
Common follow-up tasks¶
| I want to... | Go to |
|---|---|
| Speed up my pipeline | Caching & Hydration |
| Debug a cache miss | Troubleshooting |
| Operate or repair the provenance DB | DB Maintenance Guide |
| Find which config produced a result | consist lineage |
| Compare results across scenarios | Data Materialization |
| Ingest data for SQL analysis | Data Materialization |
| Understand config vs. facets | Config Management |
| Share a reproducible study | Mounts & Portability |
| Integrate with ActivitySim/BEAM/MATSim | Config Adapters or Containers |
Built on Open Standards¶
Consist relies on modern, high-performance data engineering tools:
- DuckDB: An in-process analytical database (the "SQLite for analytics") that powers Consist's provenance queries and data virtualization.
- SQLModel: Combines SQLAlchemy and Pydantic for robust, type-safe data modeling and schema validation.
- DLT (Data Load Tool): Handles robust, schema-aware data ingestion from diverse sources into your provenance database.
- Apache Parquet & Zarr: Industry-standard formats for efficient, compressed storage of tabular and multi-dimensional scientific data.
Learn More¶
See Core Concepts for a complete mental model, or Glossary for quick term definitions.