Consist


Start Here

Follow this path in order if you are new to Consist:

  1. Installation
  2. Quickstart
  3. First Workflow
  4. Core Concepts
  5. Usage Guide
  6. Example Gallery
  7. Advanced Usage

This path takes you from a fresh environment to a working multi-step cached pipeline, then into deeper usage patterns.

Prerequisites

Note

  • Python 3.11+
  • Base install: pip install consist
  • For the first workflow tutorial (Parquet writes): run pip install "consist[parquet]"
  • See Installation for complete options, including source installs and optional extras.

What is Consist?

Consist is a Python library for provenance tracking and intelligent caching in scientific simulation workflows. Tasks are ordinary Python functions; Consist records lineage without restructuring your code or introducing implicit dependencies.

It helps you:

  • Answer "what exactly produced this result?": code version, config, and inputs, all queryable after the fact
  • Skip redundant computation: cache hits fire automatically when code, config, and inputs are unchanged
  • Wire multi-step pipelines explicitly via artifact references, not name-based injection or global state
  • Query and compare results across runs using DuckDB-backed SQL
  • Keep pipelines portable across machines via URI + mount resolution
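The caching behavior above can be sketched in plain Python: derive a stable key from the code version, config, and inputs, and serve repeat runs from cache. This is an illustrative sketch of the idea using only the standard library, not Consist's actual implementation or API.

```python
import hashlib
import json

_cache = {}

def cache_key(code_version, config, inputs):
    # A run is identified by a stable hash over code version, config, and inputs.
    # sort_keys=True makes the serialization deterministic across runs.
    payload = json.dumps(
        {"code": code_version, "config": config, "inputs": inputs},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode()).hexdigest()

def run_cached(task, code_version, config, inputs):
    # Cache hit: identical code, config, and inputs -> skip recomputation.
    key = cache_key(code_version, config, inputs)
    if key in _cache:
        return _cache[key], True   # (result, was_cache_hit)
    result = task(**inputs)
    _cache[key] = result
    return result, False

def double(x):
    return 2 * x

r1, hit1 = run_cached(double, "v1", {"scale": 2}, {"x": 21})  # computed
r2, hit2 = run_cached(double, "v1", {"scale": 2}, {"x": 21})  # cache hit
r3, hit3 = run_cached(double, "v2", {"scale": 2}, {"x": 21})  # code changed -> miss
```

Changing any one of the three ingredients (code version, config, or inputs) produces a new key, which is why `r3` is recomputed even though its inputs match `r1`.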

After completing the onboarding path above, use these role/topic guides for deeper work.

Common follow-up tasks

I want to...                              Go to
Speed up my pipeline                      Caching & Hydration
Debug a cache miss                        Troubleshooting
Operate or repair the provenance DB       DB Maintenance Guide
Find which config produced a result       consist lineage
Compare results across scenarios          Data Materialization
Ingest data for SQL analysis              Data Materialization
Understand config vs. facets              Config Management
Share a reproducible study                Mounts & Portability
Integrate with ActivitySim/BEAM/MATSim    Config Adapters or Containers

Built on Open Standards

Consist relies on modern, high-performance data engineering tools:

  • DuckDB: The "SQLite for Analytics" powers our lightning-fast provenance queries and data virtualization.
  • SQLModel: Combines SQLAlchemy and Pydantic for robust, type-safe data modeling and schema validation.
  • DLT (Data Load Tool): Handles robust, schema-aware data ingestion from diverse sources into your provenance database.
  • Apache Parquet & Zarr: Industry-standard formats for efficient, compressed storage of tabular and multi-dimensional scientific data.

Learn More

See Core Concepts for a complete mental model, or Glossary for quick term definitions.