Config, Facets, and Identity Inputs¶

When configuring a run, decide whether each parameter should trigger re-execution on change, be searchable for later filtering, or both.

Decision Tree¶

Should changing this parameter re-run my model? → config
Do I need to search/filter by this value later? → facet
Both? → Use both. A value can appear in config (cache invalidation) and facet (queryable).
Large external config files that must affect the cache key? → identity_inputs

Reference Table¶

Parameter	Affects Cache?	Queryable?	Use For
`config=`	Yes	No (by default)	Parameters that change behavior
`facet=`	No	Yes (indexed)	Metadata for filtering and grouping
`facet_from=`	—	Yes	Extract keys from `config` for queries
`identity_inputs=`	Yes	No	Large or file-based configs that affect identity

The `config` Argument¶

config accepts a plain dict or a Pydantic model. It is:

Hashed into the run signature (cache identity).
Stored in the per-run consist.json snapshot under config.
Not stored as structured/queryable data in DuckDB by default.

If you pass a Pydantic model, it is serialized via model_dump() before hashing and snapshotting.

Config Hashing¶

Consist uses canonical hashing: converting dicts (and YAML/JSON) into a deterministic fingerprint regardless of field order or number formatting. Both {"a": 1, "b": 2} and {"b": 2, "a": 1} produce the same hash, so the cache stays stable when config dicts are built in different orders.

The run signature is derived from:

the config hash (the config payload, normalized)
the input hash (logged inputs)
the code hash

A few run-level fields — model, year, and iteration — are also folded into the config hash under a reserved __consist_run_fields__ key. When set, cache_epoch and cache_version are included there too. This means changing those values will change the signature even if config is otherwise identical.

stage and phase are separate first-class run dimensions, not facets. They live on Run, are queryable with find_runs(stage=..., phase=...), and are mirrored into run.meta only for backward compatibility with older databases.

Facets are compact, queryable config subsets persisted to DuckDB. They do not affect the cache key — they exist purely for filtering and grouping runs after the fact.

Sources:

facet={"key": value, ...} — explicit facet payload
facet_from=["key", ...] — extracts top-level keys from config by name
to_consist_facet() on a Pydantic model — if implemented

If both facet and facet_from are provided, extracted values are merged into the explicit facet dict; explicit keys win on collision. facet_from raises KeyError if any listed key is missing from config.

Guardrails¶

Limit	Behavior when exceeded
Facet > 16 KB (canonical JSON)	Facet not persisted; run continues
Facet > 500 KV rows	Facet not indexed; run continues

No error is raised — Consist degrades silently to avoid blocking runs.

Facet persistence creates indexed tables:

config_facet — deduplicated facet JSON blobs
run_config_kv — flattened key/value index for filtering
artifact_facet — artifact-level facet blobs
artifact_kv — flattened scalar key/value index for artifact filtering

Flattened keys use dot-notation; dots in raw key names are escaped as \..

tracker.find_runs_by_facet_kv(
    namespace="beam",
    key="memory_gb",
    value_num=180,
)

# Or fetch a full facet blob by ID:
facet = tracker.get_config_facet(facet_id)

Practical Example¶

consist.run(
    fn=my_model,
    config={"full_config_path": "activitysim_config.yaml"},  # (1)!
    facet={"year": 2030, "mode_choice_coefficient": 0.5},    # (2)!
    inputs={...},
    outputs=[...],
)

Hashed into cache key; changes trigger re-runs.
Queryable in DuckDB; does not affect the cache key.

This enables queries like mode_choice_coefficient > 0.4 without bloating the database with raw config files, while still invalidating the cache when the full config changes.

Identity Inputs¶

identity_inputs solves a specific problem: configuration files that must affect the cache key but are too large or unstructured to store as queryable metadata.

Use identity_inputs to include file or directory content in the run signature without logging those files as provenance inputs.

Files are hashed via the configured hashing_strategy.
Directories are hashed deterministically via a sorted walk.
Dotfiles are ignored by default.
Artifacts are intentionally excluded — identity_inputs is path-only.

When to Use Identity Inputs¶

Normally, large external configs fall into one of two unsatisfying patterns:

Approach	Space	Queryable	Cache behavior
`config={"yaml": large_dict}`	JSON snapshot only	No (by default)	Respects changes
Not tracked at all	None	No	Stale cache if files change

identity_inputs is the middle ground: hash the files so the cache key changes when they do, without storing the file content.

Example: ActivitySim

ActivitySim projects use a 5 MB+ directory of YAML and CSV config files. The config affects output, so changes must invalidate the cache. But the YAML is too large for a facet (exceeds 16 KB), unstructured, and not useful for database filtering.

asim_config_dir = Path("./configs/activitysim")

with use_tracker(tracker):
    result = consist.run(
        fn=run_activitysim,
        name="asim_baseline",
        config={"scenario": "baseline"},
        identity_inputs=[("asim_config", asim_config_dir)],  # (1)!
    )

The directory is hashed. If any file inside changes, the signature changes and ActivitySim re-runs.

Concise and Labeled Forms¶

Each entry in identity_inputs can be a bare path or a labeled tuple:

# Concise: bare paths (labels auto-derived from the project-relative path when possible)
identity_inputs = [config_root, coeffs_csv]

# Explicit: labeled tuples (stable names in identity summaries)
identity_inputs = [
    ("asim_config", config_root),
    ("tour_mode_coeffs", coeffs_csv),
]

Audit Trail¶

Even though file content is not stored, Consist records the hash for debugging:

run = tracker.get_run("asim_baseline_001")
print(run.meta["consist_hash_inputs"])
# {"asim_config": "a1b2c3..."}
print(run.identity_summary["identity_inputs"])

ConfigAdapters¶

ConfigAdapter is an integration plugin for tools with complex, file-based configuration — ActivitySim YAML, BEAM HOCON, MATSim XML, and similar. An adapter handles discovery, canonicalization, and optional ingestion of model-specific config into queryable tables.

Most users do not need to write or use a ConfigAdapter. If your config fits in a dict or Pydantic model, config= is sufficient.

When you do need one, pass it via adapter=:

from consist.integrations.activitysim import ActivitySimConfigAdapter

adapter = ActivitySimConfigAdapter(root_dirs=[overlay_dir])

consist.run(
    fn=run_activitysim,
    name="activitysim",
    config={"scenario": "baseline"},
    adapter=adapter,
    identity_inputs=[("asim_config", overlay_dir)],
)

For details on writing or using adapters, see the Config Adapters Integration Guide.

Examples¶

`run` with config and facet¶

import consist
from consist import use_tracker

with use_tracker(tracker):
    consist.run(
        fn=run_beam_step,
        name="beam",
        config=beam_cfg,
        facet={"memory_gb": beam_cfg.memory_gb},
        identity_inputs=[("beam_hocon", beam_cfg_path)],
    )

Scenario run with `facet_from`¶

with use_tracker(tracker):
    with consist.scenario("demo", tags=["travel"]) as sc:
        sc.run(
            fn=run_activitysim,
            name="activitysim",
            config=asim_cfg,
            facet_from=["scenario", "sample_rate"],
            identity_inputs=[("asim_yaml", asim_config_dir)],
        )

Config, Facets, and Identity Inputs¶

Decision Tree¶

Reference Table¶

The `config` Argument¶

Config Hashing¶

Facets¶

Guardrails¶

Querying Facets in DuckDB¶

Practical Example¶

Identity Inputs¶

When to Use Identity Inputs¶

Concise and Labeled Forms¶

Audit Trail¶

ConfigAdapters¶

Examples¶

`run` with config and facet¶

Scenario run with `facet_from`¶

See Also¶

Config, Facets, and Identity Inputs¶

Decision Tree¶

Reference Table¶

The config Argument¶

Config Hashing¶

Facets¶

Guardrails¶

Querying Facets in DuckDB¶

Practical Example¶

Identity Inputs¶

When to Use Identity Inputs¶

Concise and Labeled Forms¶

Audit Trail¶

ConfigAdapters¶

Examples¶

run with config and facet¶

Scenario run with facet_from¶

See Also¶

The `config` Argument¶

`run` with config and facet¶

Scenario run with `facet_from`¶