Schema Export (SQLModel Stubs)¶
Consist can capture the observed (post-ingest) schema of tabular artifacts and export that schema as a static SQLModel class stub you can commit and edit. This reduces “retype the table definition” friction while keeping Consist honest about what it can and cannot infer.
What You Get¶
- A Python 3.11 file containing one SQLModel class stub (exported as `table=True`).
- Columns with:
    - a best-effort type mapping from DuckDB logical types
    - nullability (`Optional[...] = None` only when nullable)
    - deterministic ordering (prefers `ordinal_position`)
    - "hints" as comments (enums/stats) when available (on by default)
- Messy column names are handled: the generated attribute name is sanitized, but the original DB column name is preserved via `Field(sa_column=Column("original", ...))` when needed.
- Foreign keys are preserved if they were provided in a curated SQLModel schema (rendered as `Field(foreign_key="table.column")`).
Typical Workflow¶
1. Ingest tabular data into DuckDB via `log_dataframe(..., schema=MySchema)`, or enable lightweight file profiling with `profile_file_schema=True`. Consist captures the table structure automatically.
2. Export the captured schema as a SQLModel stub (`consist.cli schema export ...`) and edit it to add primary keys, relationships, and normalization decisions.
3. Register the curated model with a `Tracker` to activate a hybrid view for cross-run queries.
Abstract vs Concrete Exports¶
By default, Consist exports SQLModel classes as abstract (`__abstract__ = True`). This keeps the stub importable immediately, because SQLAlchemy requires a primary key to map a concrete table/view, and many analysis tables don’t have an obvious primary key.
- Default (recommended): abstract export, safe to import, great for view schemas.
- Concrete export (advanced): export with `--concrete` (or `abstract=False` in Python) and then add a primary key before importing; otherwise SQLAlchemy will raise an error like “could not assemble any primary key columns”.
What You Still Do Manually¶
Consist does not infer semantic intent. You typically edit the stub to add:
- primary keys / foreign keys
- indexes / uniqueness constraints
- relationships
- renames / normalization decisions
If you pass a curated SQLModel schema to `log_artifact(..., schema=YourModel)` and it includes `foreign_key=...`, those FKs are persisted and will be re-emitted during export.
Prerequisites¶
Schema export uses the schema captured from a DuckDB-ingested table. In practice:
- You run with a database configured (`db_path=...`).
- You ingest tabular data into DuckDB (hot data).
- Consist profiles the ingested table and stores a deduped schema referenced by `artifact.meta["schema_id"]`.
If you already see `schema_id` in an artifact’s `meta`, you’re ready to export.

You can also capture schemas without ingestion for CSV/Parquet artifacts by enabling lightweight file profiling at log time (`profile_file_schema=True`, optional `file_schema_sample_rows=`). This writes the same `schema_id` pointer into `artifact.meta`, allowing schema export even if the original file is later missing or moved.
If an artifact was already logged without schema profiling, you can backfill file schema metadata later (as long as the file is still accessible):
Use `--artifact-id` for UUID-based selection, and `--path` to override path resolution when the stored URI no longer resolves directly.
Exporting from the CLI¶
Export by artifact UUID (recommended UX for now):
```bash
python -m consist.cli schema export \
    --artifact-id 00000000-0000-0000-0000-000000000000 \
    --out your_pkg/models/persons.py
```
Or export by schema id (hash):
Useful flags:
- `--class-name Persons` to override the generated class name
- `--table-name persons` to override `__tablename__`
- `--include-system-cols` to include system/ingestion columns like `consist_*` and `_dlt_*`
- `--no-stats-comments` to omit stats/enum hint comments
- `--concrete` to export a non-abstract mapped class (you must add a primary key)
If `--out` is omitted, the stub is printed to stdout (so it can be piped or redirected).
Exporting from Python¶
You can generate the code (and optionally write it) from a Tracker:
```python
from pathlib import Path

from consist import Tracker

tracker = Tracker(run_dir=".", db_path="provenance.duckdb")
code = tracker.export_schema_sqlmodel(
    artifact_id="00000000-0000-0000-0000-000000000000",
    out_path=Path("your_pkg/models/persons.py"),
    abstract=True,
)
```
The method returns the generated string regardless of whether you write it.
Best‑Effort Foreign Key Enforcement¶
Asking DuckDB to enforce persisted foreign keys is best‑effort: if a constraint can’t be added (e.g., due to missing parent rows), Consist logs a warning and continues.
Where to Put the Generated File¶
Place it anywhere importable by your project (on `PYTHONPATH`), for example:

- `your_pkg/models/`
- `your_pkg/schemas/`

If it’s a package directory, include an `__init__.py` so you can import `your_pkg.models.persons`.
Consist does not require a special directory layout; it only needs you to import the class and register it for views.
Using the Generated Model with Views¶
Consist’s hybrid views are created from your SQLModel schema. To activate a view:
1) Import the model class.
2) Register it with the Tracker (at init time or after).
Register at Tracker initialization¶
```python
from consist import Tracker
from your_pkg.models.persons import Persons

tracker = Tracker(run_dir="./runs", db_path="./provenance.duckdb", schemas=[Persons])
VPersons = tracker.views.Persons
```
Register later¶
```python
from your_pkg.models.persons import Persons

VPersons = tracker.view(Persons)  # registers + creates/refreshes the hybrid view
```
Querying¶
```python
from sqlmodel import select

import consist

VPersons = tracker.views.Persons
rows = consist.run_query(select(VPersons).limit(5), tracker=tracker)
```
Views automatically include Consist metadata columns (e.g. `consist_run_id`, `consist_year`, `consist_scenario_id`) for filtering and grouping.
Column Name Rules (Important)¶
The generated stub aggressively normalizes attribute names to reduce runtime errors:
- invalid characters become `_`
- leading digits are prefixed with `_`
- Python keywords get a trailing `_`
- collisions become `__2`, `__3`, ...
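A minimal sketch of these rules (illustrative only, not Consist’s actual implementation):

```python
import keyword
import re


def sanitize(name: str, seen: set[str]) -> str:
    """Illustrative sketch of the renaming rules (not Consist's actual code)."""
    attr = re.sub(r"\W", "_", name)      # invalid characters become _
    if attr and attr[0].isdigit():
        attr = "_" + attr                # leading digits are prefixed with _
    if keyword.iskeyword(attr):
        attr += "_"                      # Python keywords get a trailing _
    base, n = attr, 2
    while attr in seen:                  # collisions become __2, __3, ...
        attr = f"{base}__{n}"
        n += 1
    seen.add(attr)
    return attr


seen: set[str] = set()
print(sanitize("first name", seen))  # first_name
print(sanitize("first-name", seen))  # first_name__2 (collision)
print(sanitize("class", seen))       # class_
```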
When the attribute name differs from the real column name, the stub uses `Field(sa_column=Column("original", ...))` to preserve the original DB column name. This matters for views and empty-view typing: Consist now uses the original DB column name when creating typed empty views, so schemas with renamed attributes still behave correctly.
Matching Artifacts, Tables, and __tablename__¶
For views to “pick up” your data, the model’s `__tablename__` should match the concept key you are querying:
- for ingested (“hot”) data, this is typically the DuckDB table name used at ingest time
- for file-based (“cold”) data, this is typically the `Artifact.key`
If you want your curated model name to differ from the ingestion key, you can:
- override `--table-name` / `table_name=...` during export, or
- set `__tablename__ = "..."` manually after editing
System and Ingestion Columns¶
By default, exported stubs omit system/ingestion columns:
- `consist_*` (Consist provenance columns)
- `_dlt_*` (common ingestion/system columns)
Use `--include-system-cols` if you need a faithful “as-ingested” representation.
Notes on Wide / Sparse Tables¶
Very wide tables can cause the stored JSON profile to truncate. Consist still persists per-field rows so schema export continues to work, and emits a warning when truncation occurs.
If your upstream workflow produces sparse wide tables, consider reshaping to a long format before ingestion (or as a dedicated transformation step). Future Consist work may add first-class “pre/post ingestion transforms”, but schema export is designed to work even when the JSON blob is truncated.
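As a sketch of that reshaping step (column names are hypothetical), a pandas `melt` turns a sparse wide table into a long one before ingestion:

```python
import pandas as pd

# Sparse wide table: most year columns are empty for any given row.
wide = pd.DataFrame(
    {
        "person_id": [1, 2],
        "income_2020": [50_000, None],
        "income_2021": [None, 61_000],
    }
)

# Long format: one (person_id, variable, value) row per observed cell,
# dropping the empty cells entirely.
long_df = wide.melt(
    id_vars="person_id", var_name="variable", value_name="value"
).dropna(subset=["value"])
print(long_df)
```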
See Also¶
- DLT Loader Integration Guide — Ingesting tabular data so schemas are captured
- Architecture: Data Virtualization — How hybrid views work
- Data Materialization Strategy