
Views

View Registry

Registry for dynamic view classes. Accessing a view (e.g. registry.Person) automatically refreshes the underlying DuckDB SQL definition to include new files.

Use register(model, key=...) to add SQLModel schemas. Accessing the attribute returns a dynamic SQLModel view class that can be queried via select(...).

register(model, key=None)
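The refresh-on-access behavior can be pictured with a small stand-in class. This is a hypothetical sketch, not Consist's implementation: ViewRegistrySketch and the refresh callback are illustrative names, and falling back to the class name when key is None is an assumption. The key point is that __getattr__ runs on every lookup of a registered name, so each access gets a chance to rebuild the view.

```python
from typing import Callable, Dict, Optional, Tuple


class ViewRegistrySketch:
    """Hypothetical stand-in for the View Registry: attribute access refreshes the view."""

    def __init__(self, refresh: Callable[[type, str], type]) -> None:
        # refresh(model, key) is a hypothetical callback that rebuilds the SQL view
        # (picking up any new files) and returns the class to query.
        self._refresh = refresh
        self._models: Dict[str, Tuple[type, str]] = {}

    def register(self, model: type, key: Optional[str] = None) -> None:
        # Assumption: when key is None, the model's class name is used as the key.
        name = model.__name__
        self._models[name] = (model, key if key is not None else name)

    def __getattr__(self, name: str) -> type:
        # __getattr__ is only called when normal lookup fails, so registered
        # model names (e.g. registry.Person) land here on every access.
        models = self.__dict__.get("_models", {})
        if name not in models:
            raise AttributeError(name)
        model, key = models[name]
        return self._refresh(model, key)
```

Accessing the same attribute twice calls the refresh callback twice, which mirrors the documented "refresh on access" behavior.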

View Factory

A factory class that generates "Hybrid Views" in DuckDB; it serves as Consist's "Virtualizer" component.

Hybrid Views combine data from materialized tables (often ingested via dlt) with data directly from file-based artifacts (e.g., Parquet, CSV), providing a unified SQL interface to query both "hot" and "cold" data transparently. This approach is central to Consist's flexible data access strategy.

Attributes:

  • tracker (Tracker): An instance of the Consist Tracker, which provides access to the database engine, artifact resolution, and other run-time context necessary for view creation.

create_view_from_model(model, key=None)

Creates both the SQL View and the Python SQLModel class for a given schema.

create_hybrid_view(view_name, concept_key, driver_filter=None, schema_model=None)

Creates or replaces a DuckDB SQL VIEW that combines "hot" and "cold" data for a given concept.

This method generates a "Hybrid View" that allows transparent querying across different storage types. It implements "View Optimization" by letting DuckDB read cold files directly with its vectorized readers. The resulting view uses UNION ALL BY NAME to handle "Schema Evolution" (columns that differ across runs or data sources): sources are aligned by column name, and columns missing from a source are filled with NULL.

"Hot" data refers to records already materialized into a DuckDB table (e.g., via ingestion). "Cold" data refers to records still residing in file-based artifacts (e.g., Parquet, CSV). Identifiers are quoted for SQL safety; missing cold-file paths are skipped at view creation.
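As a rough illustration of the SQL such a view might contain, the sketch below builds a CREATE OR REPLACE VIEW statement from hot tables and cold files. It is hypothetical: build_hybrid_view_sql, its (path, driver) input shape, and the choice of DuckDB's read_parquet and read_csv_auto table functions are assumptions, not Consist's actual code. It shows identifier quoting, the default "parquet"/"csv" driver filter, skipping of missing cold files, and UNION ALL BY NAME.

```python
import os
from typing import List, Optional, Tuple

# Assumed mapping from artifact driver to a DuckDB table function.
_READERS = {"parquet": "read_parquet", "csv": "read_csv_auto"}


def quote_ident(name: str) -> str:
    # Double-quote one identifier part, escaping embedded double quotes.
    return '"' + name.replace('"', '""') + '"'


def quote_qualified(name: str) -> str:
    # Quote each part of a dotted name such as global_tables.households.
    return ".".join(quote_ident(part) for part in name.split("."))


def quote_literal(value: str) -> str:
    # Single-quote a string literal (here, a file path).
    return "'" + value.replace("'", "''") + "'"


def build_hybrid_view_sql(
    view_name: str,
    hot_tables: List[str],
    cold_files: List[Tuple[str, str]],  # hypothetical (path, driver) pairs
    driver_filter: Optional[List[str]] = None,
) -> Optional[str]:
    drivers = driver_filter if driver_filter is not None else ["parquet", "csv"]
    # "Hot" sources: tables already materialized in DuckDB.
    selects = [f"SELECT * FROM {quote_qualified(t)}" for t in hot_tables]
    # "Cold" sources: files read in place by DuckDB's file readers.
    for path, driver in cold_files:
        if driver not in drivers or driver not in _READERS:
            continue
        if not os.path.exists(path):
            continue  # missing cold files are skipped when the view is created
        selects.append(f"SELECT * FROM {_READERS[driver]}({quote_literal(path)})")
    if not selects:
        return None  # nothing to union; this sketch leaves that case to the caller
    # UNION ALL BY NAME aligns columns by name and fills absent ones with NULL.
    body = "\nUNION ALL BY NAME\n".join(selects)
    return f"CREATE OR REPLACE VIEW {quote_ident(view_name)} AS\n{body}"
```

The returned string could then be run with a DuckDB connection's execute() method; keeping SQL generation separate from execution makes the statement easy to inspect.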

Parameters:

  • view_name (str, required): The name to assign to the newly created or replaced SQL view. This is the name you use in SQL queries to access the combined data.
  • concept_key (str, required): The semantic key identifying the data concept (e.g., "households", "transactions"). Artifacts and materialized tables matching this key are included in the view.
  • driver_filter (Optional[List[str]], default None): An optional list of artifact drivers (e.g., "parquet", "csv") to include when querying "cold" data. If None, the "parquet" and "csv" drivers are used.
  • schema_model (Optional[Type[SQLModel]], default None): An optional SQLModel class defining the table schema of the underlying data.

Returns:

  • bool: True if view creation was attempted (even if the resulting view is empty); False otherwise.

Raises:

  • RuntimeError: If the Tracker's database engine is not configured (i.e., db_path was not provided during Tracker initialization).

create_grouped_hybrid_view(*, view_name, schema_id=None, schema_ids=None, schema_compatible=False, predicates=None, namespace=None, drivers=None, attach_facets=None, include_system_columns=True, mode='hybrid', if_exists='replace', missing_files='warn', run_id=None, parent_run_id=None, model=None, status=None, year=None, iteration=None)

Create a selector-driven hybrid view across many artifacts.

This method powers schema-family analysis views, in which artifacts may have different keys but represent the same logical table. Artifacts are selected by schema id (schema_id for one, or schema_ids for several) plus optional facet and run predicates.

The resulting SQL view can combine:
  • hot rows from ingested global_tables.* relations,
  • cold rows from files (currently via parquet/csv readers),
  • optional typed facet_* projection columns,
  • optional Consist system columns.
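A minimal sketch of how those pieces might be assembled into one view body, assuming each selected artifact already knows whether it is read from a hot table or a cold file. SelectedArtifact, sql_literal, and grouped_view_body are hypothetical names; the mode, attach_facets, and include_system_columns arguments mirror the parameters documented below. Facet keys are assumed to be simple names (no nested paths), and unlike the documented method this sketch does not CAST the literal columns to fixed types.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class SelectedArtifact:
    # Hypothetical record for one artifact picked by the selectors.
    artifact_id: str
    run_id: str
    source_sql: str  # a hot table reference or a file-reader call
    is_hot: bool
    year: Optional[int] = None
    iteration: Optional[int] = None
    scenario_id: Optional[str] = None
    facets: Dict[str, Any] = field(default_factory=dict)


def sql_literal(value: Any) -> str:
    # Render a Python value as an SQL literal (bool is checked before int).
    if value is None:
        return "NULL"
    if isinstance(value, bool):
        return "TRUE" if value else "FALSE"
    if isinstance(value, (int, float)):
        return repr(value)
    return "'" + str(value).replace("'", "''") + "'"


def grouped_view_body(
    artifacts: List[SelectedArtifact],
    mode: str = "hybrid",
    attach_facets: Optional[List[str]] = None,
    include_system_columns: bool = True,
) -> str:
    if mode not in ("hybrid", "hot_only", "cold_only"):
        raise ValueError(f"unsupported mode: {mode!r}")
    selects = []
    for art in artifacts:
        # Drop sources from the storage tier the mode excludes.
        if (mode == "hot_only" and not art.is_hot) or (mode == "cold_only" and art.is_hot):
            continue
        extra = []
        if include_system_columns:
            extra += [
                f"{sql_literal(art.run_id)} AS consist_run_id",
                f"{sql_literal(art.artifact_id)} AS consist_artifact_id",
                f"{sql_literal(art.year)} AS consist_year",
                f"{sql_literal(art.iteration)} AS consist_iteration",
                f"{sql_literal(art.scenario_id)} AS consist_scenario_id",
            ]
        for key in attach_facets or []:
            # Each requested facet becomes a facet_<key> column (NULL if absent).
            extra.append(f"{sql_literal(art.facets.get(key))} AS facet_{key}")
        projection = ", ".join(["*"] + extra)
        selects.append(f"SELECT {projection} FROM {art.source_sql}")
    # Returns "" for an empty selection; the documented method emits a typed empty view.
    return "\nUNION ALL BY NAME\n".join(selects)
```

Attaching system and facet values as per-source literal columns keeps every row traceable to its run and artifact after the union.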

Parameters:

  • view_name (str, required): Name of the SQL view to create.
  • schema_id (Optional[str], default None): Primary selector for a single schema id.
  • schema_ids (Optional[List[str]], default None): Alternative selector for multiple schema ids. This is mainly used by higher-level model-class resolution in Tracker.create_grouped_view.
  • schema_compatible (bool, default False): If True, also include artifacts observed with schema variants deemed compatible by field-name subset/superset matching.
  • predicates (Optional[List[Dict[str, Any]]], default None): Parsed ArtifactKV predicates (as produced by Tracker._parse_artifact_param_expression).
  • namespace (Optional[str], default None): Default ArtifactKV namespace used when a predicate does not specify one explicitly.
  • drivers (Optional[List[str]], default None): Optional artifact-driver filter (e.g., ["parquet"]).
  • attach_facets (Optional[List[str]], default None): Facet key paths to expose as typed columns named facet_<key>.
  • include_system_columns (bool, default True): If True, include the columns consist_run_id, consist_artifact_id, consist_year, consist_iteration, and consist_scenario_id.
  • mode (Literal["hybrid", "hot_only", "cold_only"], default "hybrid"): Controls which storage tiers the view includes: both hot and cold rows ("hybrid"), only hot rows from ingested tables ("hot_only"), or only cold rows from files ("cold_only").
  • if_exists (Literal["replace", "error"], default "replace"): Behavior when a view named view_name already exists: replace it ("replace") or raise ValueError ("error").
  • missing_files (Literal["warn", "error", "skip_silent"], default "warn"): Policy for selected cold artifacts whose files no longer exist: skip them with a warning ("warn"), raise FileNotFoundError ("error"), or skip them silently ("skip_silent").
  • run_id (Optional[str], default None): Optional exact run-id filter.
  • parent_run_id (Optional[str], default None): Optional parent/scenario run-id filter.
  • model (Optional[str], default None): Optional run model-name filter.
  • status (Optional[str], default None): Optional run status filter.
  • year (Optional[int], default None): Optional run year filter.
  • iteration (Optional[int], default None): Optional run iteration filter.
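The schema_compatible matching and the run filters can be sketched as two small predicates. This is hypothetical: the function names and the dict shape of a run record are illustrative, and comparing field names case-sensitively is an assumption. Note the `is not None` check, which keeps a filter such as iteration=0 active instead of treating 0 as "no filter".

```python
from typing import Any, Dict, Iterable


def schema_compatible(fields_a: Iterable[str], fields_b: Iterable[str]) -> bool:
    # Compatible when one set of field names contains the other (subset/superset).
    a, b = set(fields_a), set(fields_b)
    return a <= b or b <= a


def matches_run_filters(run: Dict[str, Any], **filters: Any) -> bool:
    # Every filter that is not None must equal the run's value exactly.
    return all(
        run.get(name) == wanted
        for name, wanted in filters.items()
        if wanted is not None
    )
```

For example, a run recorded as {"model": "hh", "year": 2030, "iteration": 0} matches model="hh", year=2030, status=None, because the None status filter is ignored.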

Returns:

  • bool: True once view creation has completed.

Raises:

  • RuntimeError: If the tracker is not configured with a database/engine.
  • ValueError: If a policy argument has an unsupported value, or if if_exists="error" and the view already exists.
  • FileNotFoundError: If missing_files="error" and a selected cold file is missing.
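The policy arguments and the errors above could be handled roughly as follows. This is a minimal sketch under stated assumptions: validate_policies, filter_existing_files, and create_view_prefix are hypothetical helpers, and the claim that "warn" drops the missing paths after warning (like "skip_silent") is an assumption drawn from the policy names.

```python
import os
import warnings
from typing import List

# Allowed policy values, as listed in the parameter documentation.
_ALLOWED = {
    "mode": {"hybrid", "hot_only", "cold_only"},
    "if_exists": {"replace", "error"},
    "missing_files": {"warn", "error", "skip_silent"},
}


def validate_policies(mode: str, if_exists: str, missing_files: str) -> None:
    # Reject unsupported policy values up front with ValueError.
    for name, value in (("mode", mode), ("if_exists", if_exists), ("missing_files", missing_files)):
        if value not in _ALLOWED[name]:
            raise ValueError(f"{name}={value!r}; expected one of {sorted(_ALLOWED[name])}")


def filter_existing_files(paths: List[str], missing_files: str = "warn") -> List[str]:
    missing = {p for p in paths if not os.path.exists(p)}
    if missing and missing_files == "error":
        raise FileNotFoundError(f"selected cold files are missing: {sorted(missing)}")
    if missing and missing_files == "warn":
        warnings.warn(f"skipping {len(missing)} missing cold file(s)")
    # Assumed: "warn" and "skip_silent" both drop the missing paths.
    return [p for p in paths if p not in missing]


def create_view_prefix(view_exists: bool, if_exists: str) -> str:
    # if_exists="error" refuses to overwrite an existing view.
    if if_exists == "error" and view_exists:
        raise ValueError("view already exists and if_exists='error'")
    return "CREATE OR REPLACE VIEW" if if_exists == "replace" else "CREATE VIEW"
```

Validating every policy before touching the database means a typo such as missing_files="ignore" fails fast instead of producing a partially built view.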

Notes
  • Empty selections still produce a valid typed empty view.
  • Facet column types are inferred deterministically from indexed KV types: bool -> BOOLEAN, int -> BIGINT, float/int mix -> DOUBLE, otherwise VARCHAR.
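Both notes can be illustrated with a short sketch. infer_facet_type and typed_empty_select are hypothetical helpers: the documented method reads types recorded in the KV index, whereas this sketch inspects Python values. Treating an all-float column as DOUBLE and an all-None (or empty) column as VARCHAR are assumptions. A mix of bool and int falls to VARCHAR because it matches none of the listed cases. The empty view uses typed NULL columns with LIMIT 0, so the view has zero rows but keeps its column types.

```python
from typing import Any, Dict, Iterable


def infer_facet_type(values: Iterable[Any]) -> str:
    # type(True) is bool, not int, so booleans are never counted as integers here.
    kinds = {type(v) for v in values if v is not None}
    if kinds == {bool}:
        return "BOOLEAN"
    if kinds == {int}:
        return "BIGINT"
    if kinds and kinds <= {int, float}:
        return "DOUBLE"  # int/float mix (and, by assumption, all-float)
    return "VARCHAR"


def typed_empty_select(columns: Dict[str, str]) -> str:
    # One typed NULL per column; LIMIT 0 yields no rows but preserves the types.
    cols = ", ".join(
        f'CAST(NULL AS {sql_type}) AS "{name.replace(chr(34), chr(34) * 2)}"'
        for name, sql_type in columns.items()
    )
    return f"SELECT {cols} LIMIT 0"
```

For instance, typed_empty_select({"consist_year": "BIGINT"}) yields SELECT CAST(NULL AS BIGINT) AS "consist_year" LIMIT 0.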