Materialization

MaterializationResult dataclass

materialize_artifacts(tracker, items, *, on_missing='warn')

Synchronize cached artifact bytes to specific filesystem destinations.

This function performs the physical materialization: it copies the resolved source data of each cached artifact to the requested location. File writes are atomic, and directories are recursed safely. If the destination already contains matching data, the copy is skipped to avoid redundant I/O.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| tracker | Tracker | The active Tracker instance, used to resolve virtualized artifact URIs to physical paths on the host filesystem. | required |
| items | Sequence[tuple[Artifact, Path]] | A collection of (Artifact, Path) pairs, each naming a source artifact and its target destination. | required |
| on_missing | Literal['warn', 'raise'] | The error-handling policy when the resolved source path does not exist on the filesystem. | "warn" |

Returns:

| Type | Description |
| --- | --- |
| dict[str, str] | A mapping of artifact keys to their successfully materialized absolute filesystem paths. |
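The behavior described above (atomic writes, skipping destinations that already match, and the on_missing policy) can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not the library's implementation: copy_file_atomically and materialize_pairs are hypothetical names, items are simplified to (key, source, destination) triples instead of (Artifact, Path) pairs resolved through a Tracker, matching is decided by comparing SHA-256 digests, and only regular files are handled.

```python
import hashlib
import os
import tempfile
import warnings
from pathlib import Path


def _digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    h = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def copy_file_atomically(src: Path, dst: Path) -> bool:
    """Copy src to dst. Return False if dst already matched and was left alone."""
    if dst.is_file() and _digest(dst) == _digest(src):
        return False  # matching data already present: skip the write
    dst.parent.mkdir(parents=True, exist_ok=True)
    # Write to a temporary file in the destination directory, then rename it
    # over dst, so readers never observe a partially written file.
    fd, tmp_name = tempfile.mkstemp(dir=dst.parent, prefix=dst.name + ".", suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as out, src.open("rb") as inp:
            for chunk in iter(lambda: inp.read(1 << 20), b""):
                out.write(chunk)
        os.replace(tmp_name, dst)  # atomic when both paths share a filesystem
    except BaseException:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
    return True


def materialize_pairs(items, *, on_missing="warn"):
    """Hypothetical simplification: items are (key, source, destination) triples."""
    if on_missing not in ("warn", "raise"):
        raise ValueError(f"unsupported on_missing policy: {on_missing!r}")
    done = {}
    for key, src, dst in items:
        src, dst = Path(src), Path(dst)
        if not src.is_file():
            if on_missing == "raise":
                raise FileNotFoundError(f"source missing for {key!r}: {src}")
            warnings.warn(f"source missing for {key!r}: {src}")
            continue  # under "warn", the key is simply left out of the result
        copy_file_atomically(src, dst)
        done[key] = str(dst.resolve())
    return done
```

In this sketch, a key appears in the returned mapping only when its source existed, which mirrors the "successfully materialized" wording of the return value.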

materialize_artifacts_from_sources(items, *, allowed_base, on_missing='warn')

Rehydrate artifacts from explicit source paths to specified destinations.

This specialized utility is typically used during cross-run hydration (for example, in 'inputs-missing' mode), where the original on-disk source lives in a historical run directory. It enforces directory containment through the allowed_base parameter.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| items | Sequence[tuple[Artifact, Path, Path]] | A collection of (Artifact, SourcePath, DestinationPath) triples defining the explicit rehydration mapping. | required |
| allowed_base | Path \| None | A security boundary for the materialization. If not None, all destination paths must reside within this directory tree. | required |
| on_missing | Literal['warn', 'raise'] | The error-handling policy when the explicit source path does not exist on the filesystem. | "warn" |

Returns:

| Type | Description |
| --- | --- |
| dict[str, str] | A mapping of artifact keys to their successfully materialized absolute filesystem paths. |
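The containment rule for allowed_base can be illustrated as follows. This is a sketch of one common way to perform such a check, assuming that paths are resolved before comparison and that a violation raises ValueError; check_destination is a hypothetical helper, and the library's actual check and exception type may differ.

```python
from pathlib import Path
from typing import Optional


def check_destination(dest: Path, allowed_base: Optional[Path]) -> Path:
    """Return the resolved destination, rejecting paths outside allowed_base."""
    resolved = Path(dest).resolve()
    if allowed_base is None:
        return resolved  # no boundary requested
    base = Path(allowed_base).resolve()
    # Resolving first collapses ".." segments and symlinks, so a path such as
    # base/a/../../escape cannot slip past a plain prefix comparison.
    if resolved != base and base not in resolved.parents:
        raise ValueError(f"{resolved} is outside allowed base {base}")
    return resolved
```

Comparing resolved paths rather than raw strings is the design point here: a string prefix test would accept `base/../elsewhere`, while the resolved form exposes the escape.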

build_materialize_items_for_keys(outputs, *, destinations_by_key)

Construct materialization mappings by correlating artifacts with target keys.

This helper prepares the input for materialize_artifacts by matching a collection of candidate artifacts against a set of desired destination keys.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| outputs | Iterable[Artifact] | The collection of candidate artifacts available for materialization. | required |
| destinations_by_key | dict[str, Path] | A mapping of artifact keys to their intended target filesystem paths. | required |

Returns:

| Type | Description |
| --- | --- |
| list[tuple[Artifact, Path]] | A list of (Artifact, Path) pairs ready for the materialize_artifacts utility. |
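The key-matching step might look like the sketch below. It rests on assumptions not confirmed by this page: FakeArtifact is a hypothetical stand-in for Artifact with a `key` attribute, artifacts without a requested destination are skipped, and destination keys with no matching artifact are ignored silently.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class FakeArtifact:
    """Hypothetical stand-in for Artifact; only the key matters here."""
    key: str


def build_items_for_keys(
    outputs: Iterable[FakeArtifact],
    *,
    destinations_by_key: Dict[str, Path],
) -> List[Tuple[FakeArtifact, Path]]:
    """Pair each candidate artifact with its destination, if one was requested."""
    items = []
    for artifact in outputs:
        dest = destinations_by_key.get(artifact.key)
        if dest is not None:  # assumption: unrequested artifacts are skipped
            items.append((artifact, Path(dest)))
    return items
```

The resulting list has the (Artifact, Path) shape that materialize_artifacts takes as its items argument.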

materialize_ingested_artifact_from_db(*, artifact, tracker, destination, overwrite=False)

Reconstruct an ingested artifact from the analytical database.

This recovery path is used when a cached artifact is needed but its files have been purged from the filesystem. If the artifact was previously ingested, its data is exported from DuckDB to the specified destination.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| artifact | Artifact | The metadata record of the artifact to be reconstructed. | required |
| tracker | Tracker | The active Tracker instance containing the analytical engine and database connection. | required |
| destination | Path | The target filesystem path where the reconstructed data will be materialized. | required |

Returns:

| Type | Description |
| --- | --- |
| str | The absolute filesystem path of the reconstructed artifact. |

Raises:

| Type | Description |
| --- | --- |
| RuntimeError | If the database engine is unavailable or the source table cannot be resolved. |
| ValueError | If the artifact is not marked as ingested or uses an unsupported materialization driver. |
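The control flow of this recovery path (validate the record, check the engine, locate the table, export) can be sketched in a self-contained way. To stay runnable with the standard library only, the sketch swaps in sqlite3 and a CSV export in place of DuckDB; IngestedRecord and export_from_db are hypothetical names, and treating overwrite=False as "keep an existing file" is an assumption, since this page does not document that parameter.

```python
import csv
import sqlite3
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass
class IngestedRecord:
    """Hypothetical stand-in for the artifact metadata record."""
    key: str
    table: str
    is_ingested: bool


def export_from_db(
    *,
    record: IngestedRecord,
    conn: Optional[sqlite3.Connection],  # sqlite3 stands in for DuckDB here
    destination: Path,
    overwrite: bool = False,
) -> str:
    """Export the record's table to destination and return its absolute path."""
    if not record.is_ingested:
        raise ValueError(f"artifact {record.key!r} is not marked as ingested")
    if conn is None:
        raise RuntimeError("database engine is unavailable")
    found = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?",
        (record.table,),
    ).fetchone()
    if found is None:
        raise RuntimeError(f"source table {record.table!r} cannot be resolved")
    destination = Path(destination)
    if destination.exists() and not overwrite:
        return str(destination.resolve())  # assumption: keep the existing file
    destination.parent.mkdir(parents=True, exist_ok=True)
    # The table name was confirmed to exist above, so it is safe to quote it.
    cursor = conn.execute(f'SELECT * FROM "{record.table}"')
    with destination.open("w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow([col[0] for col in cursor.description])
        writer.writerows(cursor)
    return str(destination.resolve())
```

The two exception types follow the Raises table above: ValueError for a record that cannot be reconstructed by design, RuntimeError for an engine or table that is missing at call time.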