Artifact

Artifact represents a tracked input or output file with stable provenance metadata (run_id, hash, container_uri, driver, and custom metadata).

When to use Artifact directly

  • You want to pass outputs from one step to another (inputs={...} mappings).
  • You need a portable path via artifact.path rather than hard-coded file paths.
  • You need metadata checks (artifact.get_meta(...), artifact.is_tabular, artifact.is_matrix) before loading.

Minimal runnable example

from pathlib import Path
import consist
from consist import Tracker

tracker = Tracker(run_dir="./runs", db_path="./provenance.duckdb")

def write_output() -> Path:
    out = consist.output_path("report", ext="txt")
    out.write_text("hello\n")
    return out

with consist.use_tracker(tracker):
    result = consist.run(fn=write_output, outputs=["report"])

artifact = result.outputs["report"]
print(artifact.key)
print(artifact.path)
print(artifact.path.read_text().strip())

See API Helpers for helper functions that return and consume artifacts (consist.run, consist.ref, consist.refs, consist.load*).

Bases: SQLModel

Represents a physical data object in the Consist database.

This table stores canonical metadata for any file/dataset Consist tracks. It is linked to runs via run_artifact_link to record whether an artifact was an input or output. The run_id field records the producing run (if any) and is often None for external inputs.

Artifacts are the core building blocks of provenance and caching. Each artifact has a unique identity, a virtualized location, and rich metadata, supporting both "hot" (ingested) and "cold" (file-based) data strategies.

Attributes:

  • id (uuid.UUID): A unique identifier for the artifact.
  • key (str): A semantic, human-readable name for the artifact (e.g., "households", "parcels").
  • container_uri (str): A portable, virtualized Uniform Resource Identifier (URI) for the artifact's location (e.g., "inputs://land_use.csv").
  • table_path (Optional[str]): Optional path inside a container (e.g., "/tables/households").
  • array_path (Optional[str]): Optional path inside a container for array artifacts.
  • driver (str): The name of the format handler used to read or write the artifact (e.g., "parquet", "csv", "zarr").
  • hash (Optional[str]): SHA256 content hash of the artifact's data, enabling content-addressable lookups and deduplication.
  • run_id (Optional[str]): The ID of the run that generated this artifact. Null for inputs.
  • meta (Dict[str, Any]): A flexible JSON field for storing arbitrary metadata, such as schema signatures or data dimensions.
  • created_at (datetime): The timestamp when the artifact was first logged.
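To illustrate why a content hash enables deduplication, here is a minimal sketch using only the standard library. It is not Consist's implementation: the helper name sha256_of_file is hypothetical, and which bytes Consist actually hashes (or how it treats directories and containers) is not described here, so treat the details as assumptions.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hypothetical helper: hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        # Read fixed-size chunks so large files never sit fully in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    a = Path(tmp) / "a.csv"
    b = Path(tmp) / "b.csv"
    a.write_text("id,value\n1,10\n")
    b.write_text("id,value\n1,10\n")  # same bytes under a different name
    hash_a = sha256_of_file(a)
    hash_b = sha256_of_file(b)

# Equal digests mean equal content, regardless of file name or location.
same_content = hash_a == hash_b
```

Because the digest depends only on the bytes, two files with identical content map to the same hash, which is what makes content-addressable lookup possible.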

abs_path property writable

Runtime-only helper to access the absolute path of this artifact.

This property provides the resolved absolute file system path for the artifact. It is not persisted to the database but is crucial for local file operations and for chaining Consist runs within the same script or environment.

Returns: Optional[str]: The absolute file system path of the artifact, or None if it has not yet been resolved or set.
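The idea of a writable, runtime-only attribute can be sketched in plain Python. This is an illustration under stated assumptions, not the SQLModel-based implementation: ArtifactStub and to_record are hypothetical names used to show that the value lives on the object but is left out of what gets persisted.

```python
from typing import Any, Dict, Optional


class ArtifactStub:
    """Hypothetical stand-in for Artifact, for illustration only."""

    def __init__(self, key: str, container_uri: str) -> None:
        self.key = key
        self.container_uri = container_uri
        self._abs_path: Optional[str] = None  # runtime-only state

    @property
    def abs_path(self) -> Optional[str]:
        # None until something resolves or sets the path.
        return self._abs_path

    @abs_path.setter
    def abs_path(self, value: Optional[str]) -> None:
        self._abs_path = value

    def to_record(self) -> Dict[str, Any]:
        # Only the portable fields are written out; abs_path is omitted.
        return {"key": self.key, "container_uri": self.container_uri}


stub = ArtifactStub("land_use", "inputs://land_use.csv")
before = stub.abs_path
stub.abs_path = "/data/inputs/land_use.csv"
record = stub.to_record()
```

Keeping the absolute path out of the stored record is what lets the persisted metadata stay portable across machines with different directory layouts.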

path property

Resolve this artifact to a filesystem Path.

Uses the tracker when available to handle mount-aware URIs; otherwise falls back to the cached absolute path or the raw URI.
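The fallback order described above can be sketched as follows. This is a hypothetical illustration, not Consist's resolver: resolve_uri and the mounts mapping are assumed names, with the mapping standing in for whatever mount configuration the tracker holds.

```python
from pathlib import Path
from typing import Mapping, Optional


def resolve_uri(
    container_uri: str,
    mounts: Mapping[str, str],
    cached_abs_path: Optional[str] = None,
) -> Path:
    """Hypothetical resolver: mounts first, then cached path, then raw URI."""
    scheme, sep, rest = container_uri.partition("://")
    if sep and scheme in mounts:
        # Mount-aware case: map the scheme to a local root directory.
        return Path(mounts[scheme]) / rest
    if cached_abs_path is not None:
        # No usable mount: fall back to the cached absolute path.
        return Path(cached_abs_path)
    # Last resort: treat the URI itself as a path.
    return Path(container_uri)


mounts = {"inputs": "/data/inputs"}
resolved = resolve_uri("inputs://land_use.csv", mounts)
cached = resolve_uri("inputs://land_use.csv", {}, "/tmp/land_use.csv")
raw = resolve_uri("plain/file.txt", {})
```

Here the same container_uri resolves differently depending on which mounts are known, which is the point of storing a virtualized URI rather than an absolute path.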

is_matrix property

Indicates if the artifact represents a multi-dimensional array or matrix-like data.

This property helps in dispatching to appropriate data loaders or processing functions that handle array-based data structures, such as those typically found in scientific computing.

Returns: bool: True if the artifact's driver is associated with matrix-like data formats (e.g., Zarr, HDF5, NetCDF, OpenMatrix), False otherwise.

is_tabular property

Indicates if the artifact represents tabular data (rows and columns).

This property assists in identifying artifacts that can be loaded and processed using tools designed for structured, record-based data, such as Pandas DataFrames.

Returns: bool: True if the artifact's driver is associated with tabular data formats (e.g., Parquet, CSV, SQL), False otherwise.
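A minimal sketch of how is_tabular and is_matrix can drive loader dispatch. The driver sets below are assumptions built from the examples named in the property descriptions; the real mapping may include other drivers or spell them differently. ArtifactStub and pick_loader are hypothetical names.

```python
from dataclasses import dataclass

# Assumed driver names, taken from the examples in the descriptions.
TABULAR_DRIVERS = frozenset({"parquet", "csv", "sql"})
MATRIX_DRIVERS = frozenset({"zarr", "hdf5", "netcdf", "openmatrix"})


@dataclass
class ArtifactStub:
    """Hypothetical stand-in exposing only the fields used here."""

    key: str
    driver: str

    @property
    def is_tabular(self) -> bool:
        return self.driver.lower() in TABULAR_DRIVERS

    @property
    def is_matrix(self) -> bool:
        return self.driver.lower() in MATRIX_DRIVERS


def pick_loader(artifact: ArtifactStub) -> str:
    """Choose a loading strategy from the driver-derived flags."""
    if artifact.is_tabular:
        return "dataframe"
    if artifact.is_matrix:
        return "array"
    return "raw"


households = ArtifactStub("households", "parquet")
skims = ArtifactStub("skims", "zarr")
report = ArtifactStub("report", "txt")
```

Checking the flags before loading avoids opening a file just to discover that it needs a different reader.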

created_at_iso property

Return created_at as an ISO 8601 formatted string.

Useful for serialization to external systems that expect string timestamps.

Returns: Optional[str]: The created_at timestamp as ISO 8601 string, or None if not set.
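The behavior can be approximated with the standard library's datetime.isoformat(). This sketch is a free function rather than the actual property, and the sample timestamp is made up for illustration.

```python
import json
from datetime import datetime, timezone
from typing import Optional


def created_at_iso(created_at: Optional[datetime]) -> Optional[str]:
    """Sketch: ISO 8601 string for a timestamp, or None if unset."""
    return created_at.isoformat() if created_at is not None else None


stamp = datetime(2024, 1, 15, 9, 30, 0, tzinfo=timezone.utc)
iso = created_at_iso(stamp)
missing = created_at_iso(None)

# A datetime is not JSON-serializable by default; the string form is.
payload = json.dumps({"created_at": iso})
```

Passing the string form avoids the TypeError that json.dumps raises when handed a raw datetime object.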

get_meta(key, default=None)

Safely retrieves a value from the 'meta' dictionary.

Args: key (str): The key to look up in the metadata. default (Any, optional): The default value to return if the key is not found. Defaults to None.

Returns: Any: The value associated with the key, or the default value if the key is not present.
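The lookup semantics match dict.get. The sketch below is a free function standing in for the method; treating a missing (None) meta dictionary as empty is an assumption about what "safely" covers, not confirmed behavior.

```python
from typing import Any, Dict, Optional


def get_meta(meta: Optional[Dict[str, Any]], key: str, default: Any = None) -> Any:
    """Sketch of a safe metadata lookup over an optional dict."""
    # Assumption: a None meta dict behaves like an empty one.
    return (meta or {}).get(key, default)


meta = {"rows": 1200, "schema_signature": None}

rows = get_meta(meta, "rows")
fallback = get_meta(meta, "columns", default=0)
stored_none = get_meta(meta, "schema_signature", default="unknown")
no_meta = get_meta(None, "rows", default="n/a")
```

Note that a key stored with the value None returns None rather than the default, since the default applies only when the key is absent.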