Skip to content

APIs

run_step

run_step(step: Step)

Run a pipeline step's tasks based on the availability of task files.

Tasks are iterated through, and the relevant in/output files' existence existence is checked when the task is reached in the loop (rather than at the start). This means that intermediate files can be created by tasks, and their existence will be checked when those output files become inputs to subsequent tasks.

If any task's required input files are missing, the step bails out: no further tasks will run.

models

Pydantic models to represent the tasks within a step in a data pipeline.

C module-attribute

C = TypeVar('C', bound=BaseModel)

AvailableTA module-attribute

AvailableTA = TypeAdapter(list[OnErrorOmit[AvailableTask]])

CompletedTA module-attribute

CompletedTA = TypeAdapter(list[OnErrorOmit[CompletedTask]])

Executable

Bases: BaseModel

All tasks must have an associated function to make them executable.

fn instance-attribute

fn: Callable

AvailableTask

Bases: Executable

A task is available when its input files exist and its outputs don't.

src instance-attribute

src: dict[str, FilePath]

dst instance-attribute

dst: dict[str, NewPath]

CompletedTask

Bases: Executable

A task is completed when its output files exist, whether inputs exist or not.

src instance-attribute

src: dict[str, Path]

dst instance-attribute

dst: dict[str, FilePath]

Task

Bases: Executable

A task has zero or more input files and zero or more output files.

src instance-attribute

src: dict[str, Path]

dst instance-attribute

dst: dict[str, Path]

TaskRef

Bases: Executable

A TaskRef is dereferenced to a Task by looking up src/dst fields on a config.

src instance-attribute

src: list[str]

dst instance-attribute

dst: list[str]

Step

Bases: BaseModel

A named step in a data pipeline, split up into tasks with specified file I/O.

name instance-attribute

name: str

task_refs instance-attribute

task_refs: list[TaskRef]

config instance-attribute

config: C

tasks property

tasks: list[Task]

RunContext

Bases: BaseModel

The context available to a task runner.

step instance-attribute

step: Step

idx instance-attribute

idx: int

run

Control flow using the Pydantic runtime file I/O checks.

task_runner

task_runner(task: AvailableTask, context: RunContext) -> None

run_step

run_step(step: Step)

Run a pipeline step's tasks based on the availability of task files.

Tasks are iterated through, and the relevant in/output files' existence existence is checked when the task is reached in the loop (rather than at the start). This means that intermediate files can be created by tasks, and their existence will be checked when those output files become inputs to subsequent tasks.

If any task's required input files are missing, the step bails out: no further tasks will run.