APIs

run_step ¶

run_step(step: Step)

Run a pipeline step's tasks based on the availability of task files.

Tasks are iterated through, and the relevant in/output files' existence existence is checked when the task is reached in the loop (rather than at the start). This means that intermediate files can be created by tasks, and their existence will be checked when those output files become inputs to subsequent tasks.

If any task's required input files are missing, the step bails out: no further tasks will run.

models ¶

Pydantic models to represent the tasks within a step in a data pipeline.

C `module-attribute` ¶

C = TypeVar('C', bound=BaseModel)

AvailableTA `module-attribute` ¶

AvailableTA = TypeAdapter(list[OnErrorOmit[AvailableTask]])

CompletedTA `module-attribute` ¶

CompletedTA = TypeAdapter(list[OnErrorOmit[CompletedTask]])

Executable ¶

Bases: BaseModel

All tasks must have an associated function to make them executable.

fn `instance-attribute` ¶

fn: Callable

AvailableTask ¶

Bases: Executable

A task is available when its input files exist and its outputs don't.

src `instance-attribute` ¶

src: dict[str, FilePath]

dst `instance-attribute` ¶

dst: dict[str, NewPath]

CompletedTask ¶

Bases: Executable

A task is completed when its output files exist, whether inputs exist or not.

src `instance-attribute` ¶

src: dict[str, Path]

dst `instance-attribute` ¶

dst: dict[str, FilePath]

Task ¶

Bases: Executable

A task has zero or more input files and zero or more output files.

src `instance-attribute` ¶

src: dict[str, Path]

dst `instance-attribute` ¶

dst: dict[str, Path]

TaskRef ¶

Bases: Executable

A TaskRef is dereferenced to a Task by looking up src/dst fields on a config.

src `instance-attribute` ¶

src: list[str]

dst `instance-attribute` ¶

dst: list[str]

Step ¶

Bases: BaseModel

A named step in a data pipeline, split up into tasks with specified file I/O.

name `instance-attribute` ¶

name: str

task_refs `instance-attribute` ¶

task_refs: list[TaskRef]

config `instance-attribute` ¶

config: C

tasks `property` ¶

tasks: list[Task]

RunContext ¶

Bases: BaseModel

The context available to a task runner.

step `instance-attribute` ¶

step: Step

idx `instance-attribute` ¶

idx: int

run ¶

Control flow using the Pydantic runtime file I/O checks.

task_runner ¶

task_runner(task: AvailableTask, context: RunContext) -> None

run_step ¶

run_step(step: Step)

Run a pipeline step's tasks based on the availability of task files.

Tasks are iterated through, and the relevant in/output files' existence existence is checked when the task is reached in the loop (rather than at the start). This means that intermediate files can be created by tasks, and their existence will be checked when those output files become inputs to subsequent tasks.

If any task's required input files are missing, the step bails out: no further tasks will run.

APIs

run_step ¶

models ¶

C module-attribute ¶

AvailableTA module-attribute ¶

CompletedTA module-attribute ¶

Executable ¶

fn instance-attribute ¶

AvailableTask ¶

src instance-attribute ¶

dst instance-attribute ¶

CompletedTask ¶

src instance-attribute ¶

dst instance-attribute ¶

Task ¶

src instance-attribute ¶

dst instance-attribute ¶

TaskRef ¶

src instance-attribute ¶

dst instance-attribute ¶

Step ¶

name instance-attribute ¶

task_refs instance-attribute ¶

config instance-attribute ¶

tasks property ¶

RunContext ¶

step instance-attribute ¶

idx instance-attribute ¶

run ¶

task_runner ¶

run_step ¶

C `module-attribute` ¶

AvailableTA `module-attribute` ¶

CompletedTA `module-attribute` ¶

fn `instance-attribute` ¶

src `instance-attribute` ¶

dst `instance-attribute` ¶

src `instance-attribute` ¶

dst `instance-attribute` ¶

src `instance-attribute` ¶

dst `instance-attribute` ¶

src `instance-attribute` ¶

dst `instance-attribute` ¶

name `instance-attribute` ¶

task_refs `instance-attribute` ¶

config `instance-attribute` ¶

tasks `property` ¶

step `instance-attribute` ¶

idx `instance-attribute` ¶