How to schedule a pipeline run
ZenML’s scheduling functionality rests on a `Schedule` object that you pass in when calling `pipeline.run()`. There are two ways to create a schedule with the `Schedule` object, though whether one or both of these are supported depends on the specific orchestrator you’re using. For example, our Vertex Orchestrator only supports the cron expression method (see below).
You could write a cron expression to describe the pipeline schedule in the same terms you would use for a cron job. For example, if you wanted your pipeline to run at 14:05 on Wednesdays, you could use the expression `5 14 * * 3`.
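To make the five cron fields concrete, here is a minimal, standard-library-only sketch that checks whether a given moment falls on the "14:05 on Wednesdays" schedule. The expression `5 14 * * 3` follows standard cron syntax. The helper `matches_simple_cron` is a hypothetical name introduced for illustration and is not part of ZenML. The commented-out lines at the end show how the expression would likely be handed to ZenML, assuming a `Schedule` class that accepts a `cron_expression` argument and a pipeline object called `pipeline_instance`; check the import path and argument name against your ZenML version.

```python
from datetime import datetime

# Cron fields, left to right: minute, hour, day of month, month, day of week.
# "5 14 * * 3" reads: minute 5, hour 14, any day, any month, Wednesday (0 = Sunday).
CRON_EXPRESSION = "5 14 * * 3"


def matches_simple_cron(expression: str, moment: datetime) -> bool:
    """Return True if `moment` falls on the schedule described by `expression`.

    Illustrative helper only: it understands "*" and plain integers,
    not ranges, lists, or step values.
    """
    fields = expression.split()
    if len(fields) != 5:
        raise ValueError("expected five cron fields")
    actual = (
        moment.minute,
        moment.hour,
        moment.day,
        moment.month,
        moment.isoweekday() % 7,  # convert to cron numbering, where Sunday is 0
    )
    return all(f == "*" or int(f) == a for f, a in zip(fields, actual))


# 1 March 2023 was a Wednesday.
print(matches_simple_cron(CRON_EXPRESSION, datetime(2023, 3, 1, 14, 5)))
print(matches_simple_cron(CRON_EXPRESSION, datetime(2023, 3, 2, 14, 5)))

# Assumed ZenML usage (import path and names may differ in your version):
# from zenml.pipelines import Schedule
# pipeline_instance.run(schedule=Schedule(cron_expression=CRON_EXPRESSION))
```

The first call lands on a Wednesday at 14:05 and matches; the second is a Thursday and does not.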
You can specify an optional `end_time` for the schedule to prevent it from running after a certain time. The `catchup` parameter, which is a boolean, specifies whether a recurring run should catch up (i.e. backfill pipeline runs) on missed runs if it has fallen behind schedule. This can happen, for example, if you paused the schedule.
In the context of scheduled cron or pipeline jobs, backfilling refers to running a missed job for a specific period in the past. For example, if a pipeline misses a scheduled run at 12:00 PM, backfilling can be used to run the pipeline for the 12:00 PM time slot in order to collect the missing data. This keeps the pipeline up to date and ensures that downstream jobs have the data they need to run correctly. Usually, if your pipeline handles backfill internally, you should turn `catchup` off to avoid duplicate backfill. Note that the `catchup` parameter, which enables backfilling, is not supported by all orchestrators.
Here’s a handy guide in the context of Airflow.
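As a rough illustration of what catching up means, the sketch below lists the scheduled slots that fall inside a pause window and returns them only when a `catchup`-style flag is on. It is a conceptual model under simple assumptions (a fixed interval between runs, and the hypothetical helper names `missed_slots` and `runs_to_backfill`), not a description of how ZenML or any particular orchestrator implements backfilling.

```python
from datetime import datetime, timedelta
from typing import List


def missed_slots(first_run: datetime, interval: timedelta,
                 paused_at: datetime, resumed_at: datetime) -> List[datetime]:
    """Scheduled times in [paused_at, resumed_at) for which no run happened."""
    if interval <= timedelta(0):
        raise ValueError("interval must be positive")
    slots = []
    slot = first_run
    while slot < resumed_at:
        if slot >= paused_at:
            slots.append(slot)
        slot += interval
    return slots


def runs_to_backfill(first_run: datetime, interval: timedelta,
                     paused_at: datetime, resumed_at: datetime,
                     catchup: bool) -> List[datetime]:
    """With catchup on, every missed slot is re-run; with it off, none are."""
    if not catchup:
        return []
    return missed_slots(first_run, interval, paused_at, resumed_at)


# A weekly schedule starting Wednesday 1 March 2023 at 14:05,
# paused on 5 March and resumed on 20 March.
first = datetime(2023, 3, 1, 14, 5)
weekly = timedelta(weeks=1)
paused = datetime(2023, 3, 5)
resumed = datetime(2023, 3, 20)

# Two slots were missed: 8 March and 15 March at 14:05.
print(runs_to_backfill(first, weekly, paused, resumed, catchup=True))
print(runs_to_backfill(first, weekly, paused, resumed, catchup=False))
```

If the pipeline already fetches any missing data itself, the second call reflects the safer setting: nothing is re-run, so nothing is backfilled twice.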
How to stop or pause a scheduled run
The way pipelines are scheduled depends on the orchestrator you are using. For example, if you are using Kubeflow, you can use the Kubeflow UI to stop or pause a scheduled run, and if you are using Airflow, you can use the Airflow UI to do the same. The exact steps vary between orchestrators, so we recommend consulting the documentation for your orchestrator to learn the current method for stopping or pausing a scheduled run.
Note that ZenML only gets involved to schedule a run; maintaining the lifecycle of the schedule (as explained above) is the responsibility of the user. If you run a pipeline containing a schedule two times, two scheduled pipelines (with different/unique names) will be created in whatever orchestrator you’re using, so it’s on you to stop or pause the schedule as appropriate.
Supported Orchestrators
Orchestrator | Scheduling Support |
---|---|
LocalOrchestrator | ⛔️ |
LocalDockerOrchestrator | ⛔️ |
KubernetesOrchestrator | ✅ |
KubeflowOrchestrator | ✅ |
VertexOrchestrator | ✅ |
TektonOrchestrator | ⛔️ |
AirflowOrchestrator | ✅ |
GitHubActionsOrchestrator | ✅ |