Fetching Pipelines
How to inspect a finished pipeline run
Once a pipeline run has completed, we can access it using the post-execution utilities.
Each pipeline can have multiple runs associated with it, and for each run there might be several outputs for each step. Thus, to inspect a specific output, we first need to access the respective pipeline, then fetch the respective run, and then choose the step output of that specific run.
The overall hierarchy looks like this: pipelines -> runs -> steps -> outputs.
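As a rough mental model of this hierarchy, the sketch below nests plain Python dataclasses in the same order. The classes ToyPipeline, ToyRun and ToyStep are hypothetical stand-ins invented for illustration; they are not ZenML classes and do not call any ZenML API.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class ToyStep:
    """Hypothetical stand-in for one executed step and its named outputs."""

    name: str
    outputs: Dict[str, Any] = field(default_factory=dict)


@dataclass
class ToyRun:
    """Hypothetical stand-in for one run of a pipeline."""

    name: str
    steps: List[ToyStep] = field(default_factory=list)


@dataclass
class ToyPipeline:
    """Hypothetical stand-in for a pipeline with at least one run."""

    name: str
    runs: List[ToyRun] = field(default_factory=list)


# Build a tiny hierarchy: one pipeline, one run, one step, one output.
step = ToyStep(name="trainer", outputs={"accuracy": 0.9})
run = ToyRun(name="run_1", steps=[step])
pipeline = ToyPipeline(name="example_pipeline", runs=[run])

# Traverse level by level: pipeline -> run -> step -> output.
accuracy = pipeline.runs[0].steps[0].outputs["accuracy"]
```

Each level only knows about the level directly below it, which is why inspecting an output always means walking through the pipeline and the run first.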
Let us investigate how to traverse this hierarchy level by level:
Pipelines
ZenML keeps a collection of all created pipelines with at least one run sorted by the time of their first run from oldest to newest.
You can either access this collection via the get_pipelines() method or query a specific pipeline by name using get_pipeline(pipeline=...).
Be careful when accessing pipelines by index. Even if you just ran a pipeline, it might not be at index -1, because the pipelines are sorted by time of first run. Instead, it is recommended to access the pipeline using the pipeline class, an instance of the class, or even the name of the pipeline as a string: get_pipeline(pipeline=...).
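The index pitfall can be reproduced with plain Python, without ZenML. In this minimal sketch the pipeline names and timestamps are made up, and the sorting simply mimics the "time of first run" ordering described above.

```python
from datetime import datetime

# Hypothetical records: (pipeline name, time of first run, time of latest run).
records = [
    ("pipeline_a", datetime(2023, 1, 1), datetime(2023, 3, 1)),
    ("pipeline_b", datetime(2023, 2, 1), datetime(2023, 2, 1)),
]

# Sort by time of first run, from oldest to newest.
by_first_run = sorted(records, key=lambda record: record[1])
pipeline_names = [name for name, _, _ in by_first_run]

# pipeline_a was run most recently, but it is not the last element.
most_recently_run = max(records, key=lambda record: record[2])[0]
last_by_index = pipeline_names[-1]

# A lookup by name does not depend on the ordering at all.
by_name = {name: (first, latest) for name, first, latest in records}
latest_run_of_a = by_name["pipeline_a"][1]
```

Here index -1 returns pipeline_b even though pipeline_a ran last, which is the situation the warning above describes.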
Runs
Getting runs from a fetched pipeline
Each pipeline can be executed many times. You can get a list of all runs using the runs attribute of a pipeline.
Getting runs from a pipeline instance
Alternatively, you can access the runs from the pipeline class or instance itself.
Directly getting a run
Finally, you can also access a run directly with the get_run(run_name=...) method.
Runs Configuration
Each run has a collection of useful metadata which you can access to ensure all runs are reproducible:
git_sha
The Git commit SHA that the pipeline run was performed on. This will only be set if the pipeline code is in a git repository and there are no uncommitted files when running the pipeline.
status
The status of a pipeline run can also be found here. There are four possible states: failed, completed, running, and cached.
pipeline_configuration
The pipeline_configuration is a super object that contains all configuration of the pipeline and pipeline run, including pipeline-level BaseSettings, which we will learn more about later. You can also access the settings directly via the settings variable.
docstring
The docstring of the step.
Steps
Within a given pipeline run you can now further zoom in on individual steps using the steps attribute or by querying a specific step using the get_step(step=...) method.
The step name refers to the name of the step attribute in the pipeline, not to the class name of the step that implements it for a pipeline instance.
The steps are ordered by time of execution. Depending on the orchestrator, steps can be run in parallel. Thus, accessing steps by index can be unreliable across different runs, and it is recommended to access steps by the step class, an instance of the class, or even the name of the step as a string: get_step(step=...) instead.
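To see why index access is fragile when steps run in parallel, consider this plain-Python sketch. The step names and the find_step helper are hypothetical and only illustrate the idea; they are not part of ZenML.

```python
# Hypothetical step orderings from two runs of the same pipeline. The two
# evaluator steps ran in parallel and started in a different order each time.
run_1_steps = ["importer", "evaluator_a", "evaluator_b"]
run_2_steps = ["importer", "evaluator_b", "evaluator_a"]

# The same index points at different steps in the two runs.
index_is_stable = run_1_steps[1] == run_2_steps[1]


def find_step(step_names, name):
    """Return the position of the named step, independent of ordering."""
    if name not in step_names:
        raise KeyError(f"No step named '{name}'")
    return step_names.index(name)


# Looking up by name finds the intended step in both runs.
pos_in_run_1 = find_step(run_1_steps, "evaluator_a")
pos_in_run_2 = find_step(run_2_steps, "evaluator_a")
```

The name-based lookup returns the intended step in both runs, while index 1 silently returns a different step in the second run.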
Similar to the run, for reproducibility, you can use the step object to access:
- BaseParameters via step.parameters: The parameters used to run the step.
- Step-level BaseSettings via step.step_configuration
- Input and output artifacts.
Outputs
Finally, this is how you can inspect the output of a step:
- If there is only a single output, use the output attribute.
- If there are multiple outputs, use the outputs attribute, which is a dictionary that can be indexed using the name of an output.
The names of the outputs can be found in the Output typing of your steps.
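The difference between a single-output shortcut and a name-indexed dictionary can be sketched as follows. ToyStep is a hypothetical stand-in written for this example, not a ZenML class, and raising an error when the shortcut is ambiguous is a design choice of the sketch.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class ToyStep:
    """Hypothetical stand-in for a step of a finished run (not a ZenML class)."""

    name: str
    outputs: Dict[str, Any] = field(default_factory=dict)

    @property
    def output(self) -> Any:
        """Return the only output; refuse to guess when there are several."""
        if len(self.outputs) != 1:
            raise ValueError(
                f"Step '{self.name}' has {len(self.outputs)} outputs; "
                "index `outputs` by name instead."
            )
        return next(iter(self.outputs.values()))


# A step with a single output: the `output` shortcut is unambiguous.
trainer = ToyStep(name="trainer", outputs={"model_accuracy": 0.93})
single_value = trainer.output

# A step with several outputs: pick one by its declared name.
importer = ToyStep(
    name="importer",
    outputs={"features": [1, 2, 3], "labels": [0, 1, 0]},
)
labels = importer.outputs["labels"]
```

Indexing by output name keeps the code readable and avoids depending on the order in which outputs were declared.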
Code Example
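Putting it all together, we can access the output of the last step of our example pipeline from the previous sections by fetching the pipeline, selecting one of its runs, selecting the last step of that run, and reading its output. Alternatively, the same traversal can start from the pipeline class or instance instead of a fetched pipeline.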
Final note: Fetching older pipeline runs within a step
While most of this document has focused on the so-called post-execution workflow (i.e. fetching objects after a pipeline has completed), the same utilities can also be used within the context of a running pipeline.
This is often desirable in cases where a pipeline is running continuously over time and decisions have to be made according to older runs.
For example, a step can fetch the last pipeline run of the same pipeline.
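The kind of decision this enables can be sketched in plain Python. The history list, its "accuracy" key, and both helper functions are hypothetical. In a real step, the earlier results would come from the post-execution utilities rather than a hard-coded list.

```python
from typing import List, Optional


def last_recorded_accuracy(previous_runs: List[dict]) -> Optional[float]:
    """Return the accuracy stored by the most recent earlier run, if any."""
    if not previous_runs:
        return None
    return previous_runs[-1]["accuracy"]


def should_promote(new_accuracy: float, previous_runs: List[dict]) -> bool:
    """Promote the new model only if it beats the last recorded run."""
    baseline = last_recorded_accuracy(previous_runs)
    return baseline is None or new_accuracy > baseline


# Hypothetical history, ordered from oldest to newest run.
history = [
    {"run": "run_1", "accuracy": 0.81},
    {"run": "run_2", "accuracy": 0.86},
]
promote = should_promote(0.84, history)
```

With no earlier runs the sketch promotes unconditionally, which is one reasonable default for the very first run of a pipeline.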
You can get a lot more metadata within a step as well, something we’ll learn in more detail in the advanced docs.