Configuring pipelines, steps, and stack components in ZenML
Settings in ZenML
As discussed in a previous chapter, there are two ways to configure anything in ZenML:

- `BaseParameters`: Runtime configuration passed down as a parameter to step functions.
- `BaseSettings`: Runtime settings passed down to stack components and pipelines.

This guide focuses on `BaseSettings`.
What can be configured?
Looked at one way, `BaseParameters` configure steps within a pipeline to behave differently at runtime. But what other things can be configured at runtime? Here is a list:

- The resources of a step.
- The containerization process of a pipeline (e.g., what requirements get installed in the Docker image).
- Stack-component-specific configuration, e.g., if you have an experiment tracker, passing in the name of the experiment at runtime.

All of this is configured through `BaseSettings` (from here on, we use `settings` and `BaseSettings` interchangeably in this guide).
Types of settings
Settings are categorized into two types:

- General settings that can be used on all ZenML pipelines. Examples of these are:
  - `DockerSettings` to specify Docker settings.
  - `ResourceSettings` to specify resource settings.
- Stack-component-specific settings: These can be used to supply runtime configurations to certain stack components (key = `<COMPONENT_CATEGORY>.<COMPONENT_FLAVOR>`). Settings for components not in the active stack will be ignored. Examples of these are:
  - `KubeflowOrchestratorSettings` to specify Kubeflow settings.
  - `MLflowExperimentTrackerSettings` to specify MLflow settings.
  - `WandbExperimentTrackerSettings` to specify W&B settings.
  - `WhylogsDataValidatorSettings` to specify Whylogs settings.
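As a rough sketch of what that looks like in code: the general settings classes can be imported from `zenml.config`, while stack-component-specific settings are addressed by their `<COMPONENT_CATEGORY>.<COMPONENT_FLAVOR>` key. The requirement, resource, and experiment values below are made up, and the exact import path may vary with your ZenML version:

```python
from zenml.config import DockerSettings, ResourceSettings

# General settings use the fixed keys "docker" and "resources";
# stack-component-specific settings use <COMPONENT_CATEGORY>.<COMPONENT_FLAVOR>.
settings = {
    "docker": DockerSettings(requirements=["scikit-learn"]),
    "resources": ResourceSettings(cpu_count=2, memory="4GB"),
    # Ignored unless an MLflow experiment tracker is part of the active stack:
    "experiment_tracker.mlflow": {"experiment_name": "my_experiment"},
}
```

A dict like this can then be attached to a step or pipeline using any of the methods described below.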
For stack-component-specific settings, you might be wondering what the difference is between these and the configuration passed in when registering a stack component, e.g.:

zenml stack-component register <NAME> --config1=configvalue --config2=configvalue

The answer is that the configuration passed in at registration time is static and fixed throughout all pipeline runs, while the settings can change.
A good example of this is the MLflow Experiment Tracker, where configuration that remains static, such as the `tracking_url`, is passed in at registration time, while runtime configuration, such as the `experiment_name` (which might change for every pipeline run), is passed in as runtime settings.
Even though settings can be overridden at runtime, you can also specify default values for settings while configuring a stack component. For example, you could set a default value for the `nested` setting of your MLflow experiment tracker:

zenml experiment-tracker register <NAME> --flavor=mlflow --nested=True

This means that all pipelines that run using this experiment tracker use nested MLflow runs unless overridden by specifying settings for the pipeline at runtime.
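Conversely, a single pipeline could switch nested runs off again by passing settings under the `experiment_tracker.mlflow` key at runtime. A minimal sketch, assuming the older decorator-based pipeline API that this guide uses elsewhere; the pipeline and step names are made up:

```python
from zenml.pipelines import pipeline
from zenml.steps import step


@step
def train() -> None:
    ...  # training code that logs to MLflow


# Override the tracker's registered default of nested=True for this pipeline only.
@pipeline(settings={"experiment_tracker.mlflow": {"nested": False}})
def training_pipeline(train):
    train()
```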
(Video tutorial: Stack Component Config vs Settings in ZenML)
Using objects or dicts
Settings can be passed in directly as `BaseSettings`-subclassed objects, or as a dict representation of the object. For example, a Docker configuration can be passed in either as a `DockerSettings` object or as its equivalent dict; both styles appear in the examples below.

How to use settings
Method 1: Directly on the decorator
The most basic way to set settings is through the `settings` variable that exists in both the `@step` and `@pipeline` decorators:
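A minimal sketch of this, showing both the object style and the dict style mentioned above (the requirements and resource values are made up; imports follow the older ZenML API used in this guide):

```python
from zenml.config import DockerSettings
from zenml.pipelines import pipeline
from zenml.steps import step


# Settings given as a plain dict representation...
@step(settings={"resources": {"cpu_count": 2, "memory": "4GB"}})
def my_step() -> None:
    ...


# ...or as BaseSettings-subclassed objects.
@pipeline(settings={"docker": DockerSettings(requirements=["scikit-learn"])})
def my_pipeline(my_step):
    my_step()
```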
Once you set settings on a pipeline, they will be applied to all steps, with some exceptions. See the later section on precedence for more details.
Method 2: On the step/pipeline instance
This is exactly the same as passing it through the decorator, but if you prefer, you can also pass it in the `configure` methods of the pipeline and step instances:
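A sketch of the same configuration applied through `configure`, assuming the `my_step` and `my_pipeline` definitions from the previous example (the exact instantiation style depends on your ZenML version):

```python
from zenml.config import DockerSettings

# Instantiate the step and pipeline first...
step_instance = my_step()
pipeline_instance = my_pipeline(my_step=step_instance)

# ...then attach settings via their `configure` methods.
step_instance.configure(settings={"resources": {"cpu_count": 2, "memory": "4GB"}})
pipeline_instance.configure(
    settings={"docker": DockerSettings(requirements=["scikit-learn"])}
)
```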
Method 3: Configuring with YAML
As all settings can be passed through as a dict, users have the option to send all configuration in via a YAML file. This is useful in situations where code changes are not desirable. To use a YAML file, you must pass it to the `run` method of a pipeline instance:
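For example (assuming the `pipeline_instance` from above and that the argument is called `config_path` in your ZenML version):

```python
# Trigger a run with all configuration read from a YAML file.
pipeline_instance.run(config_path="path_to_yaml_config.yaml")
```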
Step-specific configuration is nested under each step's name inside the `steps` key. Here is a rough skeleton of a valid YAML config. All keys are optional.
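A sketch of such a skeleton, under the assumption that the top-level keys mirror the pipeline-level configuration and that each step entry accepts its own `parameters` and `settings`; the step name and all values are invented:

```yaml
enable_cache: True
extra:
  tag: production
settings:                       # pipeline-level settings
  docker:
    requirements:
      - scikit-learn
  resources:
    cpu_count: 2
steps:
  my_step:                      # keyed by step name
    parameters:                 # values for the step's BaseParameters
      learning_rate: 0.001
    settings:                   # step-level settings
      resources:
        memory: 4GB
      experiment_tracker.mlflow:
        experiment_name: my_experiment
```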
ZenML can also generate a template for you: it will write a file at `/local/path/to/config.yaml` with a commented-out YAML file with all possible options that the pipeline instance can take.
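A sketch of generating that template, assuming the pipeline instance exposes a `write_run_configuration_template` method with a `path` argument (check the exact method name against your ZenML version):

```python
# Write a commented-out template containing every available option.
pipeline_instance.write_run_configuration_template(
    path="/local/path/to/config.yaml"
)
```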
A YAML config file generated this way contains every available option; configuration that is not needed can simply stay commented out.
The `extra` dict
You might have noticed another dict that is available to pass through to steps and pipelines, called `extra`. This dict is meant to be used to pass any configuration down to the pipeline, step, or stack components that the user has a use for. For example, if I want to tag a pipeline, I can do the following:
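A minimal sketch; the `tag` key and its value are arbitrary user-defined entries:

```python
from zenml.pipelines import pipeline


@pipeline(extra={"tag": "production"})
def my_tagged_pipeline(my_step):
    my_step()
```

Like the other configuration discussed above, the same `extra` dict can also be supplied through the `configure` methods or the YAML config file.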