How to keep your data quality in check and guard against data and model drift with Evidently profiling
Evidently works with tabular data in `pandas.DataFrame` or CSV file formats and can handle both regression and classification tasks.
You should use the Evidently Data Validator when you need any of the following data and/or model validation features supported by Evidently:
Evidently's profiling functions take in a `pandas.DataFrame` dataset or a pair of datasets. They generate results either as a `Profile` object containing all the relevant information or as a `Dashboard` visualization.
One of Evidently's notable characteristics is that it only requires datasets as input. Even when running model performance comparison analyses, no model needs to be present. However, this means that for some profiling reports the input data must include additional `target` and `prediction` columns, and you have to supply additional information about the dataset columns in the form of column mappings. Depending on how your data is structured, you may also need to add steps to your pipeline, before the data validation step, that insert the `target` and `prediction` columns into your data. This may also require interacting with one or more models.
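The following is a minimal sketch of such a preparatory step. It assumes that the data arrives as a list of row dictionaries and that the model is any callable mapping a feature row to a prediction. The helper `add_target_and_prediction` and the toy threshold model are hypothetical and are not part of ZenML or Evidently.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


def add_target_and_prediction(
    rows: List[Row],
    labels: List[Any],
    model: Callable[[Row], Any],
    target_col: str = "target",
    prediction_col: str = "prediction",
) -> List[Row]:
    """Return copies of `rows` with target and prediction columns added.

    Hypothetical helper illustrating a pre-validation step; it is not a
    ZenML or Evidently API.
    """
    if len(rows) != len(labels):
        raise ValueError("rows and labels must have the same length")
    enriched = []
    for row, label in zip(rows, labels):
        new_row = dict(row)  # copy so the caller's rows are not mutated
        # Predict from the original feature columns only
        new_row[prediction_col] = model(row)
        new_row[target_col] = label
        enriched.append(new_row)
    return enriched


def threshold_model(row: Row) -> int:
    """Toy model: predicts 1 when feature x exceeds 0.5."""
    return int(row["x"] > 0.5)


data = [{"x": 0.2}, {"x": 0.9}, {"x": 0.7}]
labels = [0, 1, 0]
prepared = add_target_and_prediction(data, labels, threshold_model)
print(prepared)
```

In a real pipeline, the model call would be whatever inference step you already have. The point is only that both columns exist before the validation step runs.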
There are three ways to use Evidently in your ZenML pipelines, each offering a different level of flexibility:
ZenML provides a standard `EvidentlyProfileStep` step. You select which reports to generate in your step by passing a list of string identifiers into the `EvidentlyProfileParameters`:
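Selecting reports by string identifier can be illustrated with a small stand-alone sketch that rejects unknown identifiers early. The identifier strings below are assumptions modelled on Evidently's legacy profile sections, and `ProfileParameters` is a hypothetical stand-in. Check the `EvidentlyProfileParameters` reference for the identifiers your ZenML version actually accepts.

```python
from dataclasses import dataclass
from typing import List

# Assumed identifiers, modelled on Evidently's legacy profile sections.
KNOWN_SECTIONS = {
    "datadrift",
    "categoricaltargetdrift",
    "numericaltargetdrift",
    "classificationmodelperformance",
    "regressionmodelperformance",
    "probabilisticmodelperformance",
}


@dataclass
class ProfileParameters:
    """Hypothetical stand-in for a step-parameters object."""

    profile_sections: List[str]

    def __post_init__(self) -> None:
        # Fail fast on typos instead of inside the pipeline run
        if not self.profile_sections:
            raise ValueError("select at least one profile section")
        unknown = [s for s in self.profile_sections if s not in KNOWN_SECTIONS]
        if unknown:
            raise ValueError(f"unknown profile sections: {unknown}")


params = ProfileParameters(profile_sections=["datadrift", "numericaltargetdrift"])
print(params.profile_sections)
```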
The step returns an Evidently `Profile` object and a `Dashboard` rendered as an HTML string:
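The toy stand-in below makes the shape of those two outputs concrete. It compares a reference dataset and a comparison dataset column by column, then returns a JSON-serializable profile together with an HTML rendering.

The following are all assumptions for illustration:

- the relative mean-shift check,
- its 10% threshold,
- the name `toy_drift_profile`.

Evidently itself uses proper statistical tests for drift detection.

```python
import html
import json
from statistics import mean
from typing import Dict, List, Tuple

Dataset = Dict[str, List[float]]  # column name -> numeric values


def toy_drift_profile(
    reference: Dataset, comparison: Dataset, threshold: float = 0.1
) -> Tuple[dict, str]:
    """Return (profile, html_report) for the columns shared by both datasets."""
    columns = {}
    for name in sorted(reference.keys() & comparison.keys()):
        ref_mean = mean(reference[name])
        cmp_mean = mean(comparison[name])
        # Relative shift of the column mean; avoid dividing by zero
        shift = abs(cmp_mean - ref_mean) / (abs(ref_mean) or 1.0)
        columns[name] = {"ref_mean": ref_mean, "cmp_mean": cmp_mean, "drift": shift > threshold}
    profile = {"n_drifted": sum(c["drift"] for c in columns.values()), "columns": columns}
    rows = "".join(
        f"<tr><td>{html.escape(n)}</td><td>{c['drift']}</td></tr>" for n, c in columns.items()
    )
    report = f"<table><tr><th>column</th><th>drift</th></tr>{rows}</table>"
    return profile, report


profile, report = toy_drift_profile(
    {"age": [30, 40, 50], "income": [100, 110, 120]},
    {"age": [31, 41, 49], "income": [150, 160, 170]},
)
print(json.dumps(profile))
```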
The step configuration also accepts column mappings as `zenml.integrations.evidently.steps.EvidentlyColumnMapping` objects, which have exactly the same structure as `evidently.pipeline.column_mapping.ColumnMapping`:
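A column mapping records which dataset columns play which role. The dataclass below is a hypothetical, dependency-free mirror of a subset of those fields: `target`, `prediction`, `datetime`, `numerical_features` and `categorical_features`. It also adds a check that a dataset actually contains the mapped columns. The defaults are chosen for this sketch, so consult the Evidently API reference for the authoritative fields and defaults.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class ColumnMappingSketch:
    """Hypothetical mirror of a subset of Evidently's ColumnMapping fields."""

    target: Optional[str] = "target"
    prediction: Optional[str] = "prediction"
    datetime: Optional[str] = None  # default chosen for this sketch
    numerical_features: Optional[List[str]] = None
    categorical_features: Optional[List[str]] = None

    def missing_columns(self, columns: Iterable[str]) -> List[str]:
        """Return mapped column names that are absent from `columns`."""
        available = set(columns)
        wanted = [self.target, self.prediction, self.datetime]
        wanted += self.numerical_features or []
        wanted += self.categorical_features or []
        # None means "role not used", so it is never reported as missing
        return [c for c in wanted if c is not None and c not in available]


mapping = ColumnMappingSketch(
    target="churned",
    prediction="churn_pred",
    numerical_features=["tenure", "monthly_charges"],
    categorical_features=["contract"],
)
print(mapping.missing_columns(["churned", "tenure", "monthly_charges", "contract"]))
```

Running such a check before the validation step surfaces a missing `prediction` column as a clear error, rather than as an incomplete report.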
The `EvidentlyProfileParameters` step configuration also allows additional profile options and dashboard options to be passed to the `Profile` and `Dashboard` constructors, e.g.:
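The forwarding pattern can be sketched without Evidently installed. `Profile`, `Dashboard`, `StepParams` and `DriftOptionsStub` below are hypothetical stubs. The field names `profile_options` and `dashboard_options` are also assumptions. The sketch only shows that the step hands each configured option list, unchanged, to the matching constructor.

```python
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class Profile:  # hypothetical stub standing in for Evidently's Profile
    sections: List[str]
    options: List[Any] = field(default_factory=list)


@dataclass
class Dashboard:  # hypothetical stub standing in for Evidently's Dashboard
    tabs: List[str]
    options: List[Any] = field(default_factory=list)


@dataclass
class StepParams:  # hypothetical step configuration
    profile_sections: List[str]
    profile_options: List[Any] = field(default_factory=list)
    dashboard_options: List[Any] = field(default_factory=list)


@dataclass
class DriftOptionsStub:  # hypothetical option object
    confidence: float = 0.95


def build_outputs(params: StepParams):
    # Each option list is forwarded verbatim to its own constructor
    profile = Profile(params.profile_sections, options=params.profile_options)
    dashboard = Dashboard(params.profile_sections, options=params.dashboard_options)
    return profile, dashboard


params = StepParams(
    profile_sections=["datadrift"],
    profile_options=[DriftOptionsStub(confidence=0.99)],
    dashboard_options=[DriftOptionsStub(confidence=0.99)],
)
profile, dashboard = build_outputs(params)
print(profile.options, dashboard.options)
```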
`Profile` objects in its Artifact Store, e.g.:
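Conceptually, persisting a profile means serializing it to a location inside the artifact store and reading it back later. The sketch below assumes that the profile can be represented as a JSON-serializable dict, and it uses a temporary local directory in place of a real artifact store. `save_profile` and `load_profile` are hypothetical helpers, not ZenML's materializer API.

```python
import json
import tempfile
from pathlib import Path


def save_profile(profile: dict, artifact_dir: Path) -> Path:
    """Write `profile` as JSON under `artifact_dir` and return the file path."""
    artifact_dir.mkdir(parents=True, exist_ok=True)
    path = artifact_dir / "profile.json"
    path.write_text(json.dumps(profile, sort_keys=True))
    return path


def load_profile(artifact_dir: Path) -> dict:
    """Read back a profile previously written by `save_profile`."""
    return json.loads((artifact_dir / "profile.json").read_text())


with tempfile.TemporaryDirectory() as tmp:
    uri = Path(tmp) / "evidently_profile" / "1"  # stand-in for an artifact URI
    original = {"drift": {"n_drifted": 1, "columns": ["income"]}}
    save_profile(original, uri)
    restored = load_profile(uri)
    print(restored == original)
```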