Detect Train-Test Skew
We can use Evidently, one of our data validator stack components, to check for skew between our training and test datasets. To do so, we define a new pipeline with an Evidently step and pass our training and test datasets into it.

At its core, Evidently’s distribution difference calculation functions take in a reference dataset and compare it with a separate comparison dataset. Both are passed in as pandas DataFrames, though CSV inputs are also possible. ZenML implements this functionality as several standardized steps, along with an easy way to use the visualization tools that Evidently provides as ‘Dashboards’. For data distribution comparison, we can simply use the predefined step of ZenML’s Evidently integration.

For a more detailed guide to how Data Validators function in ZenML, see the ZenML documentation.
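To make the reference-versus-comparison idea concrete, here is a minimal conceptual sketch in plain Python. It does not use the Evidently or ZenML APIs. The helper names `ks_statistic` and `detect_skew`, the toy columns, and the `0.5` threshold are all hypothetical choices for illustration, and the two-sample Kolmogorov-Smirnov statistic is only one possible way to measure a distribution difference, not necessarily the test or threshold Evidently itself applies.

```python
from bisect import bisect_right


def ks_statistic(reference, comparison):
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap between
    the empirical CDFs of the two samples (0.0 = identical, 1.0 = disjoint)."""
    ref_sorted = sorted(reference)
    cmp_sorted = sorted(comparison)
    max_gap = 0.0
    # The empirical CDFs only change at observed values, so checking those suffices.
    for value in set(ref_sorted) | set(cmp_sorted):
        ref_cdf = bisect_right(ref_sorted, value) / len(ref_sorted)
        cmp_cdf = bisect_right(cmp_sorted, value) / len(cmp_sorted)
        max_gap = max(max_gap, abs(ref_cdf - cmp_cdf))
    return max_gap


def detect_skew(reference_columns, comparison_columns, threshold=0.5):
    """Return {column: (statistic, flagged)} for columns present in both datasets.

    The threshold is an arbitrary illustrative cut-off, not a recommended value.
    """
    report = {}
    for name in reference_columns.keys() & comparison_columns.keys():
        stat = ks_statistic(reference_columns[name], comparison_columns[name])
        report[name] = (stat, stat > threshold)
    return report


# Toy stand-ins for the training (reference) and test (comparison) datasets,
# stored as column-name -> values rather than as pandas DataFrames.
train = {"age": [22, 25, 31, 38, 45, 52], "income": [30, 35, 41, 48, 55, 60]}
test = {"age": [23, 27, 30, 40, 44, 50], "income": [70, 75, 81, 88, 95, 99]}

report = detect_skew(train, test)
for column, (stat, flagged) in sorted(report.items()):
    print(f"{column}: KS statistic={stat:.2f}, skew flagged={flagged}")
```

In this toy data the `age` columns overlap closely while the `income` columns do not overlap at all, so only `income` is flagged. A production check would normally rely on a proper statistical test with a p-value rather than a fixed cut-off on the raw statistic.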