21 November 2022

Validation strategy and its related problems

by Hasan

Validation and overfitting

Concept of validation

Validation means holding out part of the training data and using it only to estimate how the model performs on unseen data, so that we can compare models and hyper-parameters before touching the test set.

Overfitting and underfitting

An overfitted model memorizes the training data: the training error keeps decreasing while the validation error starts to grow. An underfitted model is too simple and has high error on both training and validation data. The validation score tells us where we are between the two.

[figure: overfitting]

Validation Strategies

Holdout validation

Split the data once into a training part and a validation part, train on the training part and evaluate on the validation part.

flowchart TB
    A[data] --> B[Validation data]
    A[data] --> C[Training data]
from sklearn.model_selection import ShuffleSplit
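A minimal sketch of a single holdout split with ShuffleSplit; the toy data, the 80/20 ratio and the logistic-regression model are just assumptions for illustration:

```python
import numpy as np
from sklearn.model_selection import ShuffleSplit
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# toy data, for illustration only
X = np.random.rand(100, 5)
y = np.random.randint(0, 2, size=100)

# n_splits=1 gives a single holdout split: 80% train, 20% validation
splitter = ShuffleSplit(n_splits=1, test_size=0.2, random_state=42)
train_idx, valid_idx = next(splitter.split(X, y))

model = LogisticRegression()
model.fit(X[train_idx], y[train_idx])
print("holdout accuracy:", accuracy_score(y[valid_idx], model.predict(X[valid_idx])))
```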

K-fold validation

Split the data into K folds. Each fold in turn is used as the validation set while the remaining folds are used for training; the K validation scores are then averaged.

flowchart TB
    A[data] --> B[Fold 1]
    B[Fold 1] --> C[Training data]
    B[Fold 1] --> D[Validation data]

    A[data] --> E[Fold 2]
    E[Fold 2] --> F[Training data]
    E[Fold 2] --> G[Validation data]

    A[data] --> H[Fold 3]
    H[Fold 3] --> I[Training data]
    H[Fold 3] --> J[Validation data]

    D[Validation data] --> K[Average]
    G[Validation data] --> K[Average]
    J[Validation data] --> K[Average]

    K[Average] --> L[Final model performance]
sklearn.model_selection.KFold
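A minimal sketch of 3-fold cross-validation with KFold; the toy data, the number of folds and the accuracy metric are illustrative choices:

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

X = np.random.rand(120, 5)            # toy data, for illustration only
y = np.random.randint(0, 2, size=120)

kf = KFold(n_splits=3, shuffle=True, random_state=42)
fold_scores = []
for train_idx, valid_idx in kf.split(X):
    model = LogisticRegression()
    model.fit(X[train_idx], y[train_idx])
    fold_scores.append(accuracy_score(y[valid_idx], model.predict(X[valid_idx])))

# the averaged fold scores are the estimate of model performance
print("fold scores:", fold_scores)
print("mean:", np.mean(fold_scores), "std:", np.std(fold_scores))
```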

Leave-one-out validation

Iterate over the samples: retrain the model on all samples except the current one and predict for the current sample. You will need to retrain the model N times (where N is the number of samples in the dataset).

In the end you will have a LOO prediction for every sample in the training set and can calculate the loss over all of them.
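A minimal sketch with scikit-learn's LeaveOneOut; the small toy dataset is an assumption, and LOO is usually only practical for small N because of the N retrainings:

```python
import numpy as np
from sklearn.model_selection import LeaveOneOut
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

X = np.random.rand(30, 4)             # small toy dataset, for illustration
y = np.random.randint(0, 2, size=30)

loo = LeaveOneOut()
preds = np.empty_like(y)
for train_idx, valid_idx in loo.split(X):   # N iterations, one sample held out each time
    model = LogisticRegression()
    model.fit(X[train_idx], y[train_idx])
    preds[valid_idx] = model.predict(X[valid_idx])

# one out-of-fold prediction per sample -> a single loss over the whole training set
print("LOO accuracy:", accuracy_score(y, preds))
```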

Stratification

Stratification makes sure that every fold keeps roughly the same target distribution (for example the same fraction of each class) as the full dataset. It is especially useful for small datasets, imbalanced classes and multi-class problems.
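A minimal sketch with StratifiedKFold on an imbalanced toy dataset; the 90/10 class ratio is just an assumption to show that each fold preserves it:

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# imbalanced toy labels: 90% class 0, 10% class 1 (illustrative)
y = np.array([0] * 90 + [1] * 10)
X = np.random.rand(100, 3)

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
for train_idx, valid_idx in skf.split(X, y):
    # every validation fold keeps roughly the 90/10 class ratio
    print("fold class balance:", np.bincount(y[valid_idx]))
```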


Note that these validation schemes are meant to estimate the quality of the model. Once you have found the right hyper-parameters and want to produce test predictions, don't forget to retrain your model on all of the training data.


Most frequent ways to generate train-test split

  • Random, row-wise split
  • Time-based split (train on the past, validate on the future)
  • Split by id (e.g. by user, so the same user never appears in both train and validation)
  • A combination of the above

The important point is to set up the validation split in the same way the test (or production) data is separated from the training data. A sketch of the time-based and id-based splits follows below.
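A minimal sketch of a time-based split with TimeSeriesSplit and an id-based split with GroupKFold; the date ordering, the user_id grouping column and the toy data are assumptions for illustration:

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit, GroupKFold

X = np.random.rand(100, 4)                     # toy features; assume rows are ordered by time
y = np.random.rand(100)
user_id = np.random.randint(0, 20, size=100)   # illustrative grouping column

# time-based: each validation fold lies strictly after its training fold
for train_idx, valid_idx in TimeSeriesSplit(n_splits=3).split(X):
    print("train up to row", train_idx.max(), "-> validate rows", valid_idx.min(), "to", valid_idx.max())

# id-based: the same user never appears in both train and validation
for train_idx, valid_idx in GroupKFold(n_splits=3).split(X, y, groups=user_id):
    assert set(user_id[train_idx]).isdisjoint(user_id[valid_idx])
```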

Problems occurring during validation

a. Problem at the validation stage.
b. Problem on the leader-board or in production.

a. Problem at the validation stage

The typical symptom is getting noticeably different scores and different optimal parameters on different folds or splits.

Solution

Validate more extensively: average the scores over K-fold splits with several different random seeds, and trust the mean and the spread rather than a single split.
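As a concrete illustration of "validate more extensively", here is a minimal sketch that averages scores over repeated K-fold splits with different seeds; the toy data, the Ridge model and the parameters are assumptions:

```python
import numpy as np
from sklearn.model_selection import RepeatedKFold, cross_val_score
from sklearn.linear_model import Ridge

X = np.random.rand(200, 5)     # toy data, for illustration only
y = np.random.rand(200)

# 5-fold CV repeated 3 times with different shuffles -> 15 scores instead of 5
rkf = RepeatedKFold(n_splits=5, n_repeats=3, random_state=42)
scores = cross_val_score(Ridge(), X, y, cv=rkf, scoring="neg_mean_squared_error")

# the mean is a more stable estimate; the std tells us how much to trust a single split
print("mean score:", scores.mean(), "std:", scores.std())
```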

b. Problem on the leader-board or in production

  1. During serving, the score is consistently higher or lower than the validation score.
  2. During serving, the score is not correlated with the validation score at all.
  1. As we already know, the score differs from fold to fold, so we can treat the leader-board/production data as just another validation split. If we already get different scores on different folds, a somewhat different leader-board/production score is not surprising. Calculate the mean and standard deviation of the validation scores and check whether the leader-board/production score falls inside the expected range (see the sketch at the end of this section).

  2. If this is not the case, then something is genuinely wrong:

    a. Too little data. This is okay: trust your validation and it will be fine.
    b. Train and test are from different distributions (here "train and test" mean the training data and the leader-board/production data).

    Train and test distribution difference

    • If, for example, the training data contains only men and the test data only women, the model will have problems in production. We need to make sure that train and test data come from the same distribution.

    • Solution:

      • We need a way to tackle the different distributions in train and test. Sometimes it is possible to adjust for this during training; sometimes it is only possible to adjust the predictions for the leader-board/production data.
      • In this particular case, we can figure out a constant from the train and test data:
        • Mean of the target in the train data
        • Mean of the target in the test data
      • Then shift the predictions by this difference (see the sketch at the end of this section).
      • This kind of case is fairly rare; the most frequent type of problem is the following.

      [figure: leaderboard problem]

      • Then, as we said, try to create a validation set that reflects the test data, i.e. one with a similar distribution. With such a validation set the problem becomes visible before deployment.

      [figure: minimizing the train/test distribution difference]
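A minimal sketch of the two checks described above: comparing the leader-board score against the mean and standard deviation of the fold scores, and shifting predictions when the train and test target means differ. All numbers (fold scores, leader-board score, target means) are hypothetical, made up for illustration:

```python
import numpy as np

# 1) Is the leader-board/production score inside the range the validation predicts?
fold_scores = np.array([0.712, 0.705, 0.719, 0.708, 0.701])   # hypothetical K-fold scores
lb_score = 0.690                                               # hypothetical leader-board score

mean, std = fold_scores.mean(), fold_scores.std()
if abs(lb_score - mean) <= 2 * std:
    print("leader-board score is within the expected range of the validation scores")
else:
    print("leader-board score is outside the expected range: check the split / distributions")

# 2) Constant shift when train and test target means differ (the rare case described above)
train_target_mean = 0.48        # mean of the target on the training data (assumed known)
test_target_mean = 0.55         # mean of the target on the test data (assumed known or estimated)

raw_predictions = np.random.rand(10) * 0.2 + 0.4               # toy model predictions
shifted_predictions = raw_predictions + (test_target_mean - train_target_mean)
```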


Useful links

• How (and why) to create a good validation set
• sklearn documentation
• Advice on Kaggle validation
