足ることを知らず

Data Science, global business, management and MBA

Day 67 in MIT Sloan Fellows Class 2023, Advanced Data Analytics and Machine Learning in Finance 5, Deep Learning - tuning tips

1. Become one with the data

  • The more time you spend looking at the data the better you'll do.
  • Find corrupt data and duplicates (a quick scan is sketched after this list)
    • Real example: lots of training examples that say "please enable javascript to view this page"
    • If 1% of your training examples share some error mode, expect to inspect on the order of 100 random examples before you see one
  • Label data yourself to get a sense for how difficult the task is and where you fail
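
A minimal sketch of that scan for text data, assuming examples are plain strings; the hashing scheme and the boilerplate phrase to grep for are just examples:

```python
import hashlib
from collections import Counter

def fingerprint(text):
    # Hash normalized text so exact duplicates collide on the same key.
    return hashlib.md5(text.strip().lower().encode("utf-8")).hexdigest()

def scan(examples):
    counts = Counter(fingerprint(t) for t in examples)
    dupes = sum(c - 1 for c in counts.values() if c > 1)
    print(f"{dupes} duplicates out of {len(examples)} examples")
    # Scraper boilerplate (like the javascript page above) is easy to grep for.
    junk = [t for t in examples if "enable javascript" in t.lower()]
    print(f"{len(junk)} examples look like scraper boilerplate")

scan(["hello world", "Hello world ", "please enable JavaScript to view this page"])
```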

2. Set up an end-to-end training / evaluation pipeline and set benchmarks

  • We need code we are confident will accurately evaluate a model
  • Make dumb and less dumb baselines
    • constant guess
    • linear model
    • all zeros
  • Step (1) will have given us better intuition for what our benchmarks should be
  • This step reduces the surface area of neural network modeling efforts and reduces errors
  • Tips and tricks:
    • Fix a random seed (helps with reproducibility)
    • Don't try anything fancy (e.g. bagging a bunch of classifiers)
    • Add significant digits to evaluation code: Does a loss of .3002481 mean anything?
    • Verify the loss at initialization (e.g. a softmax over n classes should start near -log(1/n) = log n)
    • Monitor human-interpretable metrics and compare them to human performance
    • Overfit on a small amount of data: the network should be able to memorize it (see the sketch after this list)
    • Visualize your data right before it goes into the network (i.e. straight from the training data generator)
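
A minimal sketch of the last few tips (fixed seed, loss at init, single-batch overfit), assuming a 10-class softmax classifier; the tiny linear model and the fake batch are stand-ins for your real setup:

```python
import math
import torch
import torch.nn as nn

torch.manual_seed(0)  # fix a random seed for reproducibility

n_classes = 10  # assumption: a 10-class classification task
# A linear model that doubles as the "less dumb" baseline above.
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, n_classes))
loss_fn = nn.CrossEntropyLoss()

# One small fake batch; swap in a real batch from your data loader.
x = torch.randn(32, 1, 28, 28)
y = torch.randint(0, n_classes, (32,))

# Verify the initial loss: with random weights, a softmax classifier
# should start near -log(1/n_classes).
print(f"init loss {loss_fn(model(x), y).item():.4f}, "
      f"expected ~{math.log(n_classes):.4f}")

# Overfit the single batch: the loss should collapse toward zero.
# If the network cannot memorize 32 examples, something upstream is broken.
opt = torch.optim.Adam(model.parameters(), lr=3e-4)
for step in range(500):
    opt.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    opt.step()
print(f"loss after memorizing one batch: {loss.item():.6f}")
```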

3. Overfit

  • Choose a model that is large enough to overfit the training data
    • training loss falls well below validation loss
  • If you can't overfit something is wrong!
  • Tips and tricks
    • Picking a model: Don't be a hero, steal from other people.
    • The Adam optimizer is a safe choice (see the sketch after this list)
    • Add ONE piece of complexity at a time. Don't simultaneously make every layer 2x larger and double the number of layers
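
A sketch of the "don't be a hero" starting point; picking torchvision's ResNet-18 and a 10-class head is my assumption here, and the lr = 3e-4 default comes from Karpathy's recipe post:

```python
import torch
import torchvision

# Steal a standard, battle-tested architecture with pretrained weights
# instead of inventing your own (ResNet-18 is just one example).
model = torchvision.models.resnet18(weights="IMAGENET1K_V1")
model.fc = torch.nn.Linear(model.fc.in_features, 10)  # assumption: 10 classes

# Adam with lr = 3e-4 is a widely used safe default to start from.
optimizer = torch.optim.Adam(model.parameters(), lr=3e-4)
```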

4. Regularize

  • Once we can overfit, we can trade some training loss for better validation loss.
  • Tips and tricks
    • Get more data. If possible, this is the easiest way to regularize
    • Augment your data: if you can't get more data, make up some fake data
    • Reduce the input dimension (e.g. vocab size)
    • Decrease model size
    • Weight decay
    • Dropout
    • Early stopping (the sketch after this list wires weight decay, dropout, and early stopping together)
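
A minimal sketch combining three of these: dropout inside the model, decoupled weight decay via AdamW (one common way to apply weight decay), and patience-based early stopping. The layer sizes, random data, and patience value are placeholders:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Placeholder data; swap in your real train / validation splits.
x_tr, y_tr = torch.randn(256, 100), torch.randint(0, 10, (256,))
x_va, y_va = torch.randn(64, 100), torch.randint(0, 10, (64,))

model = nn.Sequential(
    nn.Linear(100, 64), nn.ReLU(),
    nn.Dropout(p=0.5),  # dropout regularization
    nn.Linear(64, 10),
)
loss_fn = nn.CrossEntropyLoss()
opt = torch.optim.AdamW(model.parameters(), lr=3e-4, weight_decay=0.01)

# Early stopping: quit after `patience` epochs without val improvement.
best_val, patience, bad = float("inf"), 5, 0
for epoch in range(100):
    model.train()
    opt.zero_grad()
    loss_fn(model(x_tr), y_tr).backward()
    opt.step()

    model.eval()
    with torch.no_grad():
        val = loss_fn(model(x_va), y_va).item()
    if val < best_val:
        best_val, bad = val, 0
        torch.save(model.state_dict(), "best.pt")  # keep the best checkpoint
    else:
        bad += 1
        if bad >= patience:
            break
```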

5. Tune

  • Tune hyperparameters to optimize the network
  • Random search beats grid search: networks are usually far more sensitive to some hyperparameters than others, so a fixed grid wastes trials (see the sketch after this list)
  • There are packages (e.g. Bayesian hyperparameter optimization toolkits) to automate the search
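
A minimal random-search sketch; the log-uniform ranges and the stand-in objective are assumptions to keep it self-contained (replace `train_and_eval` with a real training run that returns validation loss):

```python
import random

random.seed(0)

def sample_config():
    # Sample scale-like hyperparameters log-uniformly: not every grid
    # point is equally informative, which is why random search wins.
    return {
        "lr": 10 ** random.uniform(-5, -2),
        "weight_decay": 10 ** random.uniform(-6, -2),
        "dropout": random.uniform(0.0, 0.6),
    }

def train_and_eval(cfg):
    # Stand-in objective so the sketch runs end to end; in practice this
    # trains a model with cfg and returns its validation loss.
    return abs(cfg["lr"] - 3e-4) + 0.01 * cfg["dropout"]

trials = [sample_config() for _ in range(20)]
best = min(trials, key=train_and_eval)
print("best config:", best)
```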

6. Squeeze the juice

  • Ensemble several models (a small averaging sketch follows this list)
  • Let it train for a long time: networks often keep improving well after the validation loss appears to plateau
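
A small sketch of one common ensembling scheme, averaging softmax probabilities across members; the linear models and fake batch are placeholders for independently trained networks:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-ins for several copies of the same architecture; in practice each
# member is trained separately (e.g. from a different random seed).
members = [nn.Linear(100, 10) for _ in range(3)]

x = torch.randn(8, 100)  # placeholder batch

# Average the members' softmax probabilities, then take the argmax.
with torch.no_grad():
    probs = torch.stack([m(x).softmax(dim=-1) for m in members]).mean(dim=0)
print(probs.argmax(dim=-1))
```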


Reference

Andrej Karpathy, "A Recipe for Training Neural Networks", karpathy.github.io