1. Become one with the data
- The more time you spend looking at the data, the better you'll do.
- Find corrupt data and duplicates
- Real example: lots of training examples that say "please enable javascript to view this page"
- If 1% of your training data has some error mode, you'll need to look at on the order of 100 random examples to find it
- Label data yourself to get a sense for how difficult the task is and where you fail
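A hedged sketch of the kind of scan this implies, using only the standard library. `examples` and the boilerplate markers are hypothetical stand-ins for your own corpus, not part of any real pipeline:

```python
import hashlib
from collections import Counter

def find_duplicates(examples):
    """Count exact-duplicate examples by hashing their text."""
    counts = Counter(
        hashlib.sha256(text.encode("utf-8")).hexdigest() for text in examples
    )
    return {h: n for h, n in counts.items() if n > 1}

# Placeholder markers; in practice you discover these by reading the data.
BOILERPLATE = ["please enable javascript", "404 not found"]

def looks_corrupt(text):
    """Flag examples that are just scraped-page boilerplate."""
    lowered = text.lower()
    return any(marker in lowered for marker in BOILERPLATE)
```

Exact hashing only catches verbatim duplicates; near-duplicates need fuzzier methods, but this cheap pass is a reasonable first look.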
2. Set up an end-to-end training/evaluation pipeline and set benchmarks
- We need code we are confident will accurately evaluate a model
- Make dumb and less dumb baselines
- constant guess
- linear model
- all zeros
- Step (1) will have given us better intuition for what our benchmarks should be
- This step reduces the surface area of neural network modeling efforts and reduces errors
- Tips and tricks:
- Fix a random seed (helps with reproducibility)
- Don't try anything fancy (e.g. bagging a bunch of classifiers)
- Add significant digits to evaluation code: Does a loss of .3002481 mean anything?
- Verify the initial loss
- Monitor human-interpretable metrics and compare them to human performance
- Overfit on small amount of data: the network should be able to memorize it
- Visualize your data right before it goes into the network (i.e. straight from the training data generator)
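One concrete version of the "verify the initial loss" tip: for a C-class softmax classifier on roughly balanced data, the model should output near-uniform probabilities at initialization, so the first-batch cross-entropy should land near -log(1/C). A minimal sketch:

```python
import math

def expected_initial_loss(num_classes):
    """Cross-entropy of a uniform prediction over num_classes.

    At initialization a softmax classifier should be close to guessing
    uniformly, so the observed first-batch loss should start near this
    value; a big gap suggests a bug (bad init, wrong labels, etc.)."""
    return -math.log(1.0 / num_classes)

# e.g. for a 10-class problem, expect the loss to start near log(10) ~ 2.30
```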
3. Overfit
- Choose a model that is large enough to overfit the training data
- training loss < validation loss
- If you can't overfit something is wrong!
- Tips and tricks
- Picking a model: Don't be a hero; steal from other people.
- The Adam optimizer is a good choice
- Add ONE piece of complexity at a time. Don't simultaneously make every layer 2x larger and double the number of layers
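A tiny sanity check of the "you should be able to overfit" idea, sketched in plain Python with a one-parameter linear model (no framework assumed). If even this loop can't memorize two points, the training code itself is broken:

```python
# Toy data exactly fit by w = 2; the loop is plain gradient descent on MSE.
xs, ys = [1.0, 2.0], [2.0, 4.0]
w = 0.0
lr = 0.05
for _ in range(500):
    # d/dw of mean squared error: mean of 2 * (w*x - y) * x
    grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
    w -= lr * grad
loss = sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)
# The training loss should be driven to ~0 and w should be ~2.
```

The same principle applies to a real network: pick a handful of batches and confirm the loss goes to essentially zero before training on the full dataset.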
4. Regularize
- Once we can overfit, we can trade some training loss for better validation loss.
- Tips and tricks
- Get more data. If possible, this is the easiest way to regularize
- Augment your data: if you can't get more data, make up some fake data
- Reduce the input dimension (e.g. vocab size)
- Decrease model size
- Weight decay
- Dropout
- Early stopping
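Early stopping boils down to tracking the best validation loss and stopping once a fixed patience runs out without improvement. A toy helper showing the logic (the names are illustrative, not from any library):

```python
def early_stopping(val_losses, patience=3):
    """Return the epoch index to roll back to: the last epoch whose
    validation loss set a new best, once `patience` epochs pass
    without any improvement."""
    best, best_epoch = float("inf"), 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            break  # no improvement for `patience` epochs: stop
    return best_epoch
```

In practice you also checkpoint the model at each new best so you can restore the weights from `best_epoch`.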
5. Tune
- Tune hyperparameters to optimize the network
- Random search is better than grid search
- There are fancy packages out there to help you
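A minimal illustration of random search. The hyperparameter names and ranges are made-up placeholders, and `objective` stands in for a real train-and-evaluate run; note that learning rates are sampled log-uniformly, since their useful range spans orders of magnitude:

```python
import random

def sample_config(rng):
    """Draw one hyperparameter configuration at random (ranges are illustrative)."""
    return {
        "lr": 10 ** rng.uniform(-5, -2),       # log-uniform over [1e-5, 1e-2]
        "dropout": rng.uniform(0.0, 0.5),
        "hidden_size": rng.choice([128, 256, 512]),
    }

def random_search(objective, num_trials=20, seed=0):
    """Evaluate num_trials random configs; return the (score, config) with
    the lowest score. Unlike grid search, every trial explores a fresh
    value of each hyperparameter."""
    rng = random.Random(seed)
    trials = [(objective(cfg), cfg)
              for cfg in (sample_config(rng) for _ in range(num_trials))]
    return min(trials, key=lambda t: t[0])
```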
6. Squeeze the juice
- Ensemble several models
- Let it train for a long time
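Ensembling at its simplest is averaging predicted class probabilities across models. A sketch, assuming each model is just a callable returning a probability vector:

```python
def ensemble_predict(models, x):
    """Average the class-probability vectors from several models.

    Averaging a few independently trained models' predictions typically
    buys a small but reliable accuracy gain over any single model."""
    preds = [m(x) for m in models]
    n = len(preds)
    return [sum(p[i] for p in preds) / n for i in range(len(preds[0]))]
```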