Mark Tenenholtz

@marktenenholtz

12 Tweets 1 reads Oct 14, 2022
I've spent hundreds of hours training computer vision models.
I've revisited my code and notes from those projects and distilled them into a repeatable process that anyone can follow.
Here are the 7 steps I take every time I train a vision model:
1. Immerse yourself in the data
The great part about image data is that it's as visual as it gets.
You should scroll through as many images as possible and try to find patterns.
The best models come from those who have spent hours on this, not minutes.
You can also use Jupyter widgets and @Gradio apps to make this faster.
While you're scrolling, ask yourself questions like:
• Does spatial position matter?
• Are there any data issues (e.g. duplicates)?
• How noisy is the target?
• Is the target ever occluded?
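Scrolling catches most issues, but exact duplicates are easy to check for automatically. Here is a minimal sketch, assuming the images sit in a local folder; the function name `find_exact_duplicates` and the extension list are made up for illustration. It only finds byte-identical files, so resized or re-encoded copies would still need a visual pass.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def find_exact_duplicates(image_dir, extensions=(".jpg", ".jpeg", ".png")):
    """Group image files whose raw bytes are identical.

    Returns a list of groups; each group is a list of paths sharing one hash.
    """
    by_hash = defaultdict(list)
    for path in sorted(Path(image_dir).rglob("*")):
        if path.is_file() and path.suffix.lower() in extensions:
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            by_hash[digest].append(path)
    # Keep only hashes that more than one file maps to.
    return [paths for paths in by_hash.values() if len(paths) > 1]
```

Each returned group is a set of files you may want to deduplicate, or at least keep on the same side of your train/validation split.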
2. Create a human baseline
While you're scrolling through images, try to gauge your own accuracy.
Kaggle is nice in that you get a leaderboard as a benchmark. If you don't have one, a human benchmark is much better than a naive one, like predicting the target mean.
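To make the comparison concrete, here is a rough sketch that scores both baselines with the same metric. It assumes a regression-style target and mean absolute error, neither of which the thread specifies; the function names are hypothetical.

```python
from statistics import mean


def mean_absolute_error(targets, predictions):
    """Average absolute difference between targets and predictions."""
    return mean(abs(t - p) for t, p in zip(targets, predictions))


def target_mean_baseline(train_targets, val_targets):
    """Error of the naive model that always predicts the training mean."""
    constant = mean(train_targets)
    return mean_absolute_error(val_targets, [constant] * len(val_targets))
```

Scoring your own guesses on the same validation images with `mean_absolute_error(val_targets, my_guesses)` gives the human baseline, and the gap between the two numbers shows how much signal is in the images.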
3. Set up your pipeline
I start off with an extremely minimal pipeline. Here's my checklist:
• Fixed random seed
• No augmentation
• Small pretrained model (resnet18, efficientnet-b0)
• AdamW optimizer with no scheduler
• Implement logging
• Sanity check your metric
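For the fixed random seed, a small helper along these lines is common. This is a sketch, not a guarantee of full reproducibility: it seeds Python's `random` module and, only if they happen to be installed, NumPy and PyTorch. The name `seed_everything` is an assumption for illustration.

```python
import random


def seed_everything(seed=42):
    """Seed every random number generator that is available."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass  # NumPy not installed; nothing to seed
    try:
        import torch
        torch.manual_seed(seed)  # seeds the CPU and CUDA generators
    except ImportError:
        pass  # PyTorch not installed; nothing to seed
    return seed
```

Call it once at the top of your training script, before building the dataset or the model.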
This is also where you should set up some QA steps.
Here are some I use (mostly borrowed from @karpathy):
• Set inputs to all zeros and compare loss to normal run
• Visualize some samples right before they enter your model
• Ensure training loss is decreasing
4. Overfit on a single batch
Just as a QA step, see if you can overfit your model on a single image or a single batch of images.
This can uncover so many tricky bugs in your pipeline.
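The idea can be shown without any deep learning framework. The sketch below is a toy stand-in: a logistic regression trained with plain gradient descent on one tiny, made-up batch, where each "image" is just two numbers. With a real vision model you would run your normal training loop on one fixed batch and check for the same thing: the loss should fall to nearly zero.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def bce(p, y, eps=1e-12):
    """Binary cross-entropy for one prediction p and one label y."""
    return -(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps))


def overfit_single_batch(batch, steps=2000, lr=0.5, seed=0):
    """Train on the same batch every step and return the loss history."""
    rng = random.Random(seed)
    n_features = len(batch[0][0])
    w = [rng.uniform(-0.1, 0.1) for _ in range(n_features)]
    b = 0.0
    losses = []
    for _ in range(steps):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        total = 0.0
        for x, y in batch:
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            total += bce(p, y)
            err = p - y  # gradient of the loss w.r.t. the logit
            for i, xi in enumerate(x):
                grad_w[i] += err * xi
            grad_b += err
        n = len(batch)
        losses.append(total / n)
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return losses


# Hypothetical batch of (features, label) pairs; the label equals feature 0.
BATCH = [([0.0, 0.0], 0), ([0.0, 1.0], 0), ([1.0, 0.0], 1), ([1.0, 1.0], 1)]
```

If the final entry of `overfit_single_batch(BATCH)` is not far below the first, something in the loop is broken: labels, loss, gradients, or the update step.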
5. Add capacity
In this stage, try to drive your loss as low as possible.
Try pulling different levers to accomplish this, such as:
• Increased model size
• Increased image size
• Bigger output head
Make sure you only try one new thing at a time!
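One way to enforce this is to compare each experiment's config against the baseline before launching it. A minimal sketch, assuming configs are flat dictionaries; the keys and function names here are hypothetical.

```python
_MISSING = object()  # sentinel so a missing key differs from a None value


def changed_keys(baseline, candidate):
    """Return the sorted keys whose values differ between two configs."""
    keys = set(baseline) | set(candidate)
    return sorted(
        k for k in keys
        if baseline.get(k, _MISSING) != candidate.get(k, _MISSING)
    )


def assert_single_change(baseline, candidate):
    """Raise unless exactly one setting differs; return that setting's name."""
    diff = changed_keys(baseline, candidate)
    if len(diff) != 1:
        raise ValueError(f"expected exactly 1 change, found {len(diff)}: {diff}")
    return diff[0]
```

For example, going from `{"model": "resnet18", "image_size": 224}` to the same config with `image_size` set to 384 passes, while changing the model at the same time raises an error.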
6. Reduce overfitting
Adding capacity usually causes you to overfit. Now, it's time to figure out how to regularize your model.
I usually follow these 4 steps, in order:
1. Get more data
2. Augmentation
3. Regularization (e.g. dropout, weight decay)
4. Smaller architecture
There are some tricks you can use, but they're more situational.
Here are some of them:
• Crop out as much background as possible
• Decrease batch size
• Add a learning rate scheduler (I always do this)
These are helpful, but usually not as much as the previous ones.
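The thread doesn't say which scheduler to use. As one common option, here is a sketch of cosine decay written as a plain function, so the shape of the schedule is easy to see; the name `cosine_lr` and its parameters are assumptions for illustration.

```python
import math


def cosine_lr(step, total_steps, base_lr, min_lr=0.0):
    """Learning rate at `step`, decaying from base_lr to min_lr along a cosine."""
    # Clamp so steps outside [0, total_steps] stay at the schedule's endpoints.
    progress = min(max(step, 0), total_steps) / total_steps
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))
```

The rate starts at `base_lr`, passes through the midpoint between `base_lr` and `min_lr` halfway through training, and ends at `min_lr`. In PyTorch, `torch.optim.lr_scheduler.CosineAnnealingLR` provides the same shape.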
7. Repeat steps 5 + 6
Your goal now should be to keep trying to overfit and then reduce that overfitting.
You can stop when you run out of time or when you're no longer seeing improvements in validation accuracy when you regularize.
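The "no longer seeing improvements" rule can be written down as a simple plateau check over your experiment history. This is a sketch under assumptions: `val_scores` holds one validation accuracy per round of the add-capacity/regularize loop, and `patience` and `min_delta` are thresholds you would pick yourself.

```python
def should_stop(val_scores, patience=3, min_delta=0.0):
    """True if the last `patience` rounds failed to beat the earlier best."""
    if len(val_scores) <= patience:
        return False  # not enough history to judge yet
    best_before = max(val_scores[:-patience])
    recent_best = max(val_scores[-patience:])
    return recent_best <= best_before + min_delta
```

For example, a history of 0.80, 0.85, 0.86, 0.86, 0.859, 0.86 with `patience=3` signals a stop, because none of the last three rounds beat the earlier best of 0.86.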
TL;DR:
1. Immerse yourself in the data
2. Human baseline
3. Set up a pipeline
4. Overfit on a single batch
5. Add capacity
6. Reduce overfitting
7. Repeat steps 5 + 6
Follow me @marktenenholtz for more high-signal ML content!
Reply to this with the tips/tricks you have!