Computer Vision Engineering
Module 9 of 11
9. Self-Supervised Learning
1. Learning without Labels
Labeling 1M images can cost on the order of $100k, while the internet holds billions of unlabeled images. Self-Supervised Learning (SSL) sidesteps the labeling bill by creating "pseudo-labels" from the data itself.
2. MAE (Masked Autoencoders)
The "BERT" of Vision.
- Take an image.
- Hide 75% of the patches.
- Ask the model to reconstruct the missing pixels. To paint in a missing tail, the model is forced to learn what a "Dog" actually looks like.
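The masking step above can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's code: the 196-patch layout (a 224x224 image cut into 16x16 patches) and the `random_masking` helper are assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_masking(patches, mask_ratio=0.75):
    """Keep a random 25% of patches and hide the rest (MAE-style)."""
    n = patches.shape[0]
    n_keep = int(n * (1 - mask_ratio))
    perm = rng.permutation(n)           # random patch order
    keep_idx = np.sort(perm[:n_keep])   # indices of visible patches
    return patches[keep_idx], keep_idx

# A 224x224 RGB image split into 16x16 patches -> 14*14 = 196 patches,
# each flattened to 16*16*3 = 768 values.
patches = rng.normal(size=(196, 16 * 16 * 3))
visible, keep_idx = random_masking(patches)
print(visible.shape)  # -> (49, 768): only 25% of patches reach the encoder
```

Because the encoder only ever sees the visible 25%, MAE pre-training is also much cheaper per image than running a ViT on the full patch sequence.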
3. DINO (self-DIstillation with NO labels)
A Teacher and a Student network view different crops of the same image, and the Student is trained to output the same features as the Teacher. The Teacher is never trained directly: its weights are an exponential moving average (EMA) of the Student's, so no labels are needed at any point.