Andrej Karpathy

@karpathy

4 Tweets · Apr 09, 2023
ResNet-50 on ImageNet now (allegedly) down to 224 sec (3.7 min) arxiv.org using 2176 V100s. Increasing batch size schedule, LARS, 5-epoch LR warmup, sync BN without moving averages, mixed fp16 training. "2D-Torus" all-reduce on NCCL2, with NVLink2 & 2x IB EDR interconnect
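Two of the tricks named above can be sketched in a few lines. This is a minimal illustration of a linear LR warmup and a LARS-style layer-wise update; all hyperparameter values here (`base_lr`, `eta`, `weight_decay`, steps per epoch) are made up for illustration, not the paper's settings.

```python
import numpy as np

def warmup_lr(step, base_lr=29.0, warmup_epochs=5, steps_per_epoch=19):
    """Ramp the LR linearly over the first few epochs, then hold it.
    Values are illustrative, not the paper's exact hyperparameters."""
    warmup_steps = warmup_epochs * steps_per_epoch
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    return base_lr

def lars_update(w, grad, lr, eta=0.001, weight_decay=1e-4):
    """One LARS step for a single layer: scale the global LR by the
    layer's trust ratio eta * ||w|| / ||grad + wd * w||, so layers with
    small gradients relative to their weights still take useful steps."""
    g = grad + weight_decay * w
    w_norm = np.linalg.norm(w)
    g_norm = np.linalg.norm(g)
    trust = eta * w_norm / (g_norm + 1e-9) if w_norm > 0 and g_norm > 0 else 1.0
    return w - lr * trust * g
```

The trust ratio is what lets huge batches (and hence huge base LRs) work: without it, layers whose gradient norms are large relative to their weight norms blow up first.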
Nice comparison table in the paper showing the wall clock time to 75% accuracy, over time. He et al. was CVPR 2016, so this is ~2-3 years to go 30 hours -> 3.7 minutes (~500X) 🔥
so... if this rate keeps up then around 2020 we'd be training ImageNet to 75% accuracy in 0.5 seconds :)
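The 0.5-second punchline falls out of simple arithmetic: a ~500x speedup took roughly 2.5 years, so one more round of the same rate lands under half a second. A quick back-of-envelope check:

```python
# Back-of-envelope for the extrapolation joke: CVPR 2016 baseline was
# ~30 hours; the 2018-era result in the thread is 224 seconds.
hours_2016 = 30.0
secs_2018 = 224.0

speedup = hours_2016 * 3600 / secs_2018   # ~482x, i.e. the "~500X" in the tweet
secs_next = secs_2018 / speedup           # one more round at the same rate: ~0.46 s
```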
last fun thing to think about is that we're doing 1.28M images over 90 epochs with 68K batches, so the entire optimization is ~1700 updates to converge. How lucky for us that our Universe allows us to trade that much serial compute for parallel compute in training neural nets
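The "~1700 updates" figure checks out directly from the numbers in the tweet (taking 68K as the batch size, so each epoch is only ~19 optimizer steps):

```python
images = 1.28e6   # ImageNet training set size
epochs = 90
batch = 68_000    # (final) batch size quoted in the tweet

steps_per_epoch = images / batch          # ~18.8 updates per epoch
total_updates = steps_per_epoch * epochs  # ~1694, i.e. the "~1700 updates"
```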