Andrej Karpathy

@karpathy

4 Tweets · Apr 09, 2023
ResNet-50 on ImageNet now (allegedly) down to 224 sec (3.7 min) arxiv.org using 2176 V100s. Increasing batch size schedule, LARS, 5-epoch LR warmup, sync BN without moving averages, mixed fp16 training. "2D-Torus" all-reduce on NCCL2, with NVLink2 & 2x IB EDR interconnect
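Two of the tricks named above can be sketched in a few lines. This is a minimal illustration of a linear LR warmup and a LARS-style layer-wise update; all hyperparameter values here (`base_lr`, `eta`, `weight_decay`, steps per epoch) are made up for illustration, not the paper's settings.

```python
import numpy as np

def warmup_lr(step, base_lr=29.0, warmup_epochs=5, steps_per_epoch=19):
    """Ramp the LR linearly over the first few epochs, then hold it.
    Values are illustrative, not the paper's exact hyperparameters."""
    warmup_steps = warmup_epochs * steps_per_epoch
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    return base_lr

def lars_update(w, grad, lr, eta=0.001, weight_decay=1e-4):
    """One LARS step for a single layer: scale the global LR by the
    layer's trust ratio eta * ||w|| / ||grad + wd * w||, so layers with
    small gradients relative to their weights still take useful steps."""
    g = grad + weight_decay * w
    w_norm = np.linalg.norm(w)
    g_norm = np.linalg.norm(g)
    trust = eta * w_norm / (g_norm + 1e-9) if w_norm > 0 and g_norm > 0 else 1.0
    return w - lr * trust * g
```

The trust ratio is what lets huge batches (and hence huge base LRs) work: without it, layers whose gradient norms are large relative to their weight norms blow up first.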
Nice comparison table in the paper showing the wall clock time to 75% accuracy, over time. He et al. was CVPR 2016, so this is ~2-3 years to go 30 hours -> 3.7 minutes (~500X) 🔥
so... if this rate keeps up then around 2020 we'd be training ImageNet to 75% accuracy in 0.5 seconds :)
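The 0.5-second punchline falls out of simple arithmetic: a ~500x speedup took roughly 2.5 years, so one more round of the same rate lands under half a second. A quick back-of-envelope check:

```python
# Back-of-envelope for the extrapolation joke: CVPR 2016 baseline was
# ~30 hours; the 2018-era result in the thread is 224 seconds.
hours_2016 = 30.0
secs_2018 = 224.0

speedup = hours_2016 * 3600 / secs_2018   # ~482x, i.e. the "~500X" in the tweet
secs_next = secs_2018 / speedup           # one more round at the same rate: ~0.46 s
```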
last fun thing to think about is that we're doing 1.28M images over 90 epochs with 68K batches, so the entire optimization is ~1700 updates to converge. How lucky for us that our Universe allows us to trade that much serial compute for parallel compute in training neural nets
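The "~1700 updates" figure checks out directly from the numbers in the tweet (taking 68K as the batch size, so each epoch is only ~19 optimizer steps):

```python
images = 1.28e6   # ImageNet training set size
epochs = 90
batch = 68_000    # (final) batch size quoted in the tweet

steps_per_epoch = images / batch          # ~18.8 updates per epoch
total_updates = steps_per_epoch * epochs  # ~1694, i.e. the "~1700 updates"
```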