If you want to get serious about training big deep learning models, you'll need some serious compute.
However, knowing which GPU to rent can be intimidating and hard to research.
So, I decided to provide you with a buyer's guide!
Here's your guide to data center GPUs 🧵
NVIDIA has released a lot of GPUs over the years, so I decided to focus on the most common ones available on cloud services.
You may ask why there are no RTX 20-series or 30-series GPUs: NVIDIA's driver license forbids deploying consumer GeForce cards in data centers.
Here we go!
1. K80
Released in late 2014, the K80 packed a lot of VRAM for the time (24 GB, split across its two on-board GPUs) and was the go-to data center GPU for model training.
However, it predates tensor cores and is weak by today's standards, so I'd avoid it unless you're just learning.
2. P4
The P4 is the earliest inference-focused GPU on this list.
At its release in 2016, its main value proposition was low power consumption.
Nowadays, you may even find it priced higher than its successor (the T4), so I'd avoid it.
3. T4
The T4 was released two years after the P4 and is a significant upgrade for inference workloads.
Why? Extremely low power consumption, tensor cores, and plenty of VRAM (16 GB).
They're nice and cheap, so if you have an inference workload, I'd strongly consider a T4.
4. P100
When it was released, the P100 was a big improvement over the K80 for model training workloads.
While it has less VRAM (16 GB), the P100 packs far more compute and can even see memory savings from mixed-precision training (although it has no tensor cores!)
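To see why mixed precision saves memory even without tensor cores, here's a rough back-of-envelope sketch (the helper function and the 1B-parameter model are illustrative assumptions, not from this thread):

```python
def model_memory_gb(n_params: int, bytes_per_param: int) -> float:
    """Rough storage estimate for model weights alone.

    Ignores activations, gradients, and optimizer state,
    which often dominate during training.
    """
    return n_params * bytes_per_param / 1024**3

# Hypothetical 1-billion-parameter model:
n = 1_000_000_000
print(f"fp32: {model_memory_gb(n, 4):.2f} GB")  # 4 bytes per weight
print(f"fp16: {model_memory_gb(n, 2):.2f} GB")  # 2 bytes per weight
```

Halving bytes per value halves the footprint of whatever you store in fp16, which is why mixed precision can let a bigger model or batch fit on a 16 GB card.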
5. V100
Until recently, the V100 was by far the most powerful model training GPU and a huge upgrade over the P100.
While it has the same amount of VRAM, it has many more CUDA cores and introduced tensor cores.
These upgrades generally make it more cost-efficient than the P100.
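A quick way to tell whether a GPU has tensor cores is its CUDA compute capability: they first appeared with Volta (capability 7.0). A minimal sketch, using NVIDIA's published capability numbers (the helper function itself is my own illustration):

```python
# Tensor cores first shipped with the Volta architecture (compute capability 7.0).
def has_tensor_cores(major: int, minor: int) -> bool:
    return (major, minor) >= (7, 0)

# Published compute capabilities for the GPUs in this thread:
gpus = {"K80": (3, 7), "P4": (6, 1), "P100": (6, 0),
        "V100": (7, 0), "T4": (7, 5), "A100": (8, 0)}
for name, cc in gpus.items():
    print(f"{name}: tensor cores = {has_tensor_cores(*cc)}")
```

On a live machine you could feed it the tuple returned by `torch.cuda.get_device_capability()` to check the card your cloud provider actually gave you.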
6. A100
The A100 is the newest data center GPU.
It brings upgraded tensor cores to the table and most benchmarks show 3x+ faster training compared to the V100.
It also comes with up to 80GB VRAM!
The price tag might be big, but it's usually worth it over the V100.
NVIDIA has been advancing quickly, so the newest GPUs are usually also the most cost-efficient.
My general recommendations:
• A100 for model training
• T4 for inference workloads
I hope you learned something and feel less intimidated looking at GPU offerings!
Follow me @marktenenholtz and you'll learn a lot more about how to train big models with those fancy GPUs you've rented.