<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/"
    xmlns:atom="http://www.w3.org/2005/Atom" xmlns:media="http://search.yahoo.com/mrss/" version="2.0">
    <channel>
        
        <title>
            <![CDATA[ Daniel Adeboye - freeCodeCamp.org ]]>
        </title>
        <description>
            <![CDATA[ Browse thousands of programming tutorials written by experts. Learn Web Development, Data Science, DevOps, Security, and get developer career advice. ]]>
        </description>
        <link>https://www.freecodecamp.org/news/</link>
        <image>
            <url>https://cdn.freecodecamp.org/universal/favicons/favicon.png</url>
            <title>
                <![CDATA[ Daniel Adeboye - freeCodeCamp.org ]]>
            </title>
            <link>https://www.freecodecamp.org/news/</link>
        </image>
        <generator>Eleventy</generator>
        <lastBuildDate>Sat, 16 May 2026 19:38:15 +0000</lastBuildDate>
        <atom:link href="https://www.freecodecamp.org/news/author/AdeboyeDN/rss.xml" rel="self" type="application/rss+xml" />
        <ttl>60</ttl>
        
            <item>
                <title>
                    <![CDATA[ How to Choose the Best GPU for Your AI Workloads ]]>
                </title>
                <description>
                    <![CDATA[ Choosing a GPU for your AI workload shouldn't be complicated, but it often feels that way. You're weighing specs you don't fully understand, comparing prices that seem arbitrary, and wondering if you're about to waste thousands on GPUs you don't need... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/how-to-choose-the-best-gpu-for-your-ai-workloads/</link>
                <guid isPermaLink="false">69691ef945288420c3465766</guid>
                
                    <category>
                        <![CDATA[ AI ]]>
                    </category>
                
                    <category>
                        <![CDATA[ GPUs ]]>
                    </category>
                
                    <category>
                        <![CDATA[ ai-workloads ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Daniel Adeboye ]]>
                </dc:creator>
                <pubDate>Thu, 15 Jan 2026 17:08:09 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/res/hashnode/image/upload/v1768427017581/585da014-5cb6-45bd-b6f7-a9a8a257b288.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>Choosing a GPU for your AI workload shouldn't be complicated, but it often feels that way. You're weighing specs you don't fully understand, comparing prices that seem arbitrary, and wondering if you're about to waste thousands on GPUs you don't need.</p>
<p>The good news: it's simpler than it looks. The right GPU matches your workload, not the spec sheet. This article breaks down what actually matters and helps you make a decision that fits your budget and needs.</p>
<h2 id="heading-prerequisites">Prerequisites</h2>
<p>Before diving in, a few assumptions about where you’re starting from:</p>
<ul>
<li><p>Basic familiarity with ML workflows like training, fine-tuning, and inference</p>
</li>
<li><p>Some experience with PyTorch, TensorFlow, or similar frameworks</p>
</li>
<li><p>No GPU, CUDA, or hardware expertise required</p>
</li>
<li><p>Our focus is on practical decisions, not low-level theory</p>
</li>
</ul>
<h2 id="heading-heres-what-well-cover">Here's What We'll Cover:</h2>
<ol>
<li><p><a class="post-section-overview" href="#heading-why-gpu-choice-matters-for-ai">Why GPU Choice Matters for AI</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-understanding-your-workload-first">Understanding Your Workload First</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-key-gpu-specifications-that-matter-for-ai">Key GPU Specifications That Matter for AI</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-which-gpu-for-which-ai-workload">Which GPU for Which AI Workload?</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-should-you-rent-or-buy-gpus">Should You Rent or Buy GPUs?</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-practical-decision-framework">Practical Decision Framework</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-common-pitfalls-to-avoid">Common Pitfalls to Avoid</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-conclusion">Conclusion</a></p>
</li>
</ol>
<h2 id="heading-why-gpu-choice-matters-for-ai">Why GPU Choice Matters for AI</h2>
<p>The GPU you choose directly impacts how fast you can work. Training a model that takes 12 hours on appropriate hardware might take three days on an under-specced machine. That difference compounds across dozens of training runs. Slow iteration means slower learning and delayed launches.</p>
<p>Memory constraints are worse. Running out of VRAM doesn't just slow you down. Your code crashes. You're forced to reduce batch sizes, which can hurt model quality. You spend days optimizing memory usage instead of improving your model.</p>
<p>Cost matters, too. Overspending on a data center GPU when a consumer card would work wastes money. Under-speccing and upgrading six months later wastes more. The goal is to buy hardware that matches your actual needs.</p>
<h2 id="heading-understanding-your-workload-first">Understanding Your Workload First</h2>
<p>Before you start comparing specs, get clear on what you're actually building. GPU requirements change dramatically depending on whether you're training models, serving predictions, or experimenting with new ideas.</p>
<h3 id="heading-training-workloads">Training Workloads</h3>
<p>Training needs serious power. You're loading datasets, running passes through your network, and updating millions of parameters. A transformer model can easily need 16GB of VRAM before you think about batch size.</p>
<p>Training also means iteration. You run an experiment, check results, tweak something, run again. The difference between two-hour feedback versus eight hours changes how many ideas you test per day.</p>
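<p>As a rough sanity check, you can estimate training memory before touching any hardware. Here's a minimal sketch, assuming standard mixed-precision Adam training, where FP16 weights, FP16 gradients, the FP32 master copy, and two FP32 optimizer moment buffers add up to roughly 16 bytes per parameter (activations and batch data come on top of this):</p>
<pre><code class="lang-python">def estimate_training_vram_gb(num_params, bytes_per_param=16):
    """Back-of-envelope VRAM for mixed-precision Adam training:
    FP16 weights (2) + FP16 gradients (2) + FP32 master weights (4)
    + two FP32 Adam moment buffers (4 + 4) = ~16 bytes per parameter.
    """
    return num_params * bytes_per_param / 1e9

# A 1B-parameter transformer already wants ~16GB before activations:
print(estimate_training_vram_gb(1e9))
</code></pre>
<p>This is why the 16GB figure appears so quickly: optimizer state alone dwarfs the weights.</p>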
<h3 id="heading-inference-workloads">Inference Workloads</h3>
<p>Inference is different. Your model is trained – now you're serving predictions to users. You care about requests per second and latency. Inference needs way less memory since you're not storing gradients or optimizer states.</p>
<p>Quantization helps too. A model that needed 24GB during training might, once quantized to INT8, serve predictions on 8GB.</p>
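<p>The savings are easy to estimate: weights-only memory is just parameter count times bits per weight. A minimal sketch, ignoring KV cache and runtime overhead (which add real memory on top):</p>
<pre><code class="lang-python">def weight_memory_gb(num_params, bits):
    """Memory for the model weights alone at a given precision."""
    return num_params * bits / 8 / 1e9

# A 7B-parameter model at common serving precisions:
for name, bits in [("FP16", 16), ("INT8", 8), ("INT4", 4)]:
    print(name, weight_memory_gb(7e9, bits))
</code></pre>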
<h3 id="heading-research-and-experimentation">Research and Experimentation</h3>
<p>Research lives in between. One day you're training a small model. The next you're loading a pre-trained transformer for inference. You need hardware that handles changing workloads without constant optimization. Flexibility beats raw power here.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1768291283031/36d46102-aad1-45e2-a41d-fa7ffc505e14.png" alt="Image summarizing AI workload" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<h2 id="heading-key-gpu-specifications-that-matter-for-ai">Key GPU Specifications That Matter for AI</h2>
<p>GPU spec sheets throw numbers at you: TFLOPS, CUDA cores, clock speeds. Most of it doesn't matter for AI work. But three things do.</p>
<h3 id="heading-vram-capacity">VRAM Capacity</h3>
<p>Memory is everything. Your model needs to fit. Your training batches need space. All those intermediate calculations need room. Run out and your code doesn't slow down – it crashes.</p>
<p>Here's what works in practice (assuming standard FP16/BF16 training without aggressive offloading). 12GB gets you started but you'll hit walls fast. 16GB is where serious work begins. You can train most common architectures without constantly wrestling with memory. 24GB is the comfort zone. Bigger models, larger batches, room to experiment.</p>
<p>Beyond that, you're training something massive or working with video and high-resolution images. If you're only running inference, 8GB to 12GB usually does the job.</p>
<h3 id="heading-compute-performance">Compute Performance</h3>
<p>Raw speed matters, but not the way marketing wants you to think. Modern GPUs pack specialized hardware for AI. Tensor cores do the heavy lifting for deep learning math. Mixed precision training runs parts of your model in FP16 for speed and FP32 where accuracy matters. For inference, INT8 support is gold. Compress your model down and watch it fly. Four times faster with barely any quality drop.</p>
<p>Ignore TFLOPS comparisons between different GPU architectures. They don't tell you how fast your actual training run completes. Look for real benchmarks on frameworks you'll actually use.</p>
<h3 id="heading-memory-bandwidth">Memory Bandwidth</h3>
<p>This is the spec everyone skips. Bandwidth determines how fast your GPU shuttles data between memory and compute. Doesn't matter how powerful your cores are if they're sitting idle waiting for data.</p>
<p>GPUs with HBM memory move data 50 to 100 percent faster than GDDR cards. That gap shows up immediately in training speed. Memory-bound workloads, and most AI work is memory-bound, get direct speedups from better bandwidth. This is why an A100 often outperforms an RTX 4090 in training despite similar raw compute. The cores on the 4090 are fast, but they spend more time waiting for data. Bandwidth keeps your GPU fed.</p>
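<p>You can put numbers on this. For single-stream LLM decoding, every generated token has to read all the weights once, so tokens per second is capped at bandwidth divided by model size. A sketch with illustrative round numbers (real systems land below this ceiling):</p>
<pre><code class="lang-python">def max_decode_tokens_per_sec(bandwidth_gb_s, weight_gb):
    """Upper bound on memory-bound decode speed: each generated
    token streams the full weight tensor through the memory bus once."""
    return bandwidth_gb_s / weight_gb

# Serving a 14GB model (7B at FP16) on a ~1,000 GB/s GDDR card
# versus a ~2,000 GB/s HBM card:
print(round(max_decode_tokens_per_sec(1000, 14)))  # GDDR-class
print(round(max_decode_tokens_per_sec(2000, 14)))  # HBM-class
</code></pre>
<p>Double the bandwidth, double the ceiling, no matter how fast the cores are.</p>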
<h2 id="heading-which-gpu-for-which-ai-workload">Which GPU for Which AI Workload?</h2>
<p>Here's what you actually need for common AI tasks. This maps workloads to specific GPUs based on memory requirements and what works in practice.</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td><strong>Workload Type</strong></td><td><strong>VRAM Needed</strong></td><td><strong>Recommended GPUs</strong></td></tr>
</thead>
<tbody>
<tr>
<td>Training LLMs from scratch (small, ≤3B)</td><td>40–80GB</td><td>A100 80GB, H100 80GB</td></tr>
<tr>
<td>Training LLMs from scratch (mid, 7B–30B)</td><td>80–160GB</td><td>H100 80GB, H200 141GB, multi-A100</td></tr>
<tr>
<td>Training LLMs from scratch (large, 70B+)</td><td>160GB+ multi-GPU</td><td>B100, B200, multi-H100</td></tr>
<tr>
<td>Fine-tuning LLMs (7B–13B)</td><td>24–48GB</td><td>RTX 4090 (24GB), RTX A6000 (48GB), A100 40GB</td></tr>
<tr>
<td>Fine-tuning LLMs (30B–70B)</td><td>48–80GB (or sharded)</td><td>A100 80GB, H100, H200, 2–4× RTX 4090</td></tr>
<tr>
<td>Fine-tuning LLMs (70B+)</td><td>80GB+ multi-GPU</td><td>H100, H200, multi-4090 with FSDP</td></tr>
<tr>
<td>LLM inference (7B–13B)</td><td>8–16GB</td><td>RTX 4060 Ti 16GB, RTX 4070, L4</td></tr>
<tr>
<td>LLM inference (30B–70B)</td><td>24–80GB</td><td>RTX 4090, L40S, A100 80GB, H200</td></tr>
<tr>
<td>LLM inference (70B–180B)</td><td>80–160GB</td><td>H200, A100 80GB (multi-GPU)</td></tr>
<tr>
<td>LLM inference (200B+)</td><td>160GB+</td><td>B100, B200</td></tr>
<tr>
<td>Training vision models (ResNet, ViT)</td><td>16–24GB</td><td>RTX 4080 Super, RTX 4090</td></tr>
<tr>
<td>Fine-tuning vision models</td><td>12–16GB</td><td>RTX 4070 Ti Super, RTX 4080</td></tr>
<tr>
<td>Training diffusion models</td><td>24–48GB</td><td>RTX 4090, RTX A6000, A100</td></tr>
<tr>
<td>Image generation (Stable Diffusion)</td><td>8–24GB</td><td>RTX 4070, RTX 4070 Ti, RTX 4090</td></tr>
<tr>
<td>Production inference at scale</td><td>24–48GB</td><td>L4, L40S, A10, A100</td></tr>
<tr>
<td>Research / OSS experimentation</td><td>24GB sweet spot</td><td>RTX 3090, RTX 4090</td></tr>
</tbody>
</table>
</div><h2 id="heading-should-you-rent-or-buy-gpus">Should You Rent or Buy GPUs?</h2>
<p>Whether you buy or rent depends on the type of GPU and how you're using it.</p>
<h3 id="heading-when-to-rent-a-gpu">When to Rent a GPU</h3>
<p>Rent when your workloads are sporadic. You're experimenting, testing ideas, running training jobs here and there. Cloud GPUs make sense because you only pay when you're actually using them.</p>
<p>Renting is also your only option for data center GPUs like the <a target="_blank" href="https://northflank.com/blog/nvidia-a100-gpu-cost">A100</a>, <a target="_blank" href="https://northflank.com/blog/how-much-does-an-nvidia-h100-gpu-cost">H100</a>, and <a target="_blank" href="https://northflank.com/blog/how-much-does-an-nvidia-b200-gpu-cost">B200</a>. You can't just buy one. NVIDIA sells them through server manufacturers as complete systems. You'd be dropping $100,000+ on a pre-configured server. Unless you're running at enterprise scale, renting H100 instances when you need that power is the move.</p>
<h3 id="heading-when-to-buy-a-gpu">When to Buy a GPU</h3>
<p>Buy when you're running things constantly. Inference servers running 24/7. Training models every week. The GPU isn't sitting idle.</p>
<p>Consumer cards like the RTX 4090 or professional cards like the RTX A6000 make sense to own. You can buy them from retailers, install them in your workstation, and they pay for themselves in months. An RTX 4090 costs around $2,000. Renting the equivalent at $2 per hour, around the clock, hits $1,440 monthly. Do the math.</p>
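<p>The break-even point is a one-line calculation. A sketch using the numbers above, ignoring electricity, resale value, and idle time:</p>
<pre><code class="lang-python">def breakeven_months(purchase_price, rate_per_hour, hours_per_month=720):
    """Months of round-the-clock rental after which buying
    would have been the cheaper option."""
    return purchase_price / (rate_per_hour * hours_per_month)

# A ~$2,000 RTX 4090 versus renting at $2/hour, 24/7:
print(round(breakeven_months(2000, 2.0), 1))
</code></pre>
<p>Under constant load the card pays for itself in well under two months. At a few hours a day, the break-even stretches to many months, which is why utilization is the real deciding factor.</p>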
<h3 id="heading-most-people-do-both">Most People Do Both</h3>
<p>Own hardware for daily work. Rent when you need extra power or want to test with data center GPUs. Keeps costs down while staying flexible.</p>
<h2 id="heading-practical-decision-framework">Practical Decision Framework</h2>
<p>Figure out what you actually need before looking at hardware. How big are your models? What batch sizes are you running? How often are you training?</p>
<h3 id="heading-individual-developers">Individual Developers</h3>
<p>Renting cloud GPUs is the smart starting point. There’s no massive upfront cost. You pay $1 to $3 per hour for solid hardware. Train for a few hours, run experiments, shut it down. Your costs stay under control, and you're not stuck with hardware that might not fit your needs six months from now.</p>
<p>If you're training constantly and the monthly cloud bills are adding up, then consider buying. An RTX 4090 with 24GB costs around $1,600 to $2,000. RTX 4080 Super with 16GB runs about $1,000. They pay for themselves after several months of heavy use. But only buy when cloud costs clearly justify it.</p>
<h3 id="heading-small-teams">Small Teams</h3>
<p>Start with cloud GPUs. Rent what you need when you need it. Once your workloads become predictable and sustained, look at owning hardware. A workstation with one or two GPUs runs $3,000 to $7,000. Makes sense when you're running jobs daily and cloud bills hit $1,000+ monthly.</p>
<p>Most teams end up hybrid, anyway. Rent for experiments and burst workloads. Buy one solid GPU for daily work if the math works out.</p>
<h3 id="heading-production-at-scale">Production at Scale</h3>
<p>Cloud dominates here for good reason. Flexibility, geographic distribution, and no infrastructure headaches. Renting H100 or A100 instances gives you access to hardware you'd never buy outright.</p>
<p>Some companies buy servers for sustained inference when they're processing massive request volumes 24/7. You're looking at $50,000 to $100,000+ per server from Dell or Supermicro. But even then, most production deployments stay primarily cloud-based. The operational simplicity is worth it.</p>
<h2 id="heading-common-pitfalls-to-avoid">Common Pitfalls to Avoid</h2>
<h3 id="heading-dont-buy-based-on-gaming-benchmarks">Don’t Buy Based on Gaming Benchmarks</h3>
<p>Gaming reviews test completely different workloads. A GPU that dominates in Cyberpunk might underperform for training because of limited VRAM or memory bandwidth.</p>
<p>For AI work, look for benchmarks that reflect real workloads, such as:</p>
<ul>
<li><p>PyTorch or TensorFlow training throughput (tokens/sec) on Mistral-style LLMs</p>
</li>
<li><p>Time-to-train for a fixed number of steps or epochs</p>
</li>
<li><p>Inference latency and throughput using tools like TensorRT or vLLM</p>
</li>
</ul>
<p>Vendor specs and gaming FPS don’t capture this. AI benchmarks show whether a GPU is compute-bound, memory-bound, or limited by VRAM.</p>
<h3 id="heading-never-skimp-on-vram-to-save-money">Never Skimp on VRAM to Save Money</h3>
<p>Running out of memory doesn't slow you down – it crashes your code. You're forced to rewrite things, reduce batch sizes, or buy new hardware anyway. When choosing between two GPUs at similar prices, take the one with more VRAM every time. That extra headroom matters more than you think.</p>
<h3 id="heading-ignoring-power-and-cooling-requirements">Ignoring Power and Cooling Requirements</h3>
<p>High-end GPUs pull 300 to 450 watts under load. Your power supply needs to handle it with headroom. Your case needs proper airflow, or that expensive GPU will thermal throttle and perform worse than cheaper hardware running cool. Check that your setup can actually support the card before buying it.</p>
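<p>A quick sanity check before buying: add the GPU's draw to the rest of the system and leave headroom for transient spikes. A rough sketch; the 250W system figure and 30 percent headroom are illustrative assumptions, not a spec:</p>
<pre><code class="lang-python">def min_psu_watts(gpu_watts, rest_of_system_watts=250, headroom=1.3):
    """Rough PSU sizing: total sustained draw plus headroom
    for transient power spikes."""
    return (gpu_watts + rest_of_system_watts) * headroom

# A 450W card points at roughly a 900W-class supply:
print(round(min_psu_watts(450)))
</code></pre>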
<h3 id="heading-assuming-you-need-the-latest-hardware">Assuming You Need the Latest Hardware</h3>
<p>Last-generation GPUs often deliver 80% of the performance at 50% of the price. An RTX 4080 might be newer, but a used RTX 3090 with 24GB can handle many of the same workloads for way less money. Don't chase the latest release unless your work genuinely needs it.</p>
<h3 id="heading-not-testing-in-the-cloud-first">Not Testing in the Cloud First</h3>
<p>If you're unsure what you need, rent it first. Spend $50 testing different GPU types on cloud platforms before dropping $2,000 on hardware. You'll learn what actually bottlenecks your workloads. Maybe you need more VRAM. Maybe compute is fine, and you're memory bandwidth limited. Test before you buy.</p>
<h2 id="heading-conclusion">Conclusion</h2>
<p>Choosing the right GPU comes down to matching hardware to your actual workload. Training needs memory and bandwidth. Inference prioritizes efficiency. How often you run jobs determines whether renting or buying makes sense.</p>
<p>Start with the cloud. Test different GPUs, learn where your models bottleneck, and keep costs low while you experiment. When your workloads become predictable and sustained, then buying hardware starts to make sense.</p>
<p>Don’t overthink it. The wrong GPU slows you down. The right one disappears and lets you ship.</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ How to Build Slim and Fast Docker Images with Multi-Stage Builds ]]>
                </title>
                <description>
                    <![CDATA[ Apps don’t stay simple forever. More features mean more dependencies, slower builds, and heavier Docker images. That’s where things start to hurt. Docker helps, but without the right setup, your builds can quickly get bloated. Multi-stage builds make... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/build-slim-fast-docker-images-with-multi-stage-builds/</link>
                <guid isPermaLink="false">6824b1638a2b26e9d2d0fe86</guid>
                
                    <category>
                        <![CDATA[ Docker ]]>
                    </category>
                
                    <category>
                        <![CDATA[ docker images ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Dockerfile ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Daniel Adeboye ]]>
                </dc:creator>
                <pubDate>Wed, 14 May 2025 15:06:11 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/res/hashnode/image/upload/v1747235146559/0bce7dc3-0abe-4241-a188-1c05c773e810.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>Apps don’t stay simple forever. More features mean more dependencies, slower builds, and heavier Docker images. That’s where things start to hurt.</p>
<p>Docker helps, but without the right setup, your builds can quickly get bloated.</p>
<p>Multi-stage builds make things smoother by keeping your images fast, clean, and production-ready. In this guide, you'll learn how to use them to supercharge your Docker workflow.</p>
<p>Let’s get into it.</p>
<h2 id="heading-prerequisites">Prerequisites</h2>
<p>To follow this guide, you should have:</p>
<ul>
<li><p>Docker installed and running</p>
</li>
<li><p>Basic understanding of Docker</p>
</li>
<li><p>Some Python knowledge (or any language, really)</p>
</li>
<li><p>Familiarity with the terminal</p>
</li>
</ul>
<h2 id="heading-heres-what-well-cover"><strong>Here's what we'll cover:</strong></h2>
<ol>
<li><p><a class="post-section-overview" href="#heading-what-are-docker-images">What are Docker Images?</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-how-to-implement-multi-stage-builds">How to Implement Multi-Stage Builds</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-the-chunky-single-stage-build">The Chunky Single-Stage Build</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-when-to-use-multi-stage-builds">When to Use Multi-Stage Builds</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-conclusion">Conclusion</a></p>
</li>
</ol>
<h2 id="heading-what-are-docker-images">What are Docker Images?</h2>
<p>Before we dive into optimization, let’s quickly get clear on what Docker images actually are.</p>
<p>A Docker image is a lightweight, standalone package that has everything your app needs to run – code, dependencies, environment variables, and config files. Think of it as a snapshot of your app, ready to spin up anywhere.</p>
<p>When you run an image, Docker turns it into a container: a self-contained environment that behaves the same on your machine, in staging, or in production. That consistency is a huge win for development and deployment.</p>
<p>Now that we’ve got the basics, let’s talk about making those images smaller and faster.</p>
<h2 id="heading-how-to-implement-multi-stage-builds"><strong>How to Implement Multi-Stage Builds</strong></h2>
<p>Let’s get hands-on by creating a basic Flask app and using a multi-stage build to keep our Docker image slim.</p>
<h3 id="heading-step-1-create-apppyhttpapppy">Step 1: Create <code>app.py</code></h3>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> flask <span class="hljs-keyword">import</span> Flask

app = Flask(__name__)

<span class="hljs-meta">@app.route('/')</span>
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">hello</span>():</span>
    <span class="hljs-keyword">return</span> <span class="hljs-string">"Hello, Docker Multi-stage Builds! 🐳"</span>

<span class="hljs-keyword">if</span> __name__ == <span class="hljs-string">'__main__'</span>:
    app.run(host=<span class="hljs-string">'0.0.0.0'</span>, port=<span class="hljs-number">5000</span>)
</code></pre>
<h3 id="heading-step-2-install-and-save-dependencies">Step 2: Install and save dependencies</h3>
<p>Install Flask and Gunicorn using pip:</p>
<pre><code class="lang-bash">pip install flask gunicorn
</code></pre>
<p>Then freeze your environment into a <code>requirements.txt</code> file:</p>
<pre><code class="lang-bash">pip freeze &gt; requirements.txt
</code></pre>
<p>This file is what Docker will use to install dependencies inside your container.</p>
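<p>The frozen file pins exact versions, so the container installs the same packages you tested with locally. It will look something like this (your version numbers will differ, and Flask's transitive dependencies will be listed too):</p>
<pre><code>flask==3.0.3
gunicorn==22.0.0
</code></pre>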
<h3 id="heading-step-3-create-the-multi-stage-dockerfile">Step 3: Create the multi-stage <code>Dockerfile</code></h3>
<pre><code class="lang-docker"><span class="hljs-comment"># Stage 1: Build Stage</span>
<span class="hljs-keyword">FROM</span> python:<span class="hljs-number">3.9</span>-slim AS builder

<span class="hljs-keyword">WORKDIR</span><span class="bash"> /app</span>

<span class="hljs-keyword">COPY</span><span class="bash"> requirements.txt .</span>

<span class="hljs-keyword">RUN</span><span class="bash"> python -m venv /opt/venv &amp;&amp; \
    . /opt/venv/bin/activate &amp;&amp; \
    pip install --no-cache-dir -r requirements.txt</span>

<span class="hljs-comment"># Stage 2: Production Stage</span>
<span class="hljs-keyword">FROM</span> python:<span class="hljs-number">3.9</span>-slim

<span class="hljs-keyword">COPY</span><span class="bash"> --from=builder /opt/venv /opt/venv</span>

<span class="hljs-keyword">WORKDIR</span><span class="bash"> /app</span>

<span class="hljs-keyword">COPY</span><span class="bash"> . .</span>

<span class="hljs-keyword">ENV</span> PATH=<span class="hljs-string">"/opt/venv/bin:$PATH"</span>

<span class="hljs-keyword">EXPOSE</span> <span class="hljs-number">5000</span>

<span class="hljs-keyword">CMD</span><span class="bash"> [<span class="hljs-string">"gunicorn"</span>, <span class="hljs-string">"--bind"</span>, <span class="hljs-string">"0.0.0.0:5000"</span>, <span class="hljs-string">"app:app"</span>]</span>
</code></pre>
<p>In the Dockerfile above, we’ve defined two stages: a build stage and a production stage. The first stage, the <strong>Build Stage</strong>, uses the <code>python:3.9-slim</code> base image, sets up a working directory, copies in <code>requirements.txt</code>, and creates a virtual environment. All dependencies are installed inside that virtual environment.</p>
<p>In the <strong>Production Stage</strong>, we again start from <code>python:3.9-slim</code>, but this time we copy only the virtual environment from the build stage along with the application code. Then we configure the environment to use that virtual environment and run the app using Gunicorn.</p>
<p>Now, in a multi-stage build, you can experiment with using different Python versions across stages – but here’s why I didn’t go that route:</p>
<ul>
<li><p>Some packages may have different dependencies, depending on the Python version.</p>
</li>
<li><p>My <code>requirements.txt</code> file contains version-specific dependencies, so sticking to the same Python version across both stages helps avoid compatibility issues.</p>
</li>
</ul>
<p>Once the multi-stage Dockerfile is ready, go ahead and build the images. You’ll clearly see the size difference.</p>
<h3 id="heading-step-4-build-and-run-your-image">Step 4: Build and run your image</h3>
<p>To build and run your image container, use the following command:</p>
<pre><code class="lang-bash"><span class="hljs-comment"># Build the image</span>
docker build -t my-python-app .

<span class="hljs-comment"># Run the container</span>
docker run -p 5000:5000 my-python-app
</code></pre>
<p>If everything works correctly, your Flask app should now be live at <a target="_blank" href="http://localhost:5000"><code>http://localhost:5000</code></a> in your browser.</p>
<p>You’ll know your build succeeded when Docker completes without errors and starts the container. You should see terminal logs from Gunicorn indicating the app is up and running.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1746875902903/9e8348ac-d21c-4371-bb42-e514457a12ff.png" alt="9e8348ac-d21c-4371-bb42-e514457a12ff" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<h2 id="heading-the-chunky-single-stage-build">The Chunky Single-Stage Build</h2>
<p>Let’s compare with a traditional one-stage Docker build that includes everything in one go:</p>
<pre><code class="lang-docker"><span class="hljs-keyword">FROM</span> python:<span class="hljs-number">3.9</span>-slim

<span class="hljs-keyword">WORKDIR</span><span class="bash"> /app</span>

<span class="hljs-keyword">RUN</span><span class="bash"> apt-get update &amp;&amp; apt-get install -y \
    build-essential \
    python3-dev \
    gcc \
    &amp;&amp; rm -rf /var/lib/apt/lists/*</span>

<span class="hljs-keyword">COPY</span><span class="bash"> requirements.txt .</span>

<span class="hljs-keyword">RUN</span><span class="bash"> python -m venv /opt/venv</span>
<span class="hljs-keyword">ENV</span> PATH=<span class="hljs-string">"/opt/venv/bin:$PATH"</span>

<span class="hljs-keyword">RUN</span><span class="bash"> pip install --no-cache-dir -r requirements.txt</span>

<span class="hljs-keyword">COPY</span><span class="bash"> . .</span>

<span class="hljs-keyword">EXPOSE</span> <span class="hljs-number">5000</span>

<span class="hljs-keyword">CMD</span><span class="bash"> [<span class="hljs-string">"gunicorn"</span>, <span class="hljs-string">"--bind"</span>, <span class="hljs-string">"0.0.0.0:5000"</span>, <span class="hljs-string">"app:app"</span>]</span>
</code></pre>
<p>The Dockerfile above uses a straightforward build process: it starts from the <code>python:3.9-slim</code> image, sets a working directory, installs system dependencies, creates a virtual environment, installs Python packages, copies over the app code, exposes port 5000, and runs the app using Gunicorn. This kind of Dockerfile is common and works fine, but it can lead to unnecessarily large and bloated images.</p>
<p>Let’s build our image to compare the size with that of the multi-stage build:</p>
<pre><code class="lang-bash">docker build -t my-chunky-app .
</code></pre>
<p>You’ll notice that this Dockerfile takes noticeably longer to build than the multi-stage version, largely because of the extra system packages it installs.</p>
<p>Before we continue, confirm your Docker image was successfully built.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1746886030667/5b83915e-b5b5-4927-9981-f35dad8fb1ff.png" alt="5b83915e-b5b5-4927-9981-f35dad8fb1ff" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>Now, let’s compare build sizes:</p>
<pre><code class="lang-bash">docker images | grep <span class="hljs-string">'my-'</span>
</code></pre>
<p>In case you're wondering why we used "my" to search for the images, it's because we named our Docker images <code>my-python-app</code> and <code>my-chunky-app</code>, so using "my" as a keyword makes it easy to filter them.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1746885989703/1e3667ad-b2fd-4fff-a0e2-31d4705582a7.png" alt="1e3667ad-b2fd-4fff-a0e2-31d4705582a7" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>The image above compares the build sizes of our single-stage and multi-stage Docker images. As you can see, <code>my-python-app</code> – the multi-stage build – is small and lightweight, while <code>my-chunky-app</code> is significantly larger. If you dig a bit deeper, you’ll notice that the multi-stage image built in just 1.2 seconds, whereas the single-stage one took a full 1 minute and 21 seconds. Pretty impressive difference, right?</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1746885947258/9584255b-c6aa-4d25-8a4a-e4a841808b57.png" alt="9584255b-c6aa-4d25-8a4a-e4a841808b57" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>In my opinion, these are solid reasons to use a multi-stage build – but it's not always necessary. There are cases where a single-stage build makes more sense. Let’s take a look at those.</p>
<h2 id="heading-when-to-use-multi-stage-builds">When to Use Multi-Stage Builds</h2>
<p><strong>Use multi-stage builds if:</strong></p>
<ul>
<li><p>Your app needs build tools (for example, compilers, dev dependencies)</p>
</li>
<li><p>You want smaller, faster Docker images</p>
</li>
<li><p>You care about image security and performance</p>
</li>
</ul>
<p><strong>Use single-stage builds if:</strong></p>
<ul>
<li><p>You're just testing or prototyping</p>
</li>
<li><p>Your app is tiny and doesn’t need external tools</p>
</li>
<li><p>You’re still learning the basics</p>
</li>
</ul>
<p>Pick what fits your project’s scale and complexity.</p>
<h2 id="heading-conclusion">Conclusion</h2>
<p>Multi-stage builds are an easy win. They help keep your Docker images clean, fast, and secure – especially as your app grows.</p>
<p>Not every project needs them, but when you do, they make a big difference. So next time you're Dockerizing something serious, reach for multi-stage. Your future self will thank you.</p>
 ]]>
                </content:encoded>
            </item>
        
    </channel>
</rss>
