GCP - freeCodeCamp.org

How to Create a GPU-Optimized Machine Image with HashiCorp Packer on GCP

Rasheedat Atinuke Jamiu — Wed, 22 Apr 2026 20:30:00 +0000

Every time you spin up GPU infrastructure, you do the same thing: install CUDA drivers, DCGM, apply OS‑level GPU tuning, and fight dependency issues. Same old ritual every single time, wasting expensive cloud credits and getting frustrated before actual work begins.

In this article, you'll build a reusable GPU-optimized machine image using Packer, pre-loaded with NVIDIA drivers, CUDA Toolkit, NVIDIA Container Toolkit, DCGM, and system-level GPU tuning like persistence mode.

Prerequisites
Project Setup
Step 1: Install Packer
Step 2: Set Up Project Directory
Step 3: Install Packer's Plugins
Step 4: Define Your Source
Step 5: Writing the Build Template
Step 6: Writing the GPU Provisioning Script
Step 7: Assembling and Running the Build
Step 8: Test the Image and Verify the GPU Stack
Conclusion
References

Prerequisites

HashiCorp Packer >= 1.9
Google Compute Packer plugin (installed via packer init)
Optionally, the AWS Packer plugin can be used for EC2 builds by adding an amazon-ebs source to node.pkr.hcl
GCP project with Compute Engine API enabled (or AWS account with EC2 access)
GCP authentication (gcloud auth application-default login) or AWS credentials
Access to an NVIDIA GPU instance type (for example, A100, H100, or L4 on GCP; p4d, p5, or G6 on AWS)

Project Setup

Step 1: Install Packer

To get started, install Packer. The steps below are for macOS; for Linux and Windows, follow the installation guides in the official documentation.

First, add the HashiCorp tap, a repository of all HashiCorp packages, from the terminal.

$ brew tap hashicorp/tap

Now, install Packer with hashicorp/tap/packer.

$ brew install hashicorp/tap/packer

Step 2: Set Up Project Directory

With Packer installed, create your project directory. To keep the configuration clean and its concerns separated, each part of the template lives in its own file. Create these files in a packer_demo folder using the command below:

mkdir -p packer_demo/script && touch packer_demo/{build.pkr.hcl,source.pkr.hcl,variable.pkr.hcl,local.pkr.hcl,plugins.pkr.hcl,values.pkrvars.hcl} packer_demo/script/base.sh

Your file directory should look like this:

packer_demo
├── build.pkr.hcl          # Build pipeline and provisioner ordering
├── source.pkr.hcl         # GCP source definition (googlecompute)
├── variable.pkr.hcl       # Variable definitions with defaults
├── local.pkr.hcl          # Local values
├── plugins.pkr.hcl        # Packer plugin requirements
├── values.pkrvars.hcl     # Variable values (copy and customize)
└── script/
    └── base.sh            # GPU provisioning script

Step 3: Install Packer's Plugins

In your plugins.pkr.hcl file, define your plugins in the packer block. The packer {} block contains Packer settings, including required plugin versions. Inside it, the required_plugins block lists every plugin the template needs to build your image. If you're on Azure or AWS, check the Packer integrations documentation for the corresponding plugin.

packer {
  required_plugins {
    googlecompute = {
      source  = "github.com/hashicorp/googlecompute"
      version = "~> 1"
    }
  }
}

Then, initialize your Packer plugin with the command below:

packer init .

Step 4: Define Your Source

With your plugin initialized, you can now define your source block. The source block configures a specific builder plugin, which is then invoked by a build block. Here it contains your project ID, the zone where the build VM will be created, the source_image_family (your base image, such as Debian or Ubuntu), and your source_image_project_id.

In GCP, each public image family lives in an image project, such as "ubuntu-os-cloud" for Ubuntu. Set the machine type to a GPU machine type: you're building a base image for GPU machines, so the build VM needs a GPU for the provisioning commands to run against.

source "googlecompute" "gpu-node" {
  project_id              = var.project_id
  zone                    = var.zone
  source_image_family     = var.image_family
  source_image_project_id = var.image_project_id
  ssh_username            = var.ssh_username
  machine_type            = var.machine_type

  image_name        = var.image_name
  image_description = var.image_description

  disk_size           = var.disk_size
  on_host_maintenance = "TERMINATE"

  tags = ["gpu-node"]
}

Setting on_host_maintenance = "TERMINATE" tells Compute Engine to stop the VM instead of live-migrating it during infrastructure maintenance. VMs with attached GPUs can't be live-migrated, so GPU instances need this setting.

You'll define all your variables in the variable.pkr.hcl file and set their values in values.pkrvars.hcl. Remember to add values.pkrvars.hcl to your .gitignore file.

variable "image_name" {
  type        = string
  description = "The name of the resulting image"
}

variable "image_description" {
  type        = string
  description = "Description of the image"
}

variable "project_id" {
  type        = string
  description = "The GCP project ID where the image will be created"
}

variable "image_family" {
  type        = string
  description = "The source image family to build from (for example, ubuntu-2404-lts-amd64)"
}

variable "image_project_id" {
  type        = list(string)
  description = "The project ID(s) to search for the source image"
}

variable "zone" {
  type        = string
  description = "The GCP zone where the build instance will be created"
}

variable "ssh_username" {
  type        = string
  description = "The SSH username to use for connecting to the instance"
}

variable "machine_type" {
  type        = string
  description = "The machine type to use for the build instance"
}

variable "cuda_version" {
  type        = string
  description = "CUDA toolkit version"
  default     = "13.1"
}

variable "driver_version" {
  type        = string
  description = "NVIDIA driver version"
  default     = "590.48.01"
}

variable "disk_size" {
  type        = number
  description = "Boot disk size in GB"
  default     = 50
}

values.pkrvars.hcl

image_name        = "base-gpu-image-{{timestamp}}"
image_description = "Ubuntu 24.04 LTS with gpu drivers and health checks"
project_id        = "your gcp project id"
image_family      = "ubuntu-2404-lts-amd64"
image_project_id  = ["ubuntu-os-cloud"]
zone              = "us-central1-a"
ssh_username      = "packer"
machine_type      = "g2-standard-4"
disk_size         = 50
driver_version    = "590.48.01"
cuda_version      = "13.1"
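
Since values.pkrvars.hcl holds project-specific values, one way to keep it out of version control is to append it to .gitignore only when it isn't already listed. This is a minimal sketch; ignore_once is a hypothetical helper name, not part of Packer or git.

```shell
# ignore_once FILE ENTRY: append ENTRY to FILE unless an identical line exists.
# (hypothetical helper for illustration)
ignore_once() {
  touch "$1"
  grep -qxF -- "$2" "$1" || printf '%s\n' "$2" >> "$1"
}

# Example, run from the packer_demo directory:
#   ignore_once .gitignore values.pkrvars.hcl
```

Running the helper twice leaves a single entry, so it is safe to call from a setup script.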

Step 5: Writing the Build Template

Create build.pkr.hcl. The build block creates a temporary instance, runs provisioners, and produces an image.

The provisioners in this template run in order:

The first provisioner runs system updates and upgrades.
The second provisioner reboots the instance (expect_disconnect = true tells Packer to tolerate the dropped SSH connection).
The third provisioner waits for the instance to come back (pause_before), then runs script/base.sh. It sets max_retries to handle transient SSH timeouts and passes DRIVER_VERSION and CUDA_VERSION as environment variables.

Lastly, the post-processor prints the build ID and a completion timestamp:

build {
  sources = ["source.googlecompute.gpu-node"]

  provisioner "shell" {
    inline = [
      "set -e",
      "sudo apt update",
      "sudo apt -y dist-upgrade"
    ]
  }

  provisioner "shell" {
    expect_disconnect = true
    inline            = ["sudo reboot"]
  }

  # Base: NVIDIA drivers, CUDA, DCGM
  provisioner "shell" {
    pause_before = "60s"
    script       = "script/base.sh"
    max_retries  = 2
    environment_vars = [
      "DRIVER_VERSION=${var.driver_version}",
      "CUDA_VERSION=${var.cuda_version}"
    ]
  }

  post-processor "shell-local" {
    inline = [
      "echo '=== Image Build Complete ==='",
      "echo 'Image ID: ${build.ID}'",
      "date"
    ]
  }
}

Step 6: Writing the GPU Provisioning Script

Now let's go through the base script and break down its main parts.

Section 1: Pre-Installation (Kernel Headers)

Before installing NVIDIA drivers, the system needs kernel headers and build tools. The NVIDIA driver compiles a kernel module during installation via DKMS, so if the headers for your running kernel aren't present, the build will fail silently, and the driver won't load on boot.

log "Installing kernel headers and build tools..."
sudo apt-get install -qq -y \
  "linux-headers-$(uname -r)" \
  build-essential \
  dkms \
  curl \
  wget

Section 2: Installing NVIDIA's Apt Repository

This snippet downloads and installs NVIDIA’s official keyring package for your Linux distribution, which adds the trusted signing keys the system needs to verify CUDA packages.

log "Adding NVIDIA CUDA apt repository (${DISTRO})..."
wget -q "https://developer.download.nvidia.com/compute/cuda/repos/${DISTRO}/${ARCH}/cuda-keyring_1.1-1_all.deb" \
  -O /tmp/cuda-keyring.deb
sudo dpkg -i /tmp/cuda-keyring.deb
rm /tmp/cuda-keyring.deb
sudo apt-get update -qq
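
The DISTRO and ARCH values in that URL are set at the top of base.sh. As a rough sketch of how the repository path can be derived from /etc/os-release, the functions below build the same URL. The function names are hypothetical, and the keyring file name is simply the one used in the snippet above.

```shell
# distro_slug [FILE]: print ID + VERSION_ID without dots, e.g. ubuntu2404.
# The file is sourced in a subshell so the caller's variables stay untouched.
distro_slug() {
  (
    . "${1:-/etc/os-release}"
    printf '%s%s\n' "$ID" "$VERSION_ID" | tr -d '.'
  )
}

# cuda_keyring_url DISTRO ARCH: build the keyring URL used by base.sh.
cuda_keyring_url() {
  printf 'https://developer.download.nvidia.com/compute/cuda/repos/%s/%s/cuda-keyring_1.1-1_all.deb\n' "$1" "$2"
}

# Example:
#   cuda_keyring_url "$(distro_slug)" x86_64
```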

Section 3: Pinning NVIDIA Drivers Version

Pinning the NVIDIA driver to a specific version ensures that the system always installs and keeps using exactly that driver version, even when newer drivers appear in the repository.

NVIDIA drivers are tightly coupled with CUDA toolkit versions, kernel versions, and container tooling such as Docker and the NVIDIA Container Toolkit.

A mismatch, such as the system auto‑upgrading to a newer driver, can cause CUDA to stop working, break GPU acceleration, or make the machine image inconsistent across deployments.

log "Pinning driver to version ${DRIVER_VERSION}..."
sudo apt-get install -qq -y "nvidia-driver-pinning-${DRIVER_VERSION}"

Section 4: Installing the Driver

The libnvidia-compute package installs only the compute‑related user‑space libraries (the CUDA driver components), while nvidia-dkms-open installs the open‑source NVIDIA kernel module, built locally via DKMS.

Together, these two packages give you a fully functional CUDA driver environment without any GUI or graphics dependencies.

Here, we're using NVIDIA’s compute‑only driver stack with the open‑source kernel modules, which deliberately avoids installing display-related components that a headless GPU node doesn't need.

Because the kernel module is built through DKMS, it is rebuilt when the kernel is updated, which fits how Linux distributions manage kernels. The result is a lightweight, compute-focused install.

log "Installing NVIDIA compute-only driver (open kernel modules)..."
sudo apt-get -V install -y \
  libnvidia-compute \
  nvidia-dkms-open
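
Because a failed DKMS build can go unnoticed, a check on the output of dkms status can make the failure explicit during the build. The sketch below assumes the module is registered under a name starting with "nvidia" and that a successful line contains ": installed"; both are assumptions to confirm against your DKMS version. The function name is hypothetical.

```shell
# nvidia_module_installed: read `dkms status` output on stdin and succeed
# only if a line for an nvidia module reports the "installed" state.
nvidia_module_installed() {
  grep -Eq '^nvidia[^/,]*[/,].*: installed'
}

# Example inside base.sh, after the driver packages are installed:
#   dkms status | nvidia_module_installed || error "NVIDIA DKMS module is not installed"
```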

Section 5: CUDA Toolkit Installation

This part of the script installs the CUDA Toolkit for the specified version and then makes sure that CUDA’s executables and libraries are available system‑wide for every user and every shell session.

It adds CUDA binaries to PATH, so commands like nvcc, cuda-gdb, and compute-sanitizer work without specifying full paths. It also adds CUDA libraries to LD_LIBRARY_PATH and to the system linker cache, so applications can find CUDA’s shared libraries at runtime.

log "Installing CUDA Toolkit ${CUDA_VERSION}..."
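# NVIDIA's apt packages use dashes in the version (cuda-toolkit-13-1), so convert 13.1 to 13-1
sudo apt-get install -qq -y "cuda-toolkit-${CUDA_VERSION//./-}"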

# Persist CUDA paths for all users and sessions
cat <<'EOF' | sudo tee /etc/profile.d/cuda.sh
export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:${LD_LIBRARY_PATH:-}
EOF
echo "/usr/local/cuda/lib64" | sudo tee /etc/ld.so.conf.d/cuda.conf
sudo ldconfig
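
To confirm that the linker cache picked up the CUDA libraries, you can inspect the output of ldconfig -p, which lists every cached library. This is a small sketch that assumes the toolkit ships a runtime library whose file name starts with libcudart; cache_has_lib is a hypothetical helper name.

```shell
# cache_has_lib NAME: read `ldconfig -p` output on stdin and succeed if a
# library whose file name starts with NAME.so is listed.
cache_has_lib() {
  grep -q "^[[:space:]]*$1\.so"
}

# Example, after `sudo ldconfig` has run:
#   ldconfig -p | cache_has_lib libcudart && echo "CUDA runtime is in the linker cache"
```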

Section 6: NVIDIA Container Toolkit

This block installs the NVIDIA Container Toolkit and configures it so that containers (Docker or containerd) can access the GPU safely and correctly. It’s a critical step for Kubernetes GPU nodes, Docker GPU workloads, and any system that needs GPU acceleration inside containers.

log "Installing NVIDIA Container Toolkit..."
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey \
  | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg

curl -fsSL https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list \
  | sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' \
  | sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

sudo apt-get update -qq
sudo apt-get install -qq -y nvidia-container-toolkit

# Configure for containerd (primary Kubernetes runtime)
sudo nvidia-ctk runtime configure --runtime=containerd

# Configure for Docker if present on this image
if systemctl list-unit-files | grep -q "^docker.service"; then
  sudo nvidia-ctk runtime configure --runtime=docker
fi
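
One way to confirm that nvidia-ctk runtime configure took effect is to look for the NVIDIA runtime in the generated containerd configuration. The sketch assumes the default config path /etc/containerd/config.toml and that the configured runtime references a binary named nvidia-container-runtime; verify both against your toolkit version. The function name is hypothetical.

```shell
# has_nvidia_runtime [FILE]: succeed if FILE mentions nvidia-container-runtime.
# A missing file counts as "not configured".
has_nvidia_runtime() {
  grep -q 'nvidia-container-runtime' "${1:-/etc/containerd/config.toml}" 2>/dev/null
}

# Example:
#   has_nvidia_runtime || echo "containerd is not configured for NVIDIA GPUs" >&2
```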

Section 7: Installing DCGM (Data Center GPU Manager)

This section covers the installation and validation of NVIDIA DCGM (Data Center GPU Manager), which is NVIDIA’s official management and telemetry framework for data center GPUs.

It offers health monitoring and diagnostics, telemetry (including temperature, clocks, power, and utilization), error reporting, and integration with Kubernetes, Prometheus, and monitoring agents. Your GPU monitoring stack relies on this.

The script extracts the installed version, checks it against the minimum required for NVIDIA driver 590+, and aborts the build if the requirement isn't met. This prevents a mismatch between the GPU driver and DCGM, which would break monitoring and health checks. It also enables Fabric Manager for NVLink/NVSwitch systems when the service is installed, which applies to multi‑GPU topologies such as A100/H100 DGX systems or other multi‑GPU servers.

log "Installing DCGM..."
sudo apt-get install -qq -y datacenter-gpu-manager

DCGM_VER=$(dpkg -s datacenter-gpu-manager 2>/dev/null | awk '/^Version:/{print $2}' | sed 's/^[0-9]*://')
DCGM_MAJOR=$(echo "${DCGM_VER}" | cut -d. -f1)
DCGM_MINOR=$(echo "${DCGM_VER}" | cut -d. -f2)
if [[ "${DCGM_MAJOR}" -lt 4 ]] || { [[ "${DCGM_MAJOR}" -eq 4 ]] && [[ "${DCGM_MINOR}" -lt 3 ]]; }; then
  error "DCGM ${DCGM_VER} is below the 4.3 minimum required for driver 590+. Check your CUDA repo."
fi
log "DCGM installed: ${DCGM_VER}"

sudo systemctl enable nvidia-dcgm
sudo systemctl start  nvidia-dcgm

# Fabric Manager — only needed for NVLink/NVSwitch GPUs (A100/H100 multi-GPU nodes)
if systemctl list-unit-files | grep -q "^nvidia-fabricmanager.service"; then
  log "Enabling nvidia-fabricmanager for NVLink GPUs..."
  sudo systemctl enable nvidia-fabricmanager
  sudo systemctl start  nvidia-fabricmanager
fi
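
The major/minor comparison in the script can also be expressed with a version-aware sort. This is a minimal alternative sketch, not the script's own method: it assumes a sort that supports the -V flag (GNU sort on Ubuntu does), and version_ge is a hypothetical helper name.

```shell
# version_ge HAVE MIN: succeed if version HAVE is greater than or equal to MIN.
# Under `sort -V` the smaller version sorts first, so MIN must come out on top.
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Example, using the 4.3 minimum that base.sh enforces:
#   version_ge "${DCGM_VER}" 4.3 || error "DCGM ${DCGM_VER} is too old"
```

A helper like this also handles two-digit components correctly, so 4.10 compares as newer than 4.3.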

Section 8: Enabling Persistence Mode

Without persistence mode, the NVIDIA driver tears down its GPU state when the GPU is idle and no client process is attached. When a new workload starts, the driver must reinitialize the GPU and set up memory mappings again. This adds a delay of a few hundred milliseconds to several seconds, depending on the GPU and system.

Enabling nvidia‑persistenced keeps the NVIDIA driver loaded in memory even when no GPU workloads are running.

log "Enabling nvidia-persistenced..."
sudo systemctl enable nvidia-persistenced
sudo systemctl start  nvidia-persistenced
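
To check the result on a running GPU instance, nvidia-smi can report the persistence state per GPU. The sketch below assumes the query prints one value per GPU, either Enabled or Disabled; all_persistent is a hypothetical helper name.

```shell
# all_persistent: read one persistence-mode value per line on stdin and
# succeed only if at least one GPU is listed and every GPU reports "Enabled".
all_persistent() {
  awk 'NF { n++; if ($1 != "Enabled") bad = 1 } END { exit (n == 0 || bad) ? 1 : 0 }'
}

# Example on a VM created from the image:
#   nvidia-smi --query-gpu=persistence_mode --format=csv,noheader | all_persistent \
#     || echo "persistence mode is not enabled on every GPU" >&2
```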

Section 9: System Tuning for GPU Compute Workloads

This block applies a set of system‑level performance and stability tunings that are standard for high‑performance GPU servers, Kubernetes GPU nodes, and ML/AI workloads.

Each line targets a specific bottleneck or instability pattern that appears in real GPU production environments.

Swap and memory behavior: Disabling swap and setting vm.swappiness=0 prevents the kernel from pushing GPU‑bound processes into swap. GPU workloads are extremely sensitive to latency, and swapping can cause CUDA context resets and GPU driver timeouts.
Hugepages for large memory allocations: Setting vm.nr_hugepages=2048 allocates a pool of hugepages, which reduces TLB pressure for large contiguous memory allocations. CUDA, NCCL, and deep‑learning frameworks frequently allocate large buffers, and hugepages reduce page‑table overhead, improving memory bandwidth and lowering latency for large tensor operations. This is especially useful on multi‑GPU servers.
CPU frequency governor: Installing cpupower and setting the CPU governor to performance keeps the CPU at maximum frequency instead of scaling down. GPU workloads often become CPU‑bound during data preprocessing, kernel launches, and NCCL communication, so keeping CPUs at full speed reduces jitter and improves throughput.
NUMA and topology tools: Installing numactl, libnuma-dev, and hwloc provides tools for pinning processes to NUMA nodes, understanding CPU–GPU affinity, and optimizing multi‑GPU placement.
Disabling irqbalance: Stopping and disabling irqbalance lets the NVIDIA driver manage interrupt affinity. On GPU servers, irqbalance can move GPU interrupts to suboptimal CPUs, causing higher latency and lower throughput.

log "Applying system tuning..."

# Disable swap (critical for Kubernetes scheduler and ML stability)
sudo swapoff -a
sudo sed -i '/ swap / s/^/#/' /etc/fstab
echo "vm.swappiness=0"     | sudo tee /etc/sysctl.d/99-gpu-swappiness.conf

# Hugepages — reduces TLB pressure for large memory allocations
echo "vm.nr_hugepages=2048" | sudo tee /etc/sysctl.d/99-gpu-hugepages.conf

# CPU performance governor
sudo apt-get install -qq -y linux-tools-common "linux-tools-$(uname -r)" || true
sudo cpupower frequency-set -g performance || true

# NUMA and topology tools for GPU affinity tuning
sudo apt-get install -qq -y numactl libnuma-dev hwloc

# Disable irqbalance — let NVIDIA driver manage interrupt affinity
sudo systemctl disable irqbalance || true
sudo systemctl stop    irqbalance || true

# Apply all sysctl settings now
sudo sysctl --system
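
The hugepage pool is reserved up front, so it is worth working out how much memory it takes. On x86_64 the default huge page size is typically 2048 kB, which means 2048 pages reserve about 4 GiB whether or not anything uses them. A small sketch of that arithmetic, reading the page size from /proc/meminfo (the function name is hypothetical):

```shell
# hugepage_reserve_mib PAGES [MEMINFO]: print the MiB reserved by PAGES
# hugepages, using the Hugepagesize line (reported in kB) from MEMINFO.
hugepage_reserve_mib() {
  awk -v pages="$1" '/^Hugepagesize:/ { print pages * $2 / 1024 }' "${2:-/proc/meminfo}"
}

# Example:
#   hugepage_reserve_mib 2048
```

If the reservation is too large for the machine type you deploy on, lowering vm.nr_hugepages is the knob to adjust.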

Full base.sh script here:

#!/bin/bash
set -euo pipefail

log()   { echo "[BASE] $1"; }
error() { echo "[BASE][ERROR] $1" >&2; exit 1; }

###############################################################
# 0. Validate required inputs
###############################################################
[[ -z "${DRIVER_VERSION:-}" ]] && error "DRIVER_VERSION is not set."
[[ -z "${CUDA_VERSION:-}"   ]] && error "CUDA_VERSION is not set."

log "DRIVER_VERSION : ${DRIVER_VERSION}"
log "CUDA_VERSION   : ${CUDA_VERSION}"

DISTRO=$(. /etc/os-release && echo "${ID}${VERSION_ID}" | tr -d '.')
ARCH="x86_64"

export DEBIAN_FRONTEND=noninteractive

###############################################################
# 1. System update
###############################################################
log "Updating system packages..."
sudo apt-get update -qq
sudo apt-get upgrade -qq -y

###############################################################
# 2. Pre-installation — kernel headers
#    Source: https://docs.nvidia.com/datacenter/tesla/driver-installation-guide/ubuntu.html
###############################################################
log "Installing kernel headers and build tools..."
sudo apt-get install -qq -y \
  "linux-headers-$(uname -r)" \
  build-essential \
  dkms \
  curl \
  wget

###############################################################
# 3. NVIDIA CUDA Network Repository
###############################################################
log "Adding NVIDIA CUDA apt repository (${DISTRO})..."
wget -q "https://developer.download.nvidia.com/compute/cuda/repos/${DISTRO}/${ARCH}/cuda-keyring_1.1-1_all.deb" \
  -O /tmp/cuda-keyring.deb
sudo dpkg -i /tmp/cuda-keyring.deb
rm /tmp/cuda-keyring.deb
sudo apt-get update -qq

###############################################################
# 4. Pin driver version BEFORE installation (590+ requirement)
###############################################################
log "Pinning driver to version ${DRIVER_VERSION}..."
sudo apt-get install -qq -y "nvidia-driver-pinning-${DRIVER_VERSION}"

###############################################################
# 5. Compute-only (headless) driver — Open Kernel Modules
#    Source: NVIDIA Driver Installation Guide — Compute-only System (Open Kernel Modules)
#
#    libnvidia-compute  = compute libraries only (no GL/Vulkan/display)
#    nvidia-dkms-open   = open-source kernel module built via DKMS
#
#    Open kernel modules are the NVIDIA-recommended choice for
#    Ampere, Hopper, and Blackwell data centre GPUs (A100, H100, etc.)
###############################################################
log "Installing NVIDIA compute-only driver (open kernel modules)..."
sudo apt-get -V install -y \
  libnvidia-compute \
  nvidia-dkms-open

###############################################################
# 6. CUDA Toolkit
###############################################################
log "Installing CUDA Toolkit ${CUDA_VERSION}..."
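# NVIDIA's apt packages use dashes in the version (cuda-toolkit-13-1), so convert 13.1 to 13-1
sudo apt-get install -qq -y "cuda-toolkit-${CUDA_VERSION//./-}"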

# Persist CUDA paths for all users and sessions
cat <<'EOF' | sudo tee /etc/profile.d/cuda.sh
export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:${LD_LIBRARY_PATH:-}
EOF
echo "/usr/local/cuda/lib64" | sudo tee /etc/ld.so.conf.d/cuda.conf
sudo ldconfig

###############################################################
# 7. NVIDIA Container Toolkit
#    Required for GPU workloads in Docker / containerd / Kubernetes
###############################################################
log "Installing NVIDIA Container Toolkit..."
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey \
  | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg

curl -fsSL https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list \
  | sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' \
  | sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

sudo apt-get update -qq
sudo apt-get install -qq -y nvidia-container-toolkit

# Configure for containerd (primary Kubernetes runtime)
sudo nvidia-ctk runtime configure --runtime=containerd

# Configure for Docker if present on this image
if systemctl list-unit-files | grep -q "^docker.service"; then
  sudo nvidia-ctk runtime configure --runtime=docker
fi

###############################################################
# 8. DCGM — DataCenter GPU Manager
###############################################################
log "Installing DCGM..."
sudo apt-get install -qq -y datacenter-gpu-manager
 
DCGM_VER=$(dpkg -s datacenter-gpu-manager 2>/dev/null | awk '/^Version:/{print $2}' | sed 's/^[0-9]*://')
DCGM_MAJOR=$(echo "${DCGM_VER}" | cut -d. -f1)
DCGM_MINOR=$(echo "${DCGM_VER}" | cut -d. -f2)
if [[ "${DCGM_MAJOR}" -lt 4 ]] || { [[ "${DCGM_MAJOR}" -eq 4 ]] && [[ "${DCGM_MINOR}" -lt 3 ]]; }; then
  error "DCGM ${DCGM_VER} is below the 4.3 minimum required for driver 590+. Check your CUDA repo."
fi
log "DCGM installed: ${DCGM_VER}"

sudo systemctl enable nvidia-dcgm
sudo systemctl start  nvidia-dcgm

# Fabric Manager — only needed for NVLink/NVSwitch GPUs (A100/H100 multi-GPU nodes)
if systemctl list-unit-files | grep -q "^nvidia-fabricmanager.service"; then
  log "Enabling nvidia-fabricmanager for NVLink GPUs..."
  sudo systemctl enable nvidia-fabricmanager
  sudo systemctl start  nvidia-fabricmanager
fi

###############################################################
# 9. NVIDIA Persistence Daemon
#    Keeps the driver loaded between jobs — reduces cold-start
#    latency on the first CUDA call in each new workload
###############################################################
log "Enabling nvidia-persistenced..."
sudo systemctl enable nvidia-persistenced
sudo systemctl start  nvidia-persistenced

###############################################################
# 10. System tuning for GPU compute workloads
###############################################################
log "Applying system tuning..."

# Disable swap (critical for Kubernetes scheduler and ML stability)
sudo swapoff -a
sudo sed -i '/ swap / s/^/#/' /etc/fstab
echo "vm.swappiness=0"     | sudo tee /etc/sysctl.d/99-gpu-swappiness.conf

# Hugepages — reduces TLB pressure for large memory allocations
echo "vm.nr_hugepages=2048" | sudo tee /etc/sysctl.d/99-gpu-hugepages.conf

# CPU performance governor
sudo apt-get install -qq -y linux-tools-common "linux-tools-$(uname -r)" || true
sudo cpupower frequency-set -g performance || true

# NUMA and topology tools for GPU affinity tuning
sudo apt-get install -qq -y numactl libnuma-dev hwloc

# Disable irqbalance — let NVIDIA driver manage interrupt affinity
sudo systemctl disable irqbalance || true
sudo systemctl stop    irqbalance || true

# Apply all sysctl settings now
sudo sysctl --system

###############################################################
# Done
###############################################################
log "============================================"
log "Base layer provisioning complete."
log "  OS      : ${DISTRO}"
log "  Driver  : ${DRIVER_VERSION} (open kernel modules, compute-only)"
log "  CUDA    : cuda-toolkit-${CUDA_VERSION}"
log "  DCGM    : ${DCGM_VER}"
log "============================================"

Step 7: Assembling and Running the Build

Validate the template first, then run the build. Validation catches syntax or variable errors early, so the build doesn’t start on a broken config.

packer validate -var-file=values.pkrvars.hcl .

If validation succeeds, you’ll see a short confirmation like "The configuration is valid." After that, start the build. Expect the process to create a temporary VM, run your provisioners, and produce an image:

packer build -var-file=values.pkrvars.hcl .

The build typically takes 15–20 minutes, depending on network speed and package installs. Watch the Packer log for three key checkpoints:

Instance creation — confirms the temporary VM was provisioned.
Provisioner output — shows each script step (updates, reboot, script/base.sh) and any errors.
Image creation — indicates the build finished and an image artifact was written.

If the build fails, copy the failing provisioner’s log lines and re-run the build after fixing the script or variables. For quick troubleshooting, re-run the failing provisioner locally on a matching test VM to iterate faster.

googlecompute.gpu-node: output will be in this color.

==> googlecompute.gpu-node: Checking image does not exist...
==> googlecompute.gpu-node: Creating temporary RSA SSH key for instance...
==> googlecompute.gpu-node: no persistent disk to create
==> googlecompute.gpu-node: Using image: ubuntu-2404-noble-amd64-v20260225
==> googlecompute.gpu-node: Creating instance...
==> googlecompute.gpu-node: Loading zone: us-central1-a
==> googlecompute.gpu-node: Loading machine type: g2-standard-4
==> googlecompute.gpu-node: Requesting instance creation...
==> googlecompute.gpu-node: Waiting for creation operation to complete...
==> googlecompute.gpu-node: Instance has been created!
==> googlecompute.gpu-node: Waiting for the instance to become running...
==> googlecompute.gpu-node: IP: 34.58.58.214
==> googlecompute.gpu-node: Using SSH communicator to connect: 34.58.58.214
==> googlecompute.gpu-node: Waiting for SSH to become available...
systemd-logind.service
==> googlecompute.gpu-node:  systemctl restart unattended-upgrades.service
==> googlecompute.gpu-node:
==> googlecompute.gpu-node: No containers need to be restarted.
==> googlecompute.gpu-node:
==> googlecompute.gpu-node: User sessions running outdated binaries:
==> googlecompute.gpu-node:  packer @ session #1: sshd[1535]
==> googlecompute.gpu-node:  packer @ user manager service: systemd[1540]
==> googlecompute.gpu-node: Pausing 1m0s before the next provisioner...
==> googlecompute.gpu-node: Provisioning with shell script: script/base.sh
==> googlecompute.gpu-node: [BASE] DRIVER_VERSION : 590.48.01
==> googlecompute.gpu-node: [BASE] CUDA_VERSION   : 13.1
==> googlecompute.gpu-node: [BASE] Updating system packages...
==> googlecompute.gpu-node: [BASE] Installing kernel headers and build tools...
==> googlecompute.gpu-node: [BASE] Installing CUDA Toolkit 13.1...
==> googlecompute.gpu-node: [BASE] Installing DCGM...
==> googlecompute.gpu-node: [BASE] Enabling nvidia-persistenced...
==> googlecompute.gpu-node: [BASE] Applying system tuning...
==> googlecompute.gpu-node: vm.swappiness=0
==> googlecompute.gpu-node: vm.nr_hugepages=2048
==> googlecompute.gpu-node: Setting cpu: 0
==> googlecompute.gpu-node: Error setting new values. Common errors:
==> googlecompute.gpu-node: [BASE] ============================================
==> googlecompute.gpu-node: [BASE] Base layer provisioning complete.
==> googlecompute.gpu-node: [BASE]   OS      : ubuntu2404
==> googlecompute.gpu-node: [BASE]   Driver  : 590.48.01 (open kernel modules, compute-only)
==> googlecompute.gpu-node: [BASE]   CUDA    : cuda-toolkit-13.1
==> googlecompute.gpu-node: [BASE]   DCGM    : 1:3.3.9
==> googlecompute.gpu-node: [BASE] ============================================
==> googlecompute.gpu-node: Deleting instance...
==> googlecompute.gpu-node: Instance has been deleted!
==> googlecompute.gpu-node: Creating image...
==> googlecompute.gpu-node: Deleting disk...
==> googlecompute.gpu-node: Disk has been deleted!
==> googlecompute.gpu-node: Running post-processor:  (type shell-local)
==> googlecompute.gpu-node (shell-local): Running local shell script: 
==> googlecompute.gpu-node (shell-local): === Image Build Complete ===
==> googlecompute.gpu-node (shell-local): Image ID: packer-69b6c2ee-883a-3602-7bb5-059f1ba27c8b
==> googlecompute.gpu-node (shell-local): Sun Mar 15 15:50:09 WAT 2026
Build 'googlecompute.gpu-node' finished after 17 minutes 55 seconds.

==> Wait completed after 17 minutes 55 seconds

==> Builds finished. The artifacts of successful builds are:
--> googlecompute.gpu-node: A disk image was created in the 'my_project-00000' project: base-gpu-image-1773585134
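
The validate-then-build sequence from this step can be wrapped in a small function so the build never starts on an invalid template. This is a sketch under a few assumptions: the function name and the log file name packer-build.log are hypothetical, and it relies on Packer's PACKER_LOG and PACKER_LOG_PATH environment variables to write a detailed log for troubleshooting.

```shell
# build_image VARFILE: validate the template in the current directory and
# only start the build if validation passes. A detailed log is written to
# packer-build.log (hypothetical file name) for later troubleshooting.
build_image() {
  packer validate -var-file="$1" . &&
    PACKER_LOG=1 PACKER_LOG_PATH=packer-build.log packer build -var-file="$1" .
}

# Example, run from the packer_demo directory:
#   build_image values.pkrvars.hcl
```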

Step 8: Test the Image and Verify the GPU Stack

Confirm the image exists in the GCP Console: Compute → Storage → Images and locate your newly created OS image.

Create a test VM from the image, replacing the --image value with the image name printed at the end of your build:

gcloud compute instances create my-gpu-vm \
  --machine-type=g2-standard-4 \
  --accelerator=count=1,type=nvidia-l4 \
  --image=base-gpu-image-1772718104 \
  --image-project=YOUR_PROJECT_ID \
  --boot-disk-size=50GB \
  --maintenance-policy=TERMINATE \
  --restart-on-failure \
  --zone=us-central1-a

Created [https://www.googleapis.com/compute/v1/projects/my-project-000/zones/us-central1-a/instances/my-gpu-vm].
NAME       ZONE           MACHINE_TYPE   PREEMPTIBLE  INTERNAL_IP    EXTERNAL_IP      STATUS
my-gpu-vm  us-central1-a  g2-standard-4               10.128.15.227  104.154.184.217  RUNNING
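
Because each build appends a Unix timestamp to the image name, the newest image is the one with the largest numeric suffix. The sketch below picks it from a list of names, one per line. The helper name is hypothetical, and in practice the list could come from gcloud compute images list --format="value(name)".

```shell
# newest_image PREFIX: read image names on stdin and print the one of the
# form PREFIX-<digits> that has the largest numeric suffix.
newest_image() {
  grep -E "^$1-[0-9]+\$" | awk -F- '{ print $NF, $0 }' | sort -n | tail -n 1 | cut -d' ' -f2
}

# Example:
#   gcloud compute images list --format="value(name)" | newest_image base-gpu-image
```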

Once the instance is RUNNING, connect to it over SSH and run nvidia-smi to verify that the NVIDIA driver and GPU are visible.

For a healthy image, the nvidia-smi output confirms:

Driver 590.48.01 loaded
CUDA 13.1 available
Persistence Mode is On
The L4 GPU is detected with 23GB VRAM
Zero ECC errors
No running processes (clean idle state).

This is exactly what a healthy base image should look like. Notice Disp.A: Off? That confirms our compute-only driver choice is working — no display adapter is active.

Confirm the installed CUDA toolkit by running nvcc --version. The output should show version 13.1, as specified.

Let's confirm DCGM installation by running dcgmi discovery -l. Successful output indicates DCGM is running and communicating with the driver.
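
The three checks in this step can be collected into one script that reports a pass or fail line per component. This is a minimal sketch; the function names and labels are hypothetical, and it only checks that each command exits successfully, not the versions it reports.

```shell
# check LABEL COMMAND...: run COMMAND quietly and report PASS or FAIL.
check() {
  label=$1
  shift
  if "$@" >/dev/null 2>&1; then
    echo "PASS  $label"
  else
    echo "FAIL  $label"
    return 1
  fi
}

# verify_gpu_stack: run the checks from this step; the exit status is
# non-zero if any of them fails.
verify_gpu_stack() {
  rc=0
  check "driver (nvidia-smi)" nvidia-smi || rc=1
  check "CUDA toolkit (nvcc)" nvcc --version || rc=1
  check "DCGM (dcgmi)" dcgmi discovery -l || rc=1
  return "$rc"
}

# Example, on the test VM:
#   verify_gpu_stack
```

Run it from a login shell so that the PATH entry from /etc/profile.d/cuda.sh is in effect for nvcc.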

Conclusion

You now have a production‑grade, GPU‑optimized base image that includes the NVIDIA compute‑only driver built with open kernel modules, DCGM for monitoring, and the CUDA Toolkit. You also applied OS‑level tuning tailored to GPU compute workloads, providing a consistent, reproducible environment with no manual setup.

From here, you can extend the build by adding an application‑layer script to install frameworks such as PyTorch, TensorFlow, or vLLM, or create an instance template that uses this image to scale your GPU infrastructure.

The full Packer project includes additional scripts for training and inference workloads that you can use to extend your image.

References

NVIDIA Driver Installation Guide (Ubuntu): https://docs.nvidia.com/datacenter/tesla/driver-installation-guide/
NVIDIA CUDA Toolkit Documentation: https://docs.nvidia.com/cuda/
NVIDIA Container Toolkit Installation Guide: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html
NVIDIA DCGM Documentation: https://docs.nvidia.com/datacenter/dcgm/latest/index.html
NVIDIA Persistence Daemon: https://docs.nvidia.com/deploy/driver-persistence/index.html
HashiCorp Packer Documentation: https://developer.hashicorp.com/packer/docs
Packer Google Compute Builder: https://developer.hashicorp.com/packer/integrations/hashicorp/googlecompute

How to Automate Compliance and Fraud Detection in Finance with MLOps

Balajee Asish Brahmandam — Mon, 12 May 2025 16:21:29 +0000

These days, businesses are under increasing pressure to comply with stringent regulations while also combating fraudulent activities. The high volume of data and the intricate requirements of real-time fraud detection and compliance reporting are frequently a challenge for traditional systems to manage.

This is where MLOps (Machine Learning Operations) comes into play. It can help teams streamline these processes and elevate automation to the forefront of financial security and regulatory adherence.

In this article, we will investigate the potential of MLOps for automating compliance and fraud detection in the finance sector.

I’ll show you step by step how financial institutions can deploy a machine learning model for fraud detection and integrate it into their operations to ensure continuous monitoring and automated alerts for compliance. I’ll also demonstrate how to deploy this solution in a cloud-based environment using Google Colab, ensuring that it is both user-friendly and accessible, whether you are a beginner or more advanced.

Here’s what we’ll cover:

What is MLOps?
What You’ll Need
Step 1: Set Up Google Colab and Prepare the Data
Step 2: Data Preprocessing
Step 3: Train a Fraud Detection Model
Step 4: Retrain the Model with New Data
Step 5: Automated Alert System
Step 6: Visualize Model Performance
Conclusion
Key Takeaways

What is MLOps?

Machine Learning Operations, or MLOps for short, is a methodology that integrates DevOps with Machine Learning (ML). The whole machine learning model lifecycle, including development, training, deployment, monitoring, and maintenance, can be automated with its help.

MLOps has several main goals: continuous optimization, scalability, and the delivery of operational value over time.

The financial industry provides great use cases for MLOps processes and techniques, as these can help businesses manage complicated data pipelines, deploy models in real-time, and evaluate their performance – all while making sure they're compliant with regulations.

Why is MLOps Important in Finance?

Financial institutions are subject to various rules including Anti-Money Laundering (AML), Know Your Customer (KYC), and Fraud Prevention Regulations – so they have to carefully manage private information. Ignoring these rules might result in severe fines and loss of reputation.

Detecting fraud in financial transactions also calls for advanced systems capable of real-time identification of suspicious activity.

MLOps can help to solve these issues in the following ways:

MLOps lets financial institutions automatically track transactions for regulatory compliance, guaranteeing they follow changing legislation.
MLOps helps to create and implement machine learning models that can identify fraudulent transactions in real-time.
MLOps automates these processes end to end, so organizations can scale their fraud detection systems with minimal human involvement.

What You’ll Need:

To follow along with this tutorial, ensure that you have the following:

Python installed, along with basic ML libraries such as scikit-learn, Pandas, and NumPy.
A sample dataset of financial transactions, which we will use to train a fraud detection model (You can use this sample dataset if you don’t have one on hand).
Google Colab (for cloud-based execution), which is free to use and doesn't require installation.

Step 1: Set Up Google Colab and Prepare the Data

Google Colab is an ideal choice for beginners and advanced users alike, because it’s cloud-based and doesn’t require installation. To get started, follow these steps:

Access Google Colab:

Visit Google Colab and sign in with your Google account.

Create a New Notebook:

In the Colab interface, go to File and then select New Notebook to create a fresh notebook.

Import Libraries and Load the Dataset

Now, let’s import the necessary libraries and load our fraud detection dataset. We'll assume the dataset is available as a CSV file, and we'll upload it to Colab.

Import libraries:

import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix
import matplotlib.pyplot as plt

Upload the Dataset:

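from google.colab import files

# Opens a file picker; the selected file is saved to the notebook's working directory
uploaded = files.upload()

# Load the dataset into a pandas DataFrame (the file name must match the uploaded file)
data = pd.read_csv('data.csv')
print(data.head())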

Step 2: Data Preprocessing

Data preprocessing is essential to prepare the dataset for model training. This involves handling missing values, encoding categorical variables, and normalizing numerical features.

Why is Preprocessing Important?

Data preprocessing lets you take care of various data issues that could affect your results. During this process, you’ll:

Handle missing values: Financial datasets often have missing values. Filling in these missing values (for example, with the median) ensures that the model doesn’t encounter errors during training.
Convert categorical data: Machine learning algorithms require numerical input, so categorical features (like transaction type or location) need to be converted into numeric format using one-hot encoding.
Normalize data: Tree-based models such as Random Forest are not sensitive to feature scaling, but normalizing keeps features on a comparable scale and makes the pipeline reusable if you later switch models. Scaling matters most for models trained with gradient descent.

Here’s an example:

# Handle missing data by filling numeric columns with their median
data.fillna(data.median(numeric_only=True), inplace=True)

# Convert categorical columns to numeric using one-hot encoding
data = pd.get_dummies(data, drop_first=True)

# Standardize the transaction amount (zero mean, unit variance)
data['normalized_amount'] = (data['Amount'] - data['Amount'].mean()) / data['Amount'].std()

# Separate features and target variable
X = data.drop(columns=['Class'])
y = data['Class']

# Split data into training and testing sets (80% train, 20% test);
# stratify=y keeps the fraud ratio the same in both sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)

print("Data preprocessing completed.")
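One refinement to consider: the snippet above computes the mean and standard deviation of Amount over the whole dataset before splitting, so the test rows influence the scaling. A minimal sketch of the alternative, computing the statistics on the training rows only, is shown below on a small synthetic frame. The column names match the ones above, while the values and the positional split are made up for illustration.

```python
import pandas as pd

# Synthetic stand-in for the transactions data (values are made up).
df = pd.DataFrame({
    "Amount": [10.0, 20.0, 30.0, 40.0, 1000.0],
    "Class": [0, 0, 0, 0, 1],
})

# Simple positional split for illustration: first 4 rows train, last row test.
train, test = df.iloc[:4].copy(), df.iloc[4:].copy()

# Fit the scaling statistics on the training rows only.
amount_mean = train["Amount"].mean()
amount_std = train["Amount"].std()

# Apply the same statistics to both splits.
train["normalized_amount"] = (train["Amount"] - amount_mean) / amount_std
test["normalized_amount"] = (test["Amount"] - amount_mean) / amount_std

print(train["normalized_amount"].round(3).tolist())
print(test["normalized_amount"].round(3).tolist())
```

The same statistics can then be reused whenever new data arrives, so every batch is scaled the same way the model saw during training.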

Step 3: Train a Fraud Detection Model

We'll now train a RandomForestClassifier and evaluate its performance.

What is a Random Forest Classifier?

A Random Forest is an ensemble learning method that creates a collection (forest) of decision trees, typically trained with different parts of the data. It aggregates their predictions to improve accuracy and reduce overfitting.

This method is a popular choice for fraud detection because it can handle high-dimensional data. It’s also quite robust against overfitting.

Here’s how you can implement the Random Forest Classifier:

# Initialize the Random Forest Classifier
rf_model = RandomForestClassifier(n_estimators=150, random_state=42)

# Train the model on the training data
rf_model.fit(X_train, y_train)

# Predict on the test data
y_pred = rf_model.predict(X_test)

# Evaluate model performance
print("Model Evaluation:\n", classification_report(y_test, y_pred))
print("Confusion Matrix:\n", confusion_matrix(y_test, y_pred))

# Plot confusion matrix for visual understanding
cm = confusion_matrix(y_test, y_pred)
fig, ax = plt.subplots()
cax = ax.matshow(cm, cmap='Blues')
fig.colorbar(cax)
plt.title("Confusion Matrix")
plt.xlabel("Predicted")
plt.ylabel("Actual")
plt.show()

How the model is evaluated:

Classification report: Shows metrics like precision, recall, and F1-score for the fraud and non-fraud classes.
Confusion matrix: Helps visualize the performance of the model by showing the true positives, false positives, true negatives, and false negatives.
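To make the link between the two concrete, precision, recall, and F1 for the fraud class can be computed by hand from the confusion matrix. The counts below are invented for illustration. For binary labels 0 and 1, scikit-learn lays the matrix out as [[TN, FP], [FN, TP]].

```python
# Invented counts for illustration, in scikit-learn's layout [[TN, FP], [FN, TP]].
cm = [[9950, 20],
      [10, 20]]

tn, fp = cm[0]
fn, tp = cm[1]

precision = tp / (tp + fp)   # of the transactions flagged as fraud, how many were fraud
recall = tp / (tp + fn)      # of the actual fraud cases, how many were caught
f1 = 2 * precision * recall / (precision + recall)
accuracy = (tp + tn) / (tp + tn + fp + fn)

print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f} accuracy={accuracy:.4f}")
```

With these counts, accuracy is 99.7% even though a third of the fraud cases are missed. That is why, on imbalanced fraud data, precision and recall for the fraud class are more informative than accuracy.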

Step 4: Retrain the Model with New Data

Once you have trained your model, it’s important to retrain it periodically with new data to ensure that it continues to detect emerging fraud patterns.

What is Retraining?

Retraining the model ensures that it adapts to new, unseen data and improves over time. In the case of fraud detection, retraining is crucial because fraud tactics evolve over time, and your model needs to stay up-to-date to recognize new patterns.

Here’s how you can do this:

# Simulate loading new fraud data (assumed to have the same columns as the original dataset)
new_data = pd.read_csv('new_fraud_data.csv')

# Apply the same preprocessing steps to the new data
new_data.fillna(new_data.median(numeric_only=True), inplace=True)
new_data = pd.get_dummies(new_data, drop_first=True)
new_data['normalized_amount'] = (new_data['Amount'] - new_data['Amount'].mean()) / new_data['Amount'].std()

# Separate features and target, aligning columns with the training features
X_new = new_data.drop(columns=['Class']).reindex(columns=X_train.columns, fill_value=0)
y_new = new_data['Class']

# Concatenate old and new data for retraining
X_combined = pd.concat([X_train, X_new], axis=0)
y_combined = pd.concat([y_train, y_new], axis=0)

# Retrain the model with the updated dataset
rf_model.fit(X_combined, y_combined)

# Re-evaluate the model
y_pred_new = rf_model.predict(X_test)
print("Updated Model Evaluation:\n", classification_report(y_test, y_pred_new))

Step 5: Automated Alert System

To automate fraud detection, we’ll send an email whenever a suspicious transaction is detected.

How the Alert System Works

The email alert system uses SMTP to send an email whenever fraud is detected. When the model identifies a suspicious transaction, it triggers an automated alert to notify the compliance team for further investigation.

import os
import smtplib
from email.mime.text import MIMEText
from email.mime.multipart import MIMEMultipart

# Function to send an email alert
def send_alert(email_subject, email_body):
    sender_email = "your_email@example.com"
    receiver_email = "compliance_team@example.com"
    # Read the password from an environment variable instead of hard-coding it
    # (the variable name SMTP_PASSWORD is an assumption; use whatever your setup defines)
    password = os.environ.get("SMTP_PASSWORD", "")

    msg = MIMEMultipart()
    msg['From'] = sender_email
    msg['To'] = receiver_email
    msg['Subject'] = email_subject

    msg.attach(MIMEText(email_body, 'plain'))

    # Send email using SMTP over SSL; the with block closes the connection even on errors
    try:
        with smtplib.SMTP_SSL('smtp.example.com', 465) as server:
            server.login(sender_email, password)
            server.sendmail(sender_email, receiver_email, msg.as_string())
        print("Fraud alert email sent successfully.")
    except Exception as e:
        print(f"Failed to send email: {e}")

# Example: trigger an alert for a suspicious transaction
suspicious_transaction_details = "Transaction ID: 12345, Amount: $5000, Suspicious Activity Detected."
send_alert("Fraud Detection Alert", f"A suspicious transaction has been detected: {suspicious_transaction_details}")
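The example above sends a hard-coded message. One way to connect the alert to the model is to flag transactions whose predicted fraud probability crosses a threshold and build the message from those rows. The sketch below assumes a 0.8 threshold and a hypothetical helper named build_alerts. The probabilities are made up here, but in the notebook they could come from rf_model.predict_proba(X_test)[:, 1].

```python
import pandas as pd


def build_alerts(transactions, fraud_probability, threshold=0.8):
    """Return one alert message per transaction at or above the threshold.

    `transactions` is a DataFrame with an 'Amount' column; `fraud_probability`
    is a sequence of scores aligned with its rows. The threshold is an
    assumption and should be tuned for the use case.
    """
    scored = transactions.assign(fraud_probability=list(fraud_probability))
    flagged = scored[scored["fraud_probability"] >= threshold]
    return [
        f"Transaction ID: {idx}, Amount: ${row['Amount']:.2f}, "
        f"Fraud probability: {row['fraud_probability']:.2f}"
        for idx, row in flagged.iterrows()
    ]


# Made-up transactions and scores for illustration.
transactions = pd.DataFrame({"Amount": [25.0, 5000.0, 120.0]}, index=[101, 102, 103])
scores = [0.05, 0.93, 0.40]

alerts = build_alerts(transactions, scores)
for body in alerts:
    print(body)  # in the notebook: send_alert("Fraud Detection Alert", body)
```

Keeping the message construction separate from the email delivery makes it possible to test the flagging logic without an SMTP server.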

Step 6: Visualize Model Performance

Finally, we will visualize the performance of the model using an ROC curve (Receiver Operating Characteristic Curve), which helps evaluate the trade-off between the true positive rate and false positive rate.

Visualizing the performance of a machine learning model is an essential step in understanding how well the model is doing, especially when it comes to evaluating its ability to detect fraudulent transactions.

What is an ROC curve?

An ROC curve shows how well a model performs across all classification thresholds. It plots the True Positive Rate (TPR) versus the False Positive Rate (FPR). The area under the ROC curve (AUC) provides a summary measure of model performance.

from sklearn.metrics import roc_curve, auc

# Calculate ROC curve
fpr, tpr, thresholds = roc_curve(y_test, rf_model.predict_proba(X_test)[:,1])
roc_auc = auc(fpr, tpr)

# Plot ROC curve
plt.figure(figsize=(8,6))
plt.plot(fpr, tpr, color='blue', label=f'ROC curve (area = {roc_auc:.2f})')
plt.plot([0, 1], [0, 1], color='gray', linestyle='--')
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('Receiver Operating Characteristic (ROC) Curve')
plt.legend(loc='lower right')
plt.show()

The ROC curve gives us a comprehensive picture of how well our model is distinguishing between the two classes across various thresholds. By evaluating this curve, we can make decisions on how to tune the model’s threshold to find the best balance between detecting fraud and minimizing false alarms (that is, minimizing false positives).
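As one illustration of threshold tuning, the sketch below picks the threshold with the highest true positive rate among those whose false positive rate stays under a cap. The 5% cap and the helper name pick_threshold are assumptions, and the arrays are hand-made stand-ins for the fpr, tpr, and thresholds values returned by roc_curve.

```python
import numpy as np


def pick_threshold(fpr, tpr, thresholds, max_fpr=0.05):
    """Return (threshold, tpr, fpr) with the highest TPR among points with FPR <= max_fpr."""
    fpr, tpr, thresholds = map(np.asarray, (fpr, tpr, thresholds))
    allowed = np.where(fpr <= max_fpr)[0]
    if allowed.size == 0:
        raise ValueError("no threshold satisfies the false positive rate cap")
    best = allowed[np.argmax(tpr[allowed])]
    return float(thresholds[best]), float(tpr[best]), float(fpr[best])


# Hand-made stand-ins for the arrays returned by roc_curve.
fpr = [0.00, 0.01, 0.04, 0.10, 1.00]
tpr = [0.00, 0.60, 0.85, 0.95, 1.00]
thresholds = [np.inf, 0.90, 0.70, 0.40, 0.00]

threshold, hit_rate, false_alarm_rate = pick_threshold(fpr, tpr, thresholds)
print(threshold, hit_rate, false_alarm_rate)
```

The chosen threshold can then replace the default 0.5 cutoff by comparing rf_model.predict_proba(X_test)[:, 1] against it. The right cap depends on how many false alarms the compliance team can realistically review.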

Conclusion

By following this guide, you’ve learned how to leverage MLOps to automate fraud detection and ensure compliance in the financial industry using Google Colab. This cloud-based environment makes it easy to work with machine learning models without the hassle of local setups or configurations.

From automating data preprocessing to deploying models in production, MLOps offers an end-to-end solution that improves efficiency, scalability, and accuracy in detecting fraudulent activities.

By integrating real-time monitoring and continuous updates, financial institutions can stay ahead of fraud threats while ensuring regulatory compliance with minimal manual effort.

Key Takeaways

MLOps automates the whole machine learning model lifecycle by integrating machine learning with DevOps.
It simplifies regulatory compliance and fraud detection, letting banks spot fraudulent transactions automatically.
Continuous monitoring and model retraining keep fraud detection systems current with fresh data.
Google Colab, a free cloud-based platform with access to GPUs and TPUs, lets you develop and test machine learning models with no local installation.
Automated workflows detect suspicious behavior and send out alerts in real time.
Continuous integration/continuous delivery pipelines support ongoing system improvement by automating the testing and deployment of new fraud detection models.
MLOps can help financial organizations save money, because cloud-based platforms like Google Colab lower infrastructure expenses.
