In 2018, RedLock's cloud security research team discovered that Tesla's Kubernetes dashboard was exposed to the public internet with no password.
An attacker had found it, deployed pods inside Tesla's cluster, and was using them to mine cryptocurrency – all on Tesla's AWS bill. The cluster had no authentication on the dashboard, no network restrictions on egress, and nothing monitoring for intrusion. Any one of those controls would have stopped the attack. None of them were in place.
This wasn't a sophisticated zero-day exploit. It was a misconfigured default.
Kubernetes ships with powerful security primitives. The problem is that almost none of them are enabled by default. A fresh cluster is deliberately permissive so it's easy to get started. That permissiveness is a feature in development. In production, it's a liability.
In this handbook, we'll work through the three most impactful security layers in Kubernetes. We'll start with Role-Based Access Control, which governs who can do what to which resources in the API. From there we'll move to pod runtime security, which locks down what containers can actually do once they're running on a node. Finally we'll deploy Falco, a syscall-level detection engine that watches for attacks in progress and alerts in real time.
By the end, you'll have a hardened cluster with working RBAC policies, enforced pod security standards, and live detection rules that fire when something suspicious happens.
Prerequisites
- kubectl installed and configured
- Docker Desktop or a Linux machine (to run kind)
- Basic Kubernetes familiarity – you know what a Pod, Deployment, and Namespace are
- No prior security experience needed
All demos run on a local kind cluster. Full YAML and setup scripts are in the companion GitHub repository.
The Kubernetes Threat Landscape
To understand what you're defending against, you need to understand where Kubernetes exposes attack surface. There are six main areas, and most production incidents trace back to at least one of them.
The API server is the front door to your cluster. Every kubectl command, every CI deploy, and every controller reconciliation loop sends requests here. Unauthenticated or over-privileged access to the API server is effectively game over: an attacker who can talk to it can create pods, read secrets, and modify workloads freely.
etcd is the key-value store where all cluster state lives, including your Secrets. Kubernetes Secrets are base64-encoded by default, not encrypted. Anyone with direct access to etcd can read every password, token, and certificate in the cluster without going through the API server at all.
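You can see the difference between encoding and encryption without a cluster at all. This sketch uses a made-up value to show that base64 carries no protection – anyone who can read the bytes can reverse it in one command:

```shell
# Encode a value the way Kubernetes stores Secret data in etcd
encoded=$(printf 's3cr3t' | base64)
echo "$encoded"                            # czNjcjN0 – looks opaque, isn't

# Anyone with read access to etcd can reverse it instantly
printf '%s' "$encoded" | base64 --decode   # s3cr3t
```

This is why encryption at rest for etcd (or an external secrets manager) matters: base64 only guards against shoulder-surfing, not against an attacker with storage access.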
The kubelet runs on each node and manages the pods assigned to it. If its API is reachable without authentication – which is the default on older clusters – an attacker can exec into any pod on that node and read its memory without ever touching the API server.
The container runtime is the layer that actually runs your containers. A container that escapes its isolation boundary lands directly in the host OS. A privileged container with hostPID: true can read the memory of every other process on the node, including other containers.
Your supply chain (base images, third-party dependencies, Helm charts, operators) is a potential entry point at every step. The XZ Utils backdoor discovered in 2024 showed how close a well-positioned supply chain attack can come to widespread infrastructure compromise.
Finally, the network: by default, every pod in a Kubernetes cluster can reach every other pod on any port. There are no internal firewalls between workloads unless you explicitly create them with NetworkPolicy.
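The standard starting point for closing that gap is a default-deny policy. As a sketch (the namespace name is assumed), this NetworkPolicy selects every pod in `staging` and allows no traffic until more specific policies open individual paths:

```yaml
# default-deny.yaml — selects all pods in the namespace, permits nothing
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: staging
spec:
  podSelector: {}      # empty selector = every pod in the namespace
  policyTypes:
    - Ingress
    - Egress           # no rules listed for either type = nothing allowed
```

Note that NetworkPolicy is only enforced if your CNI plugin supports it – kind's default CNI (kindnet) does not, while Calico and Cilium do.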
Real-World Breaches
These three incidents are worth understanding before you write a single line of YAML. They're not theoretical – they're documented post-mortems from real production clusters.
| Incident | Year | Root cause | What was missing |
|---|---|---|---|
| Tesla cryptomining | 2018 | Kubernetes dashboard exposed with no authentication; unrestricted egress | RBAC on the dashboard endpoint + default-deny NetworkPolicy |
| Capital One data breach | 2019 | SSRF vulnerability in a WAF let an attacker reach the EC2 metadata API, which returned credentials for an over-privileged IAM role | Pod-level IAM restrictions (IRSA) + blocking metadata API egress |
| Shopify bug bounty (Kubernetes) | 2021 | A researcher accessed internal Kubernetes metadata through a misconfigured internal service, exposing pod environment variables containing secrets | Secret management outside environment variables + network segmentation |
The pattern across all three: not zero-day exploits, but misconfigured defaults and missing controls that should have been standard practice.
This article addresses the RBAC and pod security gaps directly.
What You'll Build
Before the first command, here is the security posture you'll have by the end of this article:
You'll start by running kube-bench to get a CIS Benchmark baseline – a concrete score showing where a default cluster stands before any hardening. From there you'll build a least-privilege RBAC policy for a CI pipeline service account and verify its permission boundaries, then audit the full cluster to confirm no over-privileged accounts exist.
On the pod security side, you'll enforce the restricted Pod Security Admission profile on your workload namespace and apply a hardened securityContext to a deployment: non-root user, read-only root filesystem, dropped capabilities, and seccomp profile. To close out, you'll deploy Falco in eBPF mode with a custom detection rule that fires when suspicious tools are run inside a container.
Start to finish, with a kind cluster already running, the demos take about 45–60 minutes.
Demo 1: Run a Cluster Security Baseline with kube-bench
Before hardening anything, it's a good idea to measure where you are. kube-bench runs the CIS Kubernetes Benchmark against your cluster and reports which checks pass and which fail. A baseline run gives you a concrete picture of your cluster's default security posture – and a reference point you can re-run after applying any hardening changes.
Step 1: Create a kind cluster
Save the following as kind-config.yaml:
# kind-config.yaml
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
- role: worker
- role: worker
kind create cluster --name k8s-security --config kind-config.yaml
Expected output:
Creating cluster "k8s-security" ...
✓ Ensuring node image (kindest/node:v1.29.0) 🖼
✓ Preparing nodes 📦 📦 📦
✓ Writing configuration 📜
✓ Starting control-plane 🕹️
✓ Installing CNI 🔌
✓ Installing StorageClass 💾
✓ Joining worker nodes 🚜
Set kubectl context to "kind-k8s-security"
Step 2: Run kube-bench
kube-bench runs as a Job inside the cluster, mounting the host filesystem to inspect Kubernetes configuration files and processes:
kubectl apply -f https://raw.githubusercontent.com/aquasecurity/kube-bench/main/job.yaml
kubectl wait --for=condition=complete job/kube-bench --timeout=120s
kubectl logs job/kube-bench
The output is long. Scroll to the summary at the bottom:
== Summary master ==
0 checks PASS
11 checks FAIL
9 checks WARN
0 checks INFO
== Summary node ==
17 checks PASS
2 checks FAIL
40 checks WARN
0 checks INFO
A fresh kind cluster typically fails around a dozen checks – 13 in the run above. Three of the most important failures explain why defaults are a problem:
| Check ID | Description | Why it matters |
|---|---|---|
| 1.2.1 | `--anonymous-auth` is not set to `false` on the API server | Anonymous requests can reach the API server without authentication – exactly how the Tesla dashboard was accessed |
| 1.2.6 | `--kubelet-certificate-authority` is not set | The API server cannot verify kubelet identity, enabling man-in-the-middle attacks between the control plane and nodes |
| 4.2.6 | `--protect-kernel-defaults` is not set on the kubelet | Kernel parameters can be modified from within a container, which is one step toward a container escape |
Note: Some kube-bench findings are expected on kind because kind is a development tool, not a production-hardened environment. The important thing is to understand what each finding means and whether it applies to your target production setup.
Delete the Job when you're done:
kubectl delete job kube-bench
Now that you have a baseline, you know what you're starting from. The next step is to work through the most impactful control on that list: access control. RBAC governs every interaction with the Kubernetes API, and getting it right is the foundation everything else builds on.
How to Configure RBAC
Role-Based Access Control is the authorisation layer in Kubernetes. Every request that reaches the API server – from kubectl, from a pod, from a controller – is checked against RBAC rules after authentication succeeds. If there is no rule that explicitly allows the action, Kubernetes denies it.
The key word is "explicitly". RBAC in Kubernetes is additive only. There is no deny rule. You grant access by creating rules, and you remove access by deleting them. This makes the mental model clean: if a subject can do something, you gave it permission to do that thing.
A Brief Case Study: The Shopify Kubernetes Misconfiguration
In 2021, security researcher Silas Cutler discovered that a Shopify internal service exposed Kubernetes metadata through an SSRF vulnerability. The metadata included pod environment variables that contained secrets. The root cause was partly RBAC: the service's service account had broader cluster access than it needed, and there was no least-privilege review process.
Shopify paid a $25,000 bug bounty and fixed the issue. The lesson is straightforward: a service account should only have the permissions it needs to do its specific job. Nothing more.
This is the principle you'll apply in Demo 2.
The Four RBAC Objects
RBAC in Kubernetes is built from four API objects. Two define permissions, two bind those permissions to subjects:
| Object | Scope | What it does |
|---|---|---|
| `Role` | Namespace | Defines a set of permissions within one namespace |
| `ClusterRole` | Cluster-wide | Defines permissions across all namespaces, or for cluster-scoped resources like Nodes |
| `RoleBinding` | Namespace | Grants the permissions of a Role or ClusterRole to a subject, within one namespace |
| `ClusterRoleBinding` | Cluster-wide | Grants the permissions of a ClusterRole to a subject across the entire cluster |
A subject is a user, a group, or a service account. Users and groups come from your authentication layer – client certificates, OIDC tokens, or cloud provider identity. Service accounts are Kubernetes-native identities created for pods.
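Inside a binding, the three subject kinds look like this (all names here are illustrative):

```yaml
subjects:
  - kind: User                       # from your auth layer (cert CN, OIDC claim)
    name: jane@example.com
    apiGroup: rbac.authorization.k8s.io
  - kind: Group                      # e.g. an OIDC group or certificate organisation
    name: platform-team
    apiGroup: rbac.authorization.k8s.io
  - kind: ServiceAccount             # Kubernetes-native identity, namespaced
    name: ci-pipeline
    namespace: staging               # ServiceAccount subjects omit apiGroup
```

Users and Groups need the `rbac.authorization.k8s.io` apiGroup; ServiceAccounts live in the core group, so they take a `namespace` field instead.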
How to Discover Resources, Verbs, and API Groups
Before you can write a Role, you need to know three things: the resource name, the API group it belongs to, and the verbs it supports. You shouldn't have to guess any of them – kubectl can tell you everything.
List all available resources and their API groups
kubectl api-resources
Partial output:
NAME SHORTNAMES APIVERSION NAMESPACED KIND
bindings v1 true Binding
configmaps cm v1 true ConfigMap
endpoints ep v1 true Endpoints
events ev v1 true Event
namespaces ns v1 false Namespace
nodes no v1 false Node
pods po v1 true Pod
secrets v1 true Secret
serviceaccounts sa v1 true ServiceAccount
services svc v1 true Service
deployments deploy apps/v1 true Deployment
replicasets rs apps/v1 true ReplicaSet
statefulsets sts apps/v1 true StatefulSet
cronjobs cj batch/v1 true CronJob
jobs batch/v1 true Job
ingresses ing networking.k8s.io/v1 true Ingress
networkpolicies netpol networking.k8s.io/v1 true NetworkPolicy
clusterroles rbac.authorization.k8s.io/v1 false ClusterRole
roles rbac.authorization.k8s.io/v1 true Role
The APIVERSION column is what you put in apiGroups. Strip the version suffix and use only the group part:
| APIVERSION in output | apiGroups value in Role |
|---|---|
| `v1` | `""` (empty string – the core group) |
| `apps/v1` | `"apps"` |
| `batch/v1` | `"batch"` |
| `networking.k8s.io/v1` | `"networking.k8s.io"` |
| `rbac.authorization.k8s.io/v1` | `"rbac.authorization.k8s.io"` |
The NAMESPACED column tells you whether to use a Role (namespaced resources) or a ClusterRole (non-namespaced resources like nodes).
Filter by API group
If you want to see only resources in a specific group, for example, everything in apps:
kubectl api-resources --api-group=apps
NAME SHORTNAMES APIVERSION NAMESPACED KIND
controllerrevisions apps/v1 true ControllerRevision
daemonsets ds apps/v1 true DaemonSet
deployments deploy apps/v1 true Deployment
replicasets rs apps/v1 true ReplicaSet
statefulsets sts apps/v1 true StatefulSet
List all verbs for a specific resource
Each resource supports a different set of verbs. To see exactly which verbs a resource supports, use kubectl api-resources with -o wide and look at the VERBS column:
kubectl api-resources -o wide | grep -E "^NAME|^pods "
NAME SHORTNAMES APIVERSION NAMESPACED KIND VERBS
pods po v1 true Pod create,delete,deletecollection,get,list,patch,update,watch
Or explain the resource directly:
kubectl explain pod --api-version=v1 | head -10
The full set of verbs Kubernetes supports in RBAC rules is:

| Verb | What it allows |
|---|---|
| `get` | Read a single named resource: `kubectl get pod my-pod` |
| `list` | Read all resources of a type: `kubectl get pods` |
| `watch` | Stream changes to resources – used by controllers and informers |
| `create` | Create a new resource |
| `update` | Replace an existing resource (`kubectl apply` on an existing object) |
| `patch` | Partially modify a resource (`kubectl patch`) |
| `delete` | Delete a single resource |
| `deletecollection` | Delete all resources of a type in a namespace |

Actions like `kubectl exec`, `kubectl logs`, `kubectl port-forward`, and proxying are not verbs of their own. They are authorised through pod subresources – `pods/exec`, `pods/log`, `pods/portforward`, `pods/proxy` – using the standard verbs: `get` for reading logs, `create` for starting an exec or port-forward session.
Important: get and list are separate verbs. Granting list on secrets lets a subject enumerate every secret name and value in a namespace, even if you didn't also grant get. Always think about both when working with sensitive resources like secrets, serviceaccounts, and configmaps.
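Debugging access (exec, logs, port-forward) is granted through pod subresources rather than standalone verbs. A sketch of a namespaced Role for that purpose – the role name and namespace are assumptions:

```yaml
# pod-debugger-role.yaml — read logs, exec, and port-forward in one namespace
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-debugger
  namespace: staging
rules:
  - apiGroups: [""]
    resources: ["pods", "pods/log"]
    verbs: ["get", "list"]              # kubectl logs needs get on pods/log
  - apiGroups: [""]
    resources: ["pods/exec", "pods/portforward"]
    verbs: ["create"]                   # exec/port-forward sessions are POSTs
```

Keep roles like this out of production namespaces, or bind them only temporarily: `pods/exec` access is equivalent to a shell inside every pod the subject can reach.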
Look up a resource's group with kubectl explain
If you already know the resource name but aren't sure of its group, kubectl explain tells you:
kubectl explain deployment
GROUP: apps
KIND: Deployment
VERSION: v1
...
kubectl explain ingress
GROUP: networking.k8s.io
KIND: Ingress
VERSION: v1
...
This is the fastest way to look up the apiGroups value for any resource when writing a Role.
A complete lookup workflow
Here is the practical workflow when writing a new Role from scratch:
# 1. Find the resource name and API group
kubectl api-resources | grep deployment
# Output:
# deployments deploy apps/v1 true Deployment
# 2. Find the verbs it supports
kubectl api-resources -o wide | grep deployment
# Output:
# deployments deploy apps/v1 true Deployment create,delete,...,get,list,patch,update,watch
# 3. Write the Role using the group (strip the version) and the verbs you need
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
name: deployment-reader
namespace: staging
rules:
- apiGroups: ["apps"] # from: apps/v1 → strip /v1
resources: ["deployments"]
verbs: ["get", "list", "watch"]
With this workflow, you never have to guess an API group or verb. You look it up, then write the minimal rule you need.
Roles and ClusterRoles
A Role defines which verbs are allowed on which resources. Here is a Role that grants read-only access to Pods and ConfigMaps inside the staging namespace:
# role-ci-reader.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
name: ci-reader
namespace: staging
rules:
- apiGroups: [""] # "" = the core API group (Pods, Services, Secrets, ConfigMaps)
resources: ["pods", "configmaps"]
verbs: ["get", "list", "watch"]
The apiGroups field tells Kubernetes which API group owns the resource. The core group uses an empty string "". Apps-level resources like Deployments use "apps". Custom resources use their own group, such as "networking.k8s.io".
A ClusterRole is structurally identical but omits the namespace and can reference cluster-scoped resources like Nodes and PersistentVolumes:
# clusterrole-node-reader.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
name: node-reader # no namespace field
rules:
- apiGroups: [""]
resources: ["nodes"]
verbs: ["get", "list", "watch"]
When to use which:
Use a Role when the permission is specific to one namespace. A compromised service account can only affect that namespace: the blast radius is contained. Use a ClusterRole when you need access to cluster-scoped resources, or when you want a reusable permission template that multiple namespaces can share.
A common mistake is reaching for a ClusterRole "just to be safe" because it's easier to configure. Namespace-scoped Roles are almost always the right default.
RoleBindings and ClusterRoleBindings
A Role by itself does nothing. You need a binding to attach it to a subject. Here is a RoleBinding that grants the ci-reader Role to the ci-pipeline service account:
# rolebinding-ci.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
name: ci-reader-binding
namespace: staging
subjects:
- kind: ServiceAccount
name: ci-pipeline # the service account name
namespace: staging # the namespace the SA lives in
roleRef:
kind: Role
name: ci-reader # must match the Role name exactly
apiGroup: rbac.authorization.k8s.io
There is a useful pattern worth knowing: you can bind a ClusterRole using a RoleBinding. This creates namespace-scoped access using a reusable permission template. The ClusterRole defines the rules, while the RoleBinding constrains those rules to a single namespace.
# RoleBinding referencing a ClusterRole — scoped to one namespace only
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
name: view-binding
namespace: staging
subjects:
- kind: ServiceAccount
name: ci-pipeline
namespace: staging
roleRef:
kind: ClusterRole # ClusterRole, but bound to one namespace via RoleBinding
name: view # Kubernetes built-in ClusterRole: read-only access to most resources
apiGroup: rbac.authorization.k8s.io
Kubernetes ships with several useful built-in ClusterRoles: view (read-only access to most resources), edit (read/write to most resources), admin (full namespace admin), and cluster-admin (full cluster admin). Use them rather than reinventing them.
How to Use Service Accounts Safely
Every pod in Kubernetes runs as a service account. If you don't specify one, Kubernetes uses the default service account in that namespace.
The default service account starts with no permissions – but it still has a token automatically mounted into every pod at /var/run/secrets/kubernetes.io/serviceaccount/token. This means every container in your cluster can authenticate to the API server by default, even if it has nothing useful to do there.
The single most impactful change you can make is to disable this automatic token mounting on service accounts that don't need API access:
# serviceaccount.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
name: my-app
namespace: production
automountServiceAccountToken: false # no token mounted into pods by default
You can also control it at the pod level:
spec:
automountServiceAccountToken: false # override at pod level
serviceAccountName: my-app
containers:
- name: app
image: my-app:1.0
The cluster-admin anti-pattern:
Never bind cluster-admin to a service account that runs in a pod. cluster-admin grants full read/write access to every resource in the cluster. An attacker who compromises a pod running as cluster-admin owns your cluster completely.
You will see this in Helm charts and tutorials because it "makes things work". It works because it disables the entire authorisation layer. That is not a solution – it's a ticking clock.
The Capital One breach is a direct example of this pattern at the cloud layer: an EC2 instance role had permissions far beyond what the application needed. The SSRF vulnerability was the initial foothold. The over-privileged role was what turned a minor bug into an $80 million fine.
How to Audit Your RBAC Configuration
The kubectl auth can-i command lets you check permissions for any subject. Use --as to impersonate a service account:
SA="system:serviceaccount:staging:ci-pipeline"
# These should return 'yes'
kubectl auth can-i list pods --namespace staging --as $SA
kubectl auth can-i get configmaps --namespace staging --as $SA
# These should return 'no'
kubectl auth can-i delete pods --namespace staging --as $SA
kubectl auth can-i get secrets --namespace staging --as $SA
kubectl auth can-i list pods --namespace production --as $SA
To list every permission a subject has in a namespace:
kubectl auth can-i --list \
--namespace staging \
--as system:serviceaccount:staging:ci-pipeline
For a visual matrix across the whole cluster, install rakkess (part of krew):
kubectl krew install access-matrix
# Permission matrix for all service accounts in staging
kubectl access-matrix --namespace staging
Example output:
NAME GET LIST WATCH CREATE UPDATE PATCH DELETE
ci-pipeline ✓ ✓ ✓ ✗ ✗ ✗ ✗
default ✗ ✗ ✗ ✗ ✗ ✗ ✗
monitoring ✓ ✓ ✓ ✗ ✗ ✗ ✗
If you see ✓ in the CREATE, UPDATE, PATCH, or DELETE columns for a service account that should only read, that's a finding that needs remediation.
⚠️ The wildcard danger: The most dangerous RBAC configuration is a wildcard on all three dimensions:
apiGroups: ["*"]
resources: ["*"]
verbs: ["*"]
This is functionally identical to cluster-admin. You will find it in Helm charts for controllers installed with "convenience" permissions. Always audit third-party RBAC before installing operators into a production cluster.
Demo 2 – Build a Least-Privilege RBAC Policy for a CI Pipeline
In this demo, you'll create a service account for a CI pipeline that can list pods and read configmaps in the staging namespace – and nothing else.
Step 1: Create the namespace and service account
kubectl create namespace staging
# ci-serviceaccount.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
name: ci-pipeline
namespace: staging
automountServiceAccountToken: false
kubectl apply -f ci-serviceaccount.yaml
Step 2: Create the Role
# ci-role.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
name: ci-reader
namespace: staging
rules:
- apiGroups: [""]
resources: ["pods"]
verbs: ["get", "list", "watch"]
- apiGroups: [""]
resources: ["configmaps"]
verbs: ["get", "list"]
kubectl apply -f ci-role.yaml
Step 3: Bind the Role to the service account
# ci-rolebinding.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
name: ci-reader-binding
namespace: staging
subjects:
- kind: ServiceAccount
name: ci-pipeline
namespace: staging
roleRef:
kind: Role
name: ci-reader
apiGroup: rbac.authorization.k8s.io
kubectl apply -f ci-rolebinding.yaml
Step 4: Test allowed operations
SA="system:serviceaccount:staging:ci-pipeline"
kubectl auth can-i list pods --namespace staging --as $SA # yes
kubectl auth can-i get pods --namespace staging --as $SA # yes
kubectl auth can-i list configmaps --namespace staging --as $SA # yes
Step 5: Test denied operations
kubectl auth can-i delete pods --namespace staging --as $SA # no
kubectl auth can-i get secrets --namespace staging --as $SA # no
kubectl auth can-i list pods --namespace production --as $SA # no
kubectl auth can-i create deployments --namespace staging --as $SA # no
All four should return no. Notice the third test: the service account's permissions stop at the staging namespace boundary. A RoleBinding grants access only within its own namespace – it cannot reach into production. This is by design.
Writing a least-privilege policy for a service account you control is the easy part. The harder part is auditing what already exists in a cluster. That's what Demo 3 covers.
Demo 3 – Audit RBAC with rakkess and rbac-lookup
Now you'll scan the full cluster to surface any accounts with more permissions than they need.
Step 1: Install the tools
kubectl krew install access-matrix
kubectl krew install rbac-lookup
Step 2: Run rakkess across the cluster
# All service accounts in kube-system
kubectl access-matrix --namespace kube-system
# All ServiceAccounts cluster-wide
kubectl access-matrix
Step 3: Find all cluster-admin bindings
There are two ways subjects get cluster-admin access: via a ClusterRoleBinding (cluster-wide), or via a RoleBinding that references the cluster-admin ClusterRole (namespace-scoped, still dangerous). Check both:
# Find ClusterRoleBindings that grant cluster-admin
kubectl rbac-lookup cluster-admin --kind ClusterRole --output wide
On a fresh kind cluster this returns:
No RBAC Bindings found
That is the correct and expected result. A default kind cluster doesn't create any ClusterRoleBindings to cluster-admin. The role exists, but nothing is bound to it at the cluster level by default. If you see entries here in your production cluster, each one is a finding worth investigating.
To find who has cluster-level admin access through other means, query the bindings directly:
# Find all ClusterRoleBindings and the subjects they grant
kubectl get clusterrolebindings -o wide
NAME ROLE AGE USERS GROUPS SERVICEACCOUNTS
cluster-admin ClusterRole/cluster-admin 10d system:masters
system:kube-controller-manager ClusterRole/system:kube-controller-manager 10d
system:kube-scheduler ClusterRole/system:kube-scheduler 10d
system:node ClusterRole/system:node 10d
...
The cluster-admin ClusterRoleBinding grants access to the system:masters group – the group your kubeconfig certificate belongs to. This is expected. Every other binding in this list is worth reviewing to understand what it grants and why.
What to look for: Any binding where the SERVICEACCOUNTS column is populated with an application service account (not a system: prefixed one) is a potential over-privilege finding. Application pods should never need cluster-admin.
Step 4: Verify the ci-pipeline service account
kubectl rbac-lookup ci-pipeline --kind ServiceAccount --output wide
Expected output:
SUBJECT SCOPE ROLE SOURCE
ServiceAccount/staging:ci-pipeline staging Role/ci-reader RoleBinding/ci-reader-binding
The ROLE column uses the format `<role-kind>/<role-name>`, and SOURCE uses `<binding-kind>/<binding-name>`. This tells you:

- The service account is bound to the `ci-reader` Role
- The binding is a RoleBinding named `ci-reader-binding`
- There is no namespace prefix on the role name because it is a namespaced Role, not a ClusterRole
If the output showed ClusterRole/something here, that would be a finding. It would mean the service account has cluster-wide permissions, not namespace-scoped ones.
rbac-lookup vs kubectl get: rbac-lookup gives you a subject-centric view: "what does this account have access to?" kubectl get rolebindings,clusterrolebindings -A gives you a binding-centric view: "what bindings exist in the cluster?" Use both. rbac-lookup is faster for auditing a specific service account, while the kubectl get approach is better for a full cluster inventory.
With RBAC locked down, the API server is protected. But RBAC says nothing about what a container can do once it's running. That's a separate layer entirely.
How to Harden Pod Runtime Security
RBAC controls who can talk to the Kubernetes API. Pod security controls what containers can do once they're running on a node. These are different threat vectors: RBAC protects the control plane, pod security protects the data plane.
A container that runs as root with no capability restrictions can, if compromised, write backdoors to the host filesystem, load kernel modules, read the memory of other processes if hostPID: true is set, and in some configurations escape the container entirely. Pod security closes these doors before an attacker can open them.
A Case Study: The Hildegard Malware Campaign
In early 2021, Palo Alto's Unit 42 research team documented a cryptomining malware campaign called Hildegard that specifically targeted Kubernetes clusters. The attack chain was:
1. Find a cluster with the kubelet API exposed without authentication
2. Deploy a privileged pod with `hostPID: true`
3. Use the privileged pod to read credentials from other containers' memory
4. Establish persistence by writing to the host filesystem
Steps 3 and 4 would have been impossible if the pods in the cluster had been running with readOnlyRootFilesystem: true, dropped capabilities, and no hostPID. The attacker had the initial foothold. Pod security would have contained the blast radius.
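For concreteness, the kind of pod spec that attack chain relies on looks roughly like this – every flagged field is something the baseline and restricted profiles reject at admission. The image name is invented; don't deploy this outside a lab:

```yaml
# attacker-pod.yaml — illustrates what Pod Security Admission exists to block
apiVersion: v1
kind: Pod
metadata:
  name: attacker-pod
spec:
  hostPID: true                      # shares the node's PID namespace: every
                                     # process on the node becomes visible
  containers:
    - name: miner
      image: attacker/miner:latest   # hypothetical malicious image
      securityContext:
        privileged: true             # full device and capability access to the host
```

With `baseline` enforced on the namespace, both `hostPID: true` and `privileged: true` cause the API server to refuse the pod before it ever schedules.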
Pod Security Admission
Pod Security Admission (PSA) is the built-in admission controller that enforces pod security standards at the namespace level. It replaced PodSecurityPolicy in Kubernetes 1.25.
Migrating from PSP? If you're on Kubernetes < 1.25, you may still be using PodSecurityPolicy, which was removed in 1.25. The migration path is: enable PSA in audit mode first to identify violations, fix them workload by workload, then switch to enforce. For policies PSA cannot express, add Kyverno alongside it.
PSA defines three profiles:
| Profile | Who it's for | What it restricts |
|---|---|---|
| `privileged` | System components (CNI plugins, monitoring agents) | Nothing – no restrictions |
| `baseline` | Most workloads | Blocks known privilege escalations: no hostNetwork, no hostPID, no privileged containers |
| `restricted` | Security-sensitive workloads | Everything in baseline, plus: must run as non-root, must drop capabilities, must set a seccomp profile |
And three enforcement modes:
| Mode | Effect | When to use |
|---|---|---|
| `enforce` | Rejects pods that violate the profile at admission | Production – once you've fixed violations |
| `audit` | Allows pods but records violations in the audit log | Migration – see what would break without breaking anything |
| `warn` | Allows pods but sends a warning to the client | Development – fast feedback in your terminal |
The migration path: start with audit and warn to identify violations, fix them, then switch to enforce. The two modes can run simultaneously.
Apply them as namespace labels:
# namespace-staging.yaml
apiVersion: v1
kind: Namespace
metadata:
name: staging
labels:
# Start here: audit and warn simultaneously
pod-security.kubernetes.io/audit: restricted
pod-security.kubernetes.io/audit-version: latest
pod-security.kubernetes.io/warn: restricted
pod-security.kubernetes.io/warn-version: latest
Once violations are resolved, add enforce:
kubectl label namespace staging \
  pod-security.kubernetes.io/enforce=restricted \
  pod-security.kubernetes.io/enforce-version=latest
Note: --overwrite is deliberately omitted here. Without it, the command errors if enforce is already set to a different value – which is exactly what you want: it stops you from silently changing an existing enforcement level. You should see:
namespace/staging labeled
If you see namespace/staging not labeled instead, it means enforce=restricted and enforce-version=latest were already set to those exact values. Confirm enforcement is active:
kubectl get namespace staging --show-labels
Look for pod-security.kubernetes.io/enforce=restricted in the output. If it's there, enforcement is active.
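A quick way to confirm enforcement actually works is to apply a pod that violates the profile. This naive spec (the name is arbitrary) sets none of the fields the restricted profile requires:

```yaml
# naive-pod.yaml — no securityContext at all; restricted should reject this
apiVersion: v1
kind: Pod
metadata:
  name: naive-pod
  namespace: staging
spec:
  containers:
    - name: app
      image: nginx:1.25-alpine   # image choice doesn't matter here
```

Running `kubectl apply -f naive-pod.yaml` against the labeled namespace should fail with a message along the lines of `violates PodSecurity "restricted:latest"`, listing the missing settings (allowPrivilegeEscalation, capabilities, runAsNonRoot, seccompProfile). If the pod is admitted, enforcement is not active.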
How to Configure securityContext
A securityContext defines the privilege and access control settings for a pod or container. These are the seven fields you should configure on every production workload:
| Field | Set at | What it controls |
|---|---|---|
| `runAsNonRoot` | Pod | Rejects containers that run as UID 0 (root) |
| `runAsUser` / `runAsGroup` | Pod | Sets a specific UID/GID – don't rely on the image default |
| `fsGroup` | Pod | All mounted volumes are owned by this GID |
| `seccompProfile` | Pod | Filters syscalls using a seccomp profile |
| `allowPrivilegeEscalation` | Container | Blocks setuid binaries and sudo |
| `readOnlyRootFilesystem` | Container | Makes the container filesystem read-only |
| `capabilities.drop` | Container | Removes Linux capabilities (drop ALL, add back only what is needed) |
The annotated YAML below shows all seven in context:
# secure-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: secure-app
  namespace: staging
spec:
  replicas: 2
  selector:
    matchLabels:
      app: secure-app
  template:
    metadata:
      labels:
        app: secure-app
    spec:
      securityContext:
        runAsNonRoot: true      # container must run as a non-root user
        runAsUser: 10001        # explicit UID — don't rely on the image's default
        runAsGroup: 10001       # explicit GID
        fsGroup: 10001          # volumes are owned by this group
        seccompProfile:
          type: RuntimeDefault  # use the container runtime's default seccomp profile
      automountServiceAccountToken: false
      containers:
      - name: app
        image: nginx:1.25-alpine
        securityContext:
          allowPrivilegeEscalation: false  # block setuid and sudo inside the container
          readOnlyRootFilesystem: true     # the single highest-impact setting
          capabilities:
            drop:
            - ALL       # drop every Linux capability
            add: []     # add back only what is explicitly needed
        volumeMounts:
        - name: tmp
          mountPath: /tmp
        - name: nginx-cache
          mountPath: /var/cache/nginx
        - name: nginx-run
          mountPath: /var/run
      volumes:
      # nginx needs writable directories — provide them as emptyDir volumes
      - name: tmp
        emptyDir: {}
      - name: nginx-cache
        emptyDir: {}
      - name: nginx-run
        emptyDir: {}
Why readOnlyRootFilesystem: true is the most important setting:
Most post-exploitation techniques require writing to the filesystem. Dropping a backdoor, modifying a binary, writing a cron job, or installing a keylogger all require a writable filesystem. Set readOnlyRootFilesystem: true and every one of these techniques is blocked.
The downside is that many applications write to directories like /tmp or /var/cache. The fix is to mount emptyDir volumes at those specific paths, as shown above. The rest of the filesystem stays read-only.
What each field prevents:
| Field | What it prevents |
|---|---|
| `runAsNonRoot: true` | Blocks containers that were built to run as root – they fail at admission |
| `runAsUser: 10001` | Ensures a known, non-privileged UID even if the image doesn't set one |
| `allowPrivilegeEscalation: false` | Blocks setuid binaries and sudo – the most common privilege escalation path |
| `readOnlyRootFilesystem: true` | Prevents writing backdoors, modifying binaries, or creating persistence |
| `capabilities: drop: ALL` | Removes Linux capabilities like NET_RAW (raw socket access) and SYS_ADMIN (kernel operations) |
| `seccompProfile: RuntimeDefault` | Filters syscalls to a safe allowlist – blocks dozens of dangerous, rarely used syscalls while permitting the ones ordinary applications need |
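RuntimeDefault is the right choice for almost every workload. If you need a tighter or custom syscall filter, Kubernetes also supports type: Localhost, which references a profile file you've placed on each node under the kubelet's seccomp directory (by default /var/lib/kubelet/seccomp). The profile filename below follows the example used in the Kubernetes documentation and is an assumption, not something from this article's repo:

```yaml
# Pod-level securityContext referencing a node-local seccomp profile.
# Assumes profiles/audit.json exists under /var/lib/kubelet/seccomp/
# on every node — the path is relative to the kubelet's seccomp root.
securityContext:
  seccompProfile:
    type: Localhost
    localhostProfile: profiles/audit.json
```

The operational cost is that you must distribute and version the profile file on every node yourself, which is why RuntimeDefault is the usual recommendation.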
OPA/Gatekeeper vs Kyverno
PSA covers the fundamentals. But you'll eventually need policies that PSA cannot express: all images must come from your private registry, all pods must have resource limits, no container may use the latest tag. For these, you need a policy engine.
Two mature options exist:
| | OPA/Gatekeeper | Kyverno |
|---|---|---|
| Policy language | Rego (a custom logic language) | YAML, same format as Kubernetes resources |
| Learning curve | Steep: Rego takes real time to learn | Gentle: if you write YAML, you can write policies |
| Mutation | Yes, via `Assign`/`AssignMetadata` | Yes: first-class, well-documented feature |
| Audit mode | Yes: reports existing violations | Yes: policy audit mode |
| Ecosystem | Integrates with OPA in non-K8s contexts | Kubernetes-native only |
| Best for | Complex cross-resource logic and teams already using OPA | Teams who want K8s-native syntax and fast setup |
If you're starting fresh, Kyverno gets you to working policies faster. Here is a Kyverno policy that blocks images from outside your trusted registry:
# kyverno-registry-policy.yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: restrict-image-registries
spec:
  validationFailureAction: Enforce
  background: true
  rules:
  - name: validate-registries
    match:
      any:
      - resources:
          kinds: ["Pod"]
    validate:
      message: "Images must come from registry.corp.internal/"
      pattern:
        spec:
          containers:
          - image: "registry.corp.internal/*"
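The other policies mentioned above follow the same shape. As one more sketch, here is a ban on the latest tag, modelled on Kyverno's published sample policies (the two-rule structure first requires an explicit tag, then rejects latest specifically):

```yaml
# kyverno-disallow-latest.yaml — sketch based on Kyverno's sample policies
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: disallow-latest-tag
spec:
  validationFailureAction: Enforce
  background: true
  rules:
  - name: require-image-tag
    match:
      any:
      - resources:
          kinds: ["Pod"]
    validate:
      message: "An explicit image tag is required."
      pattern:
        spec:
          containers:
          - image: "*:*"      # image reference must include a tag
  - name: validate-image-tag
    match:
      any:
      - resources:
          kinds: ["Pod"]
    validate:
      message: "Using the 'latest' tag is not allowed."
      pattern:
        spec:
          containers:
          - image: "!*:latest"  # any tag except latest
```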
How to Detect Runtime Threats with Falco
PSA and securityContext are preventive controls: they block known-bad configurations before pods start. Falco is a detective control. It watches what containers do while they're running and alerts when something looks wrong.
Falco operates at the syscall level using eBPF. It attaches to the Linux kernel and intercepts every system call made by every container on the node – file opens, network connections, process spawns, privilege escalations. It does this without modifying containers, without injecting sidecars, and with minimal overhead.
What Falco detects out of the box:
Falco's default ruleset covers the most common attack patterns. It fires when a shell is opened inside a running container, whether that's a kubectl exec session or a reverse shell from an exploit.
It watches for reads on sensitive files like /etc/shadow, /etc/kubernetes/admin.conf, and /root/.ssh/. It catches the dropper pattern: a binary written to disk and immediately executed. It detects outbound connections to known malicious IPs, writes to /proc or /sys that suggest kernel manipulation, and package managers like apt, yum, or pip being run inside containers that have no business installing software.
Each of these is a rule in Falco's default ruleset. You can extend it with custom rules for your specific workloads – which is exactly what you'll do in Demo 5. But first let's harden the Pod.
Demo 4 – Harden a Pod with securityContext
In this demo, you'll start with a default nginx deployment, observe the PSA violations it triggers, harden it step by step, and confirm it passes under the restricted profile.
Step 1: Apply PSA labels in audit mode
kubectl label namespace staging \
pod-security.kubernetes.io/audit=restricted \
pod-security.kubernetes.io/warn=restricted
Step 2: Deploy insecure nginx and observe the warnings
# insecure-nginx.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-insecure
  namespace: staging
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx-insecure
  template:
    metadata:
      labels:
        app: nginx-insecure
    spec:
      containers:
      - name: nginx
        image: nginx:1.25-alpine
kubectl apply -f insecure-nginx.yaml
Expected output (PSA warns but still creates the deployment in warn mode):
Warning: would violate PodSecurity "restricted:latest":
allowPrivilegeEscalation != false (container "nginx" must set
securityContext.allowPrivilegeEscalation=false)
unrestricted capabilities (container "nginx" must set
securityContext.capabilities.drop=["ALL"])
runAsNonRoot != true (pod or container "nginx" must set
securityContext.runAsNonRoot=true)
seccompProfile not set (pod or container "nginx" must set
securityContext.seccompProfile.type to "RuntimeDefault" or "Localhost")
deployment.apps/nginx-insecure created
Four violations, and every one of them is a real security gap. But the pod was still created – in warn mode, PSA reports violations without blocking anything, which is why the output ends with deployment.apps/nginx-insecure created.
Step 3: Deploy the hardened version
kubectl apply -f secure-deployment.yaml # the YAML from the securityContext section above
No warnings this time.
Step 4: Switch the namespace to enforce
kubectl label namespace staging \
pod-security.kubernetes.io/enforce=restricted \
pod-security.kubernetes.io/enforce-version=latest
Expected output:
namespace/staging labeled
This is the moment enforcement becomes active. Any new pod that violates the restricted profile will be rejected from this point on.
Step 5: Confirm insecure deployments are now rejected
kubectl delete deployment nginx-insecure -n staging
kubectl apply -f insecure-nginx.yaml
Expected output:
Warning: would violate PodSecurity "restricted:latest": allowPrivilegeEscalation != false ...
deployment.apps/nginx-insecure created
The Deployment object is created. PSA enforces at the pod level, not the Deployment level. The Deployment and its ReplicaSet exist, but every attempt to create a pod is rejected. Check the ReplicaSet:
kubectl get replicaset -n staging -l app=nginx-insecure
NAME DESIRED CURRENT READY AGE
nginx-insecure-b668d867b 1 0 0 30s
DESIRED=1 but CURRENT=0. The ReplicaSet cannot create any pods because they're rejected at admission. Describe the ReplicaSet to see the rejection events:
kubectl describe replicaset -n staging -l app=nginx-insecure
Warning FailedCreate ReplicaSet "nginx-insecure-b668d867b" create Pod
"nginx-insecure-xxx" failed: pods is forbidden: violates PodSecurity
"restricted:latest": allowPrivilegeEscalation != false, unrestricted
capabilities, runAsNonRoot != true, seccompProfile not set
The hardened deployment continues running with its pods intact. The insecure one has zero pods and never will. This is exactly how PSA is supposed to work.
Step 6: Score the hardened pod with kube-score
kube-score is a static analysis tool that scores Kubernetes manifests against security and reliability best practices:
# macOS
brew install kube-score
# Linux: https://github.com/zegl/kube-score/releases
kube-score score secure-deployment.yaml -v
Expected output (abridged):
apps/v1/Deployment secure-app in staging
path=secure-deployment.yaml
[OK] Stable version
[OK] Label values
[CRITICAL] Container Resources
· app -> CPU limit is not set
Resource limits are recommended to avoid resource DDOS. Set resources.limits.cpu
· app -> Memory limit is not set
Resource limits are recommended to avoid resource DDOS. Set resources.limits.memory
· app -> CPU request is not set
Resource requests are recommended to make sure that the application can start and run without crashing. Set resources.requests.cpu
· app -> Memory request is not set
Resource requests are recommended to make sure that the application can start and run without crashing. Set resources.requests.memory
[CRITICAL] Container Image Pull Policy
· app -> ImagePullPolicy is not set to Always
It's recommended to always set the ImagePullPolicy to Always, to make sure that the imagePullSecrets are always correct, and to always get the image you want.
[OK] Pod Probes Identical
[CRITICAL] Container Ephemeral Storage Request and Limit
· app -> Ephemeral Storage limit is not set
Resource limits are recommended to avoid resource DDOS. Set resources.limits.ephemeral-storage
· app -> Ephemeral Storage request is not set
Resource requests are recommended to make sure the application can start and run without crashing. Set resource.requests.ephemeral-storage
[OK] Environment Variable Key Duplication
[OK] Container Security Context Privileged
[OK] Pod Topology Spread Constraints
· Pod Topology Spread Constraints
No Pod Topology Spread Constraints set, kube-scheduler defaults assumed
[OK] Container Image Tag
[CRITICAL] Pod NetworkPolicy
· The pod does not have a matching NetworkPolicy
Create a NetworkPolicy that targets this pod to control who/what can communicate with this pod. Note, this feature needs to be supported by the CNI implementation used in the Kubernetes cluster to have an effect.
[OK] Container Security Context User Group ID
[OK] Container Security Context ReadOnlyRootFilesystem
[CRITICAL] Deployment has PodDisruptionBudget
· No matching PodDisruptionBudget was found
It's recommended to define a PodDisruptionBudget to avoid unexpected downtime during Kubernetes maintenance operations, such as when draining a node.
[WARNING] Deployment has host PodAntiAffinity
· Deployment does not have a host podAntiAffinity set
It's recommended to set a podAntiAffinity that stops multiple pods from a deployment from being scheduled on the same node. This increases availability in case the node becomes unavailable.
[OK] Deployment Pod Selector labels match template metadata labels
Notice there are no security context violations: securityContext, readOnlyRootFilesystem, seccompProfile, and runAsNonRoot all pass. The remaining findings are about resource management (CPU/memory limits, ephemeral storage), availability (PodDisruptionBudget, anti-affinity), and network policy – not security context hardening. Those are important for production readiness, but they're a separate concern from the pod security hardening we did here.
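If you do want to clear the resource-related findings as well, add a resources block to the container spec in secure-deployment.yaml. The numbers below are illustrative starting points, not tuned recommendations – right-size them against your workload's observed usage:

```yaml
# Addition to the container spec in secure-deployment.yaml.
# Example values only — adjust to your workload's real usage.
resources:
  requests:
    cpu: 100m
    memory: 128Mi
    ephemeral-storage: 256Mi
  limits:
    cpu: 500m
    memory: 256Mi
    ephemeral-storage: 1Gi
```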
You now have a pod that PSA accepts and kube-score validates. The next step is to add a detection layer – something that watches what the pod does at runtime, not just how it was configured at admission.
Demo 5 – Deploy Falco and Write a Custom Detection Rule
Now, you'll deploy Falco in eBPF mode, trigger a default alert, then extend Falco with a custom rule that catches curl and wget being run inside containers.
Step 1: Install Falco via Helm
helm repo add falcosecurity https://falcosecurity.github.io/charts
helm repo update
helm install falco falcosecurity/falco \
--namespace falco \
--create-namespace \
--set driver.kind=modern_ebpf \
--set tty=true \
--wait
Confirm Falco is running on every node:
kubectl get pods -n falco
NAME READY STATUS RESTARTS AGE
falco-x8k2p 1/1 Running 0 45s
falco-m9nqr 1/1 Running 0 45s
falco-j4tpw 1/1 Running 0 45s
One pod per node. Falco runs as a DaemonSet because it needs to monitor syscalls on every node independently.
Step 2: Trigger a default alert
Open a second terminal and stream the Falco logs:
# Terminal 2 — watch for alerts
kubectl logs -n falco -l app.kubernetes.io/name=falco -f --max-log-requests 3
In your first terminal, exec into the secure-app pod:
# Terminal 1 — trigger the shell detection
POD=$(kubectl get pod -n staging -l app=secure-app \
-o jsonpath='{.items[0].metadata.name}')
kubectl exec -it $POD -n staging -- sh
Within a second, Terminal 2 shows:
2024-03-15T14:23:41.456Z: Notice A shell was spawned in a container with an attached terminal
(user=root user_loginuid=-1 k8s.ns=staging k8s.pod=secure-app-7d9f8b-xxx
container=app shell=sh parent=runc cmdline=sh terminal=34816)
rule=Terminal shell in container priority=NOTICE
tags=[container, shell, mitre_execution]
This is Falco's built-in Terminal shell in container rule firing. It detected the kubectl exec session the moment you ran it.
Step 3: Write a custom rule
The built-in rules are comprehensive, but every production environment has workloads with unique behaviour. Here is a custom rule that alerts when curl or wget is executed inside any container:
# custom-rules.yaml
customRules:
  custom-rules.yaml: |-
    - rule: Suspicious network tool in container
      desc: >
        Detects execution of curl or wget inside a running container.
        These tools are commonly used for data exfiltration, downloading
        attacker payloads, or reaching command-and-control servers.
        Production containers should not be making ad-hoc HTTP requests.
      condition: >
        spawned_process
        and container
        and proc.name in (curl, wget)
      output: >
        Network tool executed in container
        (user=%user.name tool=%proc.name cmd=%proc.cmdline
        pod=%k8s.pod.name ns=%k8s.ns.name image=%container.image)
      priority: WARNING
      tags: [network, exfiltration, custom]
Apply it by upgrading the Helm release:
helm upgrade falco falcosecurity/falco \
--namespace falco \
--set driver.kind=modern_ebpf \
--set tty=true \
-f custom-rules.yaml
Once the upgrade completes, wait for the Falco pods to restart and become ready, then test your custom rule:
Step 4: Test the custom rule
# Terminal 1 — run curl inside the container
kubectl exec -it $POD -n staging -- sh -c 'curl https://example.com'
Terminal 2 immediately shows:
2024-03-15T14:31:07.812Z: Warning Network tool executed in container
(user=root tool=curl cmd=curl https://example.com
pod=secure-app-7d9f8b-xxx ns=staging image=nginx:1.25-alpine)
rule=Suspicious network tool in container priority=WARNING
tags=[network, exfiltration, custom]
Step 5: Route alerts to Slack with Falcosidekick
Streaming logs is useful during development. In production, you need alerts routed to your alerting pipeline. Falcosidekick handles this with support for Slack, PagerDuty, Datadog, Elasticsearch, and over 50 other outputs:
# falcosidekick-values.yaml
config:
  slack:
    webhookurl: "https://hooks.slack.com/services/YOUR/WEBHOOK/URL"
    minimumpriority: "warning"
    # Output field keys contain dots, so Go's "index" function is
    # required — {{ .OutputFields.k8s.pod.name }} would not parse.
    messageformat: >
      [{{ .Priority }}] {{ .Rule }} |
      pod: {{ index .OutputFields "k8s.pod.name" }} |
      ns: {{ index .OutputFields "k8s.ns.name" }} |
      image: {{ index .OutputFields "container.image" }}
helm install falcosidekick falcosecurity/falcosidekick \
--namespace falco \
-f falcosidekick-values.yaml
Tuning Falco for production: A fresh Falco deployment will generate false positives, especially in the first week. Your job is to tune rules to match your workloads' normal behaviour, not to respond to every alert.
Here's the workflow: deploy in staging → identify false positives → add except conditions to rules → validate the false positive rate is low → enable in production with alerting.
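As a sketch of what such an exception looks like: recent Falco versions (0.36+) let a later rules entry append to an existing rule's condition via an override block. Assuming a hypothetical ci namespace where curl is expected, you could extend custom-rules.yaml like this:

```yaml
# custom-rules.yaml — tuning sketch, appended after the rule from Step 3.
# Assumes Falco 0.36+ (the override syntax) and a hypothetical "ci"
# namespace where curl/wget are legitimate.
customRules:
  custom-rules.yaml: |-
    - rule: Suspicious network tool in container
      condition: and not k8s.ns.name in (ci)  # appended to the original condition
      override:
        condition: append
```

Each exception you add this way is documented, versioned, and reviewable, which beats silently ignoring alerts.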
Cleanup
To remove everything created in this article:
# Delete the staging namespace and everything in it
kubectl delete namespace staging
# Delete Falco and Falcosidekick
helm uninstall falco -n falco
helm uninstall falcosidekick -n falco
kubectl delete namespace falco
# Delete the kind cluster entirely
kind delete cluster --name k8s-security
Conclusion
In this handbook, you secured a Kubernetes cluster across three layers: RBAC, pod runtime security, and runtime threat detection.
You built a least-privilege service account, enforced the restricted Pod Security Admission profile, hardened pods with securityContext, deployed Falco for syscall-level detection, and wrote a custom rule to catch suspicious tools inside containers.
Each layer maps to a real-world breach – Tesla, Capital One, Hildegard – showing how these controls would have contained the damage. Run kube-bench again to measure the improvement.
All YAML manifests, Helm values, and setup scripts from this article are available in the companion GitHub repository.