skip to content
Smadi0x86 Blog
Table of Contents

The Challenge of Container Runtime Security

In a Kubernetes environment, the container runtime is the engine that executes your containers. While containers provide a degree of isolation, they all share the same host kernel. A vulnerability in one container, or in the runtime itself, could potentially be exploited to compromise the entire node, and from there, the entire cluster. This shared kernel model presents a significant attack surface.

Effective runtime security requires a multi-layered, defense-in-depth strategy. This involves both proactive hardening to limit the potential actions a container can take, and reactive threat detection to alert on suspicious behavior that does occur. In this deep dive, we’ll explore a powerful combination of tools to achieve this: gVisor and AppArmor for hardening, and Falco for threat detection.

Layer 1: Proactive Hardening with Sandboxing and MAC

The first layer of our defense is to proactively restrict what a container can do. We want to enforce the principle of least privilege at the kernel level. We can achieve this through two powerful, complementary technologies: gVisor and AppArmor.

gVisor: The User-Space Kernel Sandbox

What is gVisor?

gVisor is an open-source application kernel, written in Go, that provides a sandboxed environment for containers. It acts as an intermediary between the container and the host kernel. When a containerized application makes a system call (syscall), gVisor intercepts it and handles it within its own user-space kernel. Only a limited, well-defined set of syscalls are ever passed through to the actual host kernel. This dramatically reduces the attack surface, as even if an attacker compromises the container, they are trapped within the gVisor sandbox and cannot directly access the host kernel.

Integrating gVisor with Kubernetes

gVisor integrates with Kubernetes via the RuntimeClass resource. A RuntimeClass allows you to define different container runtimes that can be used to run pods. Here’s how to set it up:

  1. Install runsc: The gVisor runtime, runsc, must be installed on each node in your cluster that will run sandboxed pods.

  2. Create a RuntimeClass: Define a RuntimeClass resource that points to the gVisor runtime.

    apiVersion: node.k8s.io/v1
    kind: RuntimeClass
    metadata:
    name: gvisor # The name you'll use to reference this runtime
    handler: runsc # The name of the runtime handler
  3. Run a Pod with gVisor: To run a pod with the gVisor sandbox, you simply specify the runtimeClassName in the pod’s spec.

    apiVersion: v1
    kind: Pod
    metadata:
    name: nginx-sandboxed
    spec:
    runtimeClassName: gvisor
    containers:
    - name: nginx
    image: nginx

Pros and Cons of gVisor

  • Pros: Provides a very strong isolation boundary, significantly reducing the risk of kernel exploits.
  • Cons: There is a performance overhead due to the syscall interception and handling in user space. This makes gVisor ideal for running untrusted or high-risk workloads, but it may not be suitable for high-performance, I/O-intensive applications like databases.

AppArmor: Mandatory Access Control for Linux

What is AppArmor?

AppArmor is a Linux Security Module (LSM) that provides Mandatory Access Control (MAC). Unlike the broad sandbox provided by gVisor, AppArmor allows you to define fine-grained security profiles for individual applications. These profiles, which are simple text files, specify which resources an application can access, such as file paths, network capabilities, and more. If an application attempts an action that is not permitted by its profile, the kernel will block it.

Integrating AppArmor with Kubernetes

Using AppArmor in Kubernetes is a two-step process: you must first load the profiles onto the nodes, and then you can apply them to your pods.

  1. Create and Load an AppArmor Profile: Let’s create a simple profile that denies all file write operations. Save this profile as deny-write on one of your Kubernetes nodes (e.g., in /etc/apparmor.d/deny-write).

    #include <tunables/global>
    profile k8s-deny-write flags=(attach_disconnected) {
    #include <abstractions/base>
    file,
    # Deny all file writes.
    deny /** w,
    }

    Then, load it into the kernel using the apparmor_parser utility:

    Terminal window
    sudo apparmor_parser /etc/apparmor.d/deny-write

    Note: In a real cluster, you would need to ensure this profile is loaded on every node where a protected pod might be scheduled. This can be done with a DaemonSet or configuration management tools.

  2. Apply the Profile to a Pod: Now, you can enforce this profile on a container using the securityContext field. The localhostProfile type indicates that the profile is pre-loaded on the node.

    apiVersion: v1
    kind: Pod
    metadata:
    name: hello-apparmor
    spec:
    containers:
    - name: hello
    image: busybox:1.28
    command: [ "sh", "-c", "echo 'Hello AppArmor!' && sleep 1h" ]
    securityContext:
    appArmorProfile:
    type: Localhost
    localhostProfile: k8s-deny-write
  3. Verify the Enforcement: If you deploy this pod and then try to write a file from within the container, the operation will fail.

    Terminal window
    # Attempt to write a file
    kubectl exec hello-apparmor -- touch /tmp/test
    # The command will fail with an error
    # touch: /tmp/test: Permission denied

gVisor and AppArmor Together

gVisor and AppArmor are not mutually exclusive; they solve different parts of the security problem. You can run a gVisor-sandboxed pod and still apply a more restrictive AppArmor profile to it for defense-in-depth. gVisor provides the strong isolation boundary, while AppArmor provides fine-grained control over the application’s behavior within that sandbox.

Layer 2: Reactive Threat Detection with Falco

Proactive hardening is essential, but we must also assume that a determined attacker might eventually find a way to bypass our defenses. This is where reactive threat detection comes in. We need a tool that can monitor our containers in real-time and alert us to suspicious activity as it happens.

What is Falco?

Falco is the de-facto open-source standard for cloud-native runtime security. It’s a CNCF graduated project that acts as a security camera for your applications. Falco taps into the Linux kernel and analyzes the stream of system calls to detect anomalous behavior based on a powerful and flexible rule engine.

How Falco Works

Falco works by deploying a driver into the kernel—either an eBPF probe or a traditional kernel module. This driver passively collects syscall data and sends it to a user-space daemon. This daemon enriches the event data with metadata from Kubernetes and the container runtime, and then evaluates the events against its set of security rules.

Deploying Falco in Kubernetes

Falco is typically deployed as a DaemonSet in Kubernetes. This ensures that an instance of Falco is running on every node in the cluster, providing complete visibility into all container activity. The easiest way to deploy Falco is by using the official Helm chart.

Decoding Falco Rules

The power of Falco lies in its rules. Let’s look at a classic example: detecting when a shell is spawned inside a container.

- rule: Terminal shell in container
desc: A shell was used as the entrypoint for a container with an attached terminal.
condition: >
evt.type = execve and evt.dir = < and container.id != host and proc.name = bash
output: >
A shell was spawned in a container with an attached terminal (user=%user.name
k8s.ns=%k8s.ns.name k8s.pod=%k8s.pod.name container=%container.id
image=%container.image.repository cmdline=%proc.cmdline)
priority: NOTICE
source: syscall
  • condition: This is the core logic. It triggers if an execve syscall is made (evt.type = execve) from within a container (container.id != host) to execute the bash program (proc.name = bash).
  • output: This is the format of the alert message, which is enriched with useful context like the Kubernetes namespace, pod name, and container image.
  • priority: The severity of the alert.

If an attacker were to kubectl exec into a running pod and start a shell, Falco would immediately generate an alert like this:

10:25:19.872075539: Notice A shell was spawned in a container with an attached terminal (user=root k8s.ns=default k8s.pod=nginx-756f7b9d7-abcde container=... image=nginx cmdline=bash)

This real-time visibility is invaluable for incident response.

Conclusion: Defense-in-Depth

By combining these three technologies, we create a formidable, multi-layered security posture for our Kubernetes workloads:

  1. gVisor: Provides a strong, sandboxed isolation boundary for high-risk applications, protecting the host kernel from direct attack.
  2. AppArmor: Enforces fine-grained mandatory access control on all applications, limiting their actions to only what is necessary and preventing them from accessing sensitive resources.
  3. Falco: Acts as a vigilant security camera, monitoring all system activity in real-time and instantly alerting on any suspicious behavior that might indicate a breach.