Sandboxing AI Agents
2025-07-07
For many of us, "Agentic Programming" is now a thing. Tools like Cursor, Claude Code, opencode, Gemini CLI, Aider, OpenAI Codex, OpenHands, and probably a dozen more are increasingly finding their way into our programming tool chests.
The usefulness of these agent tools really shines when you give them access to do the things a human programmer might need to do: run your test harness, change package versions and install said packages, make curl requests to your internal service endpoints to help diagnose problems, run system tools against your code base, and so on. Quite often they do very reasonable things, but every once in a while you'll catch one trying to install a global package (sometimes using commands for the wrong Linux distribution), or otherwise doing something it shouldn't.
The common solution is to have the agent ask the user for permission every time it does something. But you know what sucks? Having to stop what you're doing every 30 seconds to give the bot permission to do something. You know what else sucks? Running in YOLO mode and having your bot make some system-level changes or do some other wildly inappropriate thing somewhere in the middle of its 30-minute hackathon on your code base.

The Sandbox Solution
While some folks might find their projects are well served by development containers, remote background workers, or other solutions, mine are not. Most of my development work is either too simple to justify a lot of development infrastructure complexity, or too complex to fit in a nice single container that an AI agent could romp around in.
What I wanted was a way to let an agent go wild within a project, using all the tools I have installed on my machine and all of the packages installed for the project, while protecting myself when the agent does something it probably shouldn't. The solution I came up with was, of course, to delve into the world of Linux namespaces, OverlayFS, and the nuts and bolts of how container systems work under the hood, and develop a sandbox tool for Linux. The tool lets me create extremely lightweight copy-on-write containerized views of my computer, which in turn lets me run any of these newfangled AI agents in a pretty carefree way. They can go off and make any changes they want, run any command they want, and my real file system is protected from their mistakes. When they are done I can inspect what they've done and pull in the changes I want to keep.
In a nutshell, the sandbox tool creates new mount, process, and (if desired) network namespaces, the same building blocks Docker or any other container system uses. However, instead of mounting a disk image, it uses OverlayFS to mount copy-on-write file systems that mirror the computer's normal file system. Modified files are stored on disk (by default in the ~/.sandboxes directory), so sandbox data survives reboots and in general is as durable as any other file. Setup takes a small fraction of a second and has minimal resource overhead. All this taken together means it's suitable for both ephemeral and long-term use, and you can launch quite a few sandboxes at once if you want.
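If you want a feel for the mechanism, you can approximate it by hand with stock util-linux tools. The sketch below is only an illustration, not how the tool is actually implemented: the real thing sets up overlays across the file system and keeps its writable layers under ~/.sandboxes, whereas here the paths are throwaway examples, only one directory is overlaid, and unprivileged overlay mounts need a reasonably recent kernel (roughly 5.11 or newer).
# scratch space for the overlay's writable (upper) layer; example paths only
> mkdir -p /tmp/sbx/upper /tmp/sbx/work
# new user, mount, and PID namespaces; add --net if you also want no network access
> unshare --user --map-root-user --mount --pid --fork --mount-proc bash
# inside the namespace: put a copy-on-write view over the project directory
> mount -t overlay overlay -o lowerdir=$HOME/project,upperdir=/tmp/sbx/upper,workdir=/tmp/sbx/work $HOME/project
# writes land in the upper layer; outside the namespace the real directory is untouched
> echo scribble > $HOME/project/notes.txt
> cat /tmp/sbx/upper/notes.txt
scribble
The overlay only exists inside that mount namespace, so the moment the shell exits the copy-on-write view disappears and the modified files are left sitting in the upper directory for inspection.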
Workflow
Now with my sandbox tool I can (reasonably) safely do things like:
> sandbox claude --dangerously-skip-permissions
I can even kick off several in different terminals from the same directory and have them all run in parallel, without worrying about them stepping on each other's toes, and without any of them pestering me in order to make progress on their tasks.
> sandbox --name=feature claude --dangerously-skip-permissions
> sandbox --name=test claude --dangerously-skip-permissions "Summarize the last PR" | cat
> sandbox --name=review-3023 claude --dangerously-skip-permissions "Review this code, if you find any bugs fix them and create a pull request" > /dev/null
When I'm satisfied with the changes the bots have made, I can inspect what the agent did and accept the changes I want to keep.
> sandbox status
Matching changes:
~ Cargo.toml
~ src/main.rs
24220 external or non-matching changes
> sandbox diff src/main.rs
--- /home/me/project/src/main.rs
+++ <sandbox>/src/main.rs
@@ -39,6 +39,7 @@
 pub fn main() -> Result<()> {
+    println!("Hello world");
     Ok(())
 }
Looks good to me, so I can accept the changes relative to my current working directory.
> sandbox accept
or if I want to be picky and only accept certain changes:
> sandbox accept src/main.rs
Discarding changes
While I haven't run into this in the wild yet, one of these days I'm going to see this:
> sandbox status /usr/bin
Matching changes:
+ /usr/bin/rootkit
24219 external or non-matching changes
and feel justified in my paranoia. When I do, I will take great satisfaction in running
> sandbox reject /usr/bin/rootkit
or maybe by that point it's time to just delete the whole sandbox
> sandbox delete
That's it
I wanted a simple, safe way to explore these agents without the permission nags or the fear of them doing something stupid, so I built a tool to facilitate that. If you have a similar need or desire in your life, a tool now exists that you might find useful.
P.S. That 24220 number? That's real. Granted, virtually all of them are intermediate build or cache files, but they're spread throughout my system and one of these days one of them isn't going to be benign. On that day though, at least on my system, it'll be a malicious little artifact stuck in a sandbox, so hopefully it won't really matter!
About
I'm not affiliated with any AI company, and this isn't a commercial endeavour; it's just a project to fill a need I felt I had. If you think it's cool or useful, toss me a star on the sandbox github page. If you want to contact me about it for any reason, open an issue or shoot me an email. Below are some directions I might go; if any of them interest you, pop by the issue and share your thoughts, leave a thumbs up, or, if you're really motivated, open a pull request.
Possible Future Work
Protecting against data exfiltration
One of the biggest things tickling the back of my mind is agentic data exfiltration. At some point some agent or another is going to get an idea from some malicious website or MCP service to start sending data it shouldn't somewhere it shouldn't. Obviously I'm not alone in this thinking, and the agents are already trying to protect against that to some degree, but relying on the agent software or the user to catch exfiltrations isn't enough to give me peace of mind, especially with agents regularly writing and running their own scripts for legitimate testing and debugging needs. No, I don't trust the agents, and I don't want to be in the loop to that degree, checking every little script they want to run. I want some strong protections; I want the kernel to have my back.
Currently the sandbox tool either runs without any network access or with full network access. The former is pretty useless for agentic work, and the latter is no better than running without a sandbox in terms of data exfiltration protection. What is needed is a filtered mode where we can intercept outbound connections and filter accordingly. We want our agents to be allowed to talk to their providers, reference search engines, and well-known sites, but when one starts trying to POST data somewhere we've never heard of, we should be asking the user if that's really OK. I believe we can accomplish this with eBPF, MitM SSL inspection, and some sort of interfacing with the user, but that's just a sketch of an idea; it's fairly uncharted territory for me.
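Just to make the idea concrete: a much cruder version of outbound filtering can be done today with a default-deny nftables policy inside a network namespace dedicated to the sandbox. To be clear, this is not the eBPF plus MitM design sketched above (which would see hostnames and request bodies rather than bare IPs), and the table and chain names and the 192.0.2.0/24 range below are placeholders, not anything the tool currently does.
# default-deny egress with a hand-maintained allowlist (placeholder addresses)
> nft add table inet egress
> nft add chain inet egress out '{ type filter hook output priority 0; policy drop; }'
> nft add rule inet egress out oif lo accept
> nft add rule inet egress out udp dport 53 accept
> nft add rule inet egress out ip daddr 192.0.2.0/24 tcp dport 443 accept
The obvious limitation is that IP-level rules can't tell a legitimate GET to a well-known site apart from a POST full of your secrets to the same CDN, which is exactly why something deeper like eBPF plus SSL inspection feels necessary.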
Monitoring
While some groundwork for using sandbox from other programs with the
Ephemeral sandboxes
Having explicit support for ephemeral sandboxes that are automatically cleaned up would be nice when dealing with throw-away sandboxes, for example when automating reviews or trying to one-shot PRs.
Interactive sandbox interface
The interactive diff system Cursor has is really nice: the ability to work with the AI and quickly navigate through the changes it made, accepting or rejecting them on a block-by-block or file-by-file basis, is great. The CLI tools aren't built for this (nor should they be), and the sandbox tool currently handles at best one file at a time. I find myself drawn to putting together a GUI, TUI, or Neovim plugin to quickly explore the changes made in sandboxes and accept or reject them at a more granular level.