From: aidotengineer
Arachis, an open-source code execution and computer use sandboxing service for AI agents, utilizes microVMs as its core technology to provide secure and customizable sandboxes for AI workloads [00:00:00]. This approach addresses the critical need for isolated environments when running untrusted or potentially malicious AI-generated code [00:01:40].
Why AI Sandboxes Are Essential
The latest AI models increasingly leverage tool calling (e.g., search or code execution) during inference to provide more intelligent replies [00:00:59]. These tool calls necessitate AI sandboxes for their execution [00:01:07].
Key use cases and benefits include:
- Reinforcement Learning (RL): Sandboxes are crucial during the training phase to run reward functions at scale [00:01:12].
- Enhanced Agent Capabilities: A full Linux sandbox empowers agents, allowing them to debug entire applications using Linux commands like ps or lsof during code generation [00:01:21]. This enables agents to backtrack, replan, and work towards goals effectively [00:01:32].
- Security: Running AI-generated code, which can be buggy or malicious, directly on a host or production server poses significant risks, including potential root access or data breaches [00:01:40]. Sandboxes provide the necessary isolation [00:01:52].
Examples like Manus AI demonstrate how a sandbox enables complex code generation tasks without extensive prompting or alignment frameworks, leveraging the AI’s pre-training knowledge of Linux [00:02:04].
Introducing Arachis
Arachis offers a secure, fully customizable, and self-hosted solution for spawning and managing AI sandboxes for code execution and computer use [00:02:42].
Key Features
- MicroVM-based Secure Code Execution: Utilizes microVMs as a runtime to protect data and systems from potentially malicious or buggy AI-generated code [00:03:20].
- Speed: Arachis boots sandboxes in less than 7 seconds, significantly faster than traditional VMs, with ongoing efforts to reduce this to under a second [00:03:48]. Snapshots are also very fast, in single-digit seconds [00:04:08].
- Port Forwarding: Automatically handles port forwarding, allowing easy access to code execution environments or browser use via public URLs [00:04:15].
- Easy Computer Use: Chrome is pre-installed with a VNC server, enabling easy access to the browser GUI within the sandbox [00:04:31].
- Backtracking (Snapshot and Restore): Agents can checkpoint their progress by snapshotting the sandbox. If multi-step workflows fail, they can restore an older snapshot, leading to more reliable and complex task execution [00:04:47].
- Ubiquitous API: Provides Python and Golang clients, an MCP server, and an OpenAPI-compatible YAML file for generating clients in any language [00:05:10].
- Configurable with Docker Tooling: Users can customize binaries and packages installed in the sandbox using existing Docker commands and Dockerfiles [00:05:24].
High-Level Architecture
Arachis features a REST server that spawns and manages microVM sandboxes [00:05:40]. Each sandbox runs a VNC server and a code server, exposed via port forwarding [00:05:47]. The system is tied to Linux due to its reliance on /dev/kvm, the Linux virtualization device [00:06:19].
The API exposes resources for managing VMs (start, stop, delete), snapshots (snapshot, restore), command execution, file upload/download, and health checks [00:06:36].
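The talk does not walk through the exact endpoints, but driving such a REST API from Python might look roughly like the sketch below; the base URL, port, and endpoint paths are assumptions made for illustration, not the documented surface.

```python
import requests

# Assumed base URL for a self-hosted Arachis server; the port and every
# endpoint path here are illustrative, not the documented API.
BASE = "http://localhost:7000"

# Start a sandbox, run a command inside it, snapshot it, then tear it down.
requests.post(f"{BASE}/vms", json={"name": "agent-sandbox"}).raise_for_status()

result = requests.post(
    f"{BASE}/vms/agent-sandbox/cmd", json={"cmd": "uname -a"}
).json()
print(result)

requests.post(f"{BASE}/vms/agent-sandbox/snapshots", json={"id": "checkpoint-1"})
requests.get(f"{BASE}/health")                # health check
requests.delete(f"{BASE}/vms/agent-sandbox")  # stop and delete the VM
```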
Understanding Linux Sandboxing
To grasp why microVMs are chosen for AI sandboxes, it’s helpful to understand different Linux sandboxing options.
Linux Execution Model
On Linux, a thread is the smallest unit of execution, represented by a task_struct in the kernel [00:08:21]. A process is a logical construct of multiple threads that share page tables and other resources [00:08:42]. The kernel mediates privileged access to hardware; user code requests it through system calls, which switch the CPU into kernel (supervisor) mode [00:08:56].
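To make the shared-address-space point concrete, here is a small illustration (not from the talk): a thread’s write to a global is visible to its process, while a forked child’s write is not. It assumes Linux, where multiprocessing forks by default.

```python
import threading
import multiprocessing

counter = 0

def bump():
    global counter
    counter += 1

# A thread shares its process's page tables, so the write is visible here.
t = threading.Thread(target=bump)
t.start(); t.join()
print("after thread: ", counter)   # 1

# A forked child gets copy-on-write pages of its own, so its write is not.
p = multiprocessing.Process(target=bump)
p.start(); p.join()
print("after process:", counter)   # still 1
```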
Containers
Containers solve the problem of packaging an application’s dependencies with its core logic, enabling arbitrary user code to run on a machine [00:10:13]. On Linux, a container is a collection of namespaces (e.g., process, mount, network) that abstract resources, giving the container an isolated view of its environment [00:10:32]. Cgroups are used alongside namespaces to control resource access (CPU, memory) for a container [00:11:41].
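As a rough sketch of the namespace idea (not Arachis code), the snippet below gives a child process its own UTS namespace, so a hostname change stays invisible to the host. It assumes Linux, Python 3.12+ for os.unshare, and root or CAP_SYS_ADMIN; real container runtimes combine PID, mount, and network namespaces with cgroups.

```python
import os
import socket

# Requires Linux, Python >= 3.12 (for os.unshare), and CAP_SYS_ADMIN.
pid = os.fork()
if pid == 0:
    # Child: enter a fresh UTS namespace. The hostname change below is
    # visible only inside this namespace -- an isolated view of a resource.
    os.unshare(os.CLONE_NEWUTS)
    socket.sethostname("sandbox-demo")
    print("inside namespace:", socket.gethostname())
    os._exit(0)

os.waitpid(pid, 0)
print("on the host:     ", socket.gethostname())  # unchanged
```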
Security Story of Containers
Containers run as native processes directly on the host kernel [00:12:20]. This means a kernel vulnerability can be exploited by a malicious or buggy process within the container, allowing it to gain root access and compromise the host [00:12:32].
Container Security Mitigation
To mitigate these risks, techniques like jailing containers by restricting Linux capabilities and system calls are used to reduce the attack surface [00:13:19]. seccomp filters can also block system calls outright or filter their arguments [00:14:01]. However, even with these measures, sandboxing has limits and can sometimes be bypassed [00:14:31].
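As a tiny illustration of the mechanism (again, not Arachis code), the sketch below turns on seccomp’s strict mode in a child process via prctl; after that, any syscall other than read, write, _exit, or sigreturn kills the child. Production sandboxes instead use SECCOMP_MODE_FILTER with a BPF program to allow a curated syscall list.

```python
import ctypes
import os
import signal

libc = ctypes.CDLL(None, use_errno=True)
PR_SET_SECCOMP = 22
SECCOMP_MODE_STRICT = 1   # allowlist: read, write, _exit, sigreturn

pid = os.fork()
if pid == 0:
    # From this point on, the child may only read, write, and exit; the
    # next disallowed syscall terminates it with SIGKILL.
    libc.prctl(PR_SET_SECCOMP, SECCOMP_MODE_STRICT, 0, 0, 0)
    open("/etc/hostname")   # openat() is not on the allowlist
    os._exit(0)             # never reached

_, status = os.waitpid(pid, 0)
if os.WIFSIGNALED(status) and os.WTERMSIG(status) == signal.SIGKILL:
    print("child was killed by seccomp, as expected")
```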
Transition to Virtualization
When stronger isolation is needed, virtualization provides a more robust primitive for running untrusted code [00:14:47]. Each Virtual Machine (VM) has its own guest user space and guest kernel, offering a smaller attack surface to the host kernel compared to containers [00:14:53].
Linux Virtualization Explained
In Linux virtualization, a Virtual Machine Monitor (VMM), such as QEMU, crosvm, or Firecracker, manages the VMs [00:15:48]. The VMM interacts with /dev/kvm, a Linux kernel device that exposes the processor’s virtualization stack, to start VMs and grant access to privileged resources [00:16:01].
When a guest VM needs to access host resources (disk, network), it triggers a “VM exit” to the host. The VMM handles this exit, interacts with the host kernel for the resource, and then hands the response back to the guest with a “VM resume” [00:16:56]. Minimizing VM exits and resumes is crucial for performance [00:17:25]. While VMs offer superior security, I/O-heavy workloads may take a performance hit from these exits, unlike native processes in containers [00:18:00].
MicroVMs vs. Traditional VMs
MicroVMs represent a refined approach to virtualization, offering enhanced security and performance compared to traditional VMMs such as QEMU [00:18:21].
Key Differentiators:
- Rust-based Implementation: Projects like crosvm pioneered writing VMMs in Rust, whose memory safety mitigates vulnerabilities that could allow untrusted guest code to attack host-side emulated devices written in less memory-safe languages like C [00:18:38].
- Jailed Emulated Devices: MicroVMs typically jail their emulated devices separately. For example, a block device is restricted to only block-related system calls, preventing a compromise in one device from affecting others [00:19:11].
- “Micro” Aspect (Speed and Memory): The “micro” in microVMs refers to the VMM process itself [00:20:26]. Unlike older VMMs that support numerous architectures and obscure devices, microVMs (e.g., Firecracker, Cloud Hypervisor) support a limited set (Intel, ARM) and only major devices [00:19:54]. This reduced codebase leads to blazing fast boot times and lower memory consumption at runtime [00:20:17].
MicroVMs like Firecracker (underpinning AWS Lambda) and Cloud Hypervisor (a more general-purpose, enterprise-oriented VMM) stemmed from the crosvm revolution [00:22:09]. Arachis selected Cloud Hypervisor for its device hot-plugging support, GPU support, existing snapshot capabilities, and its community-driven project model [00:22:40]. gVisor, another option, offers a middle ground between containers and microVMs in terms of performance and security, and provides easier GPU access [00:23:09].
Arachis’s MicroVM-Powered Architecture
Arachis explicitly chooses a microVM runtime for its AI sandboxes primarily due to security requirements for multi-tenant code execution, where untrusted AI-generated code might access different clients’ data [00:21:02]. The fast boot times and ease of snapshotting by simply dumping guest memory are also critical factors [00:21:40].
File System Management
Arachis protects the sandbox’s root file system by using an overlay FS [00:24:58].
- A read-only base layer is shared among sandboxes [00:25:03].
- Each sandbox receives its own read-write layer where all new files are created [00:25:14].
- When snapshotting, only this read-write layer is persisted, optimizing storage and backup [00:25:31]. This setup is handled during the sandbox’s boot process, making it appear as a regular Linux file system to processes inside [00:25:59].
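The mount behind this layout is roughly the following; this is a sketch rather than Arachis’s actual boot code, the directory names are invented, and root privileges are required.

```python
import subprocess

# Illustrative only: an overlayfs mount in the Arachis style. "lowerdir" is
# the shared read-only base image, "upperdir" is the per-sandbox read-write
# layer, and "workdir" is overlayfs scratch space. Paths are made up.
subprocess.run(
    [
        "mount", "-t", "overlay", "overlay",
        "-o", "lowerdir=/srv/base-rootfs,upperdir=/srv/sbx1/rw,workdir=/srv/sbx1/work",
        "/srv/sbx1/merged",
    ],
    check=True,
)
# Anything running with /srv/sbx1/merged as its root sees an ordinary Linux
# file system; every write lands in /srv/sbx1/rw, which is all a snapshot
# needs to persist.
```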
Networking Setup
Each Arachis sandbox, running as a virtual machine, has its own isolated networking setup [00:27:09].
- A unique tap device (virtual networking interface) is created for each sandbox [00:27:19].
- All tap devices connect to a Linux bridge on the host server [00:27:34].
- Arachis automatically handles port forwarding, using Linux iptables to direct traffic between the host and the sandbox’s code server or VNC server [00:27:44]; the equivalent host-side commands are sketched after this list.
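The sketch below shows roughly equivalent host-side plumbing; the device name, bridge name, guest IP, and port are invented, and Arachis performs these steps automatically.

```python
import subprocess

def sh(*args: str) -> None:
    subprocess.run(args, check=True)

# Illustrative host-side setup; names, addresses, and ports are invented,
# and the bridge (br-sandboxes) is assumed to exist already.
sh("ip", "tuntap", "add", "dev", "tap-sbx1", "mode", "tap")    # per-sandbox tap device
sh("ip", "link", "set", "tap-sbx1", "master", "br-sandboxes")  # attach it to the bridge
sh("ip", "link", "set", "tap-sbx1", "up")

# Forward host port 8080 to the sandbox's code server at 10.0.0.2:8080.
sh("iptables", "-t", "nat", "-A", "PREROUTING",
   "-p", "tcp", "--dport", "8080",
   "-j", "DNAT", "--to-destination", "10.0.0.2:8080")
```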
Customization and Built-in Tools
Arachis allows customization of binaries and packages within the sandbox via a Dockerfile [00:29:29]. The default setup includes Ubuntu 22.04, standard packages, Chrome (booted via systemd), NodeJS, npm, and Python, providing a rich environment for AI agents [00:29:46].
Code Execution and GUI Access
Arachis bundles a code execution server within the sandbox [00:31:16]. This server provides a files API for uploading/downloading files and a command API for executing commands and returning JSON output/errors [00:31:27]. Running this server inside a secure microVM means high confidence that code won’t escape to the host OS [00:31:52]. Chrome is pre-installed, and port forwarding facilitates direct GUI access via a VNC server [00:32:06].
Snapshotting in Arachis
Snapshotting is a critical feature enabling agents to backtrack and replan during complex, multi-step tasks [00:33:09]. If an agent fails deep into a workflow, it can restore to a last known good checkpoint instead of starting from scratch [00:33:01].
Arachis saves the entire running state of a sandbox, including guest memory and the read-write overlay FS layer [00:33:30]. This means all created files, spawned processes, and even open GUI windows are restored exactly as they were [00:33:50].
The snapshot process involves four steps (sketched in code after the list):
- Pause the VM [00:34:44].
- Call the snapshot API to dump guest memory [00:34:50].
- Manually persist the read-write overlay FS layer [00:34:57].
- Resume the VM [00:35:08].
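Expressed against a hypothetical HTTP control surface (the endpoint paths and directory layout are assumptions; Arachis wraps all of this behind its own snapshot API), the four steps look roughly like this:

```python
import shutil
import requests

BASE = "http://localhost:7000"   # assumed self-hosted Arachis server
SBX = "agent-sandbox"

# 1. Pause the VM so guest memory stops changing.
requests.post(f"{BASE}/vms/{SBX}/pause")
# 2. Ask the VMM to dump guest memory (Cloud Hypervisor's snapshot support).
requests.post(f"{BASE}/vms/{SBX}/snapshots", json={"id": "checkpoint-1"})
# 3. Persist the per-sandbox read-write overlay layer next to the memory dump.
shutil.copytree(f"/srv/{SBX}/rw", "/srv/snapshots/checkpoint-1/rw")
# 4. Resume the VM; the agent picks up exactly where it left off.
requests.post(f"{BASE}/vms/{SBX}/resume")
```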
Future work includes migrating to btrfs for native incremental snapshot awareness [00:34:20].
Using the Arachis API
Arachis offers a user-friendly API, including a Python SDK [00:35:54]. Users can self-host Arachis, start sandboxes, run commands, upload/download files, create snapshots with a simple ID, and restore VMs from these checkpoints [00:36:01].
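The identifiers below are illustrative rather than the real package and method names, but typical SDK usage would look something like this:

```python
# Hypothetical module, class, and method names; consult the Arachis repo
# for the real Python SDK, which exposes equivalent operations.
from arachis_client import ArachisClient

client = ArachisClient("http://localhost:7000")      # self-hosted server

sandbox = client.start_sandbox(name="agent-sandbox")
result = sandbox.run_command("python3 -c 'print(40 + 2)'")
print(result.output)

sandbox.upload_file("train.py", open("train.py", "rb").read())
sandbox.snapshot("checkpoint-1")     # checkpoint by a simple ID

# ... a later multi-step workflow goes wrong ...
sandbox.restore("checkpoint-1")      # backtrack to the known-good state
data = sandbox.download_file("results.json")
client.stop_sandbox("agent-sandbox")
```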
Demo: Google Docs Clone with Backtracking
A demonstration showcased Claude Desktop creating a collaborative Google Docs clone using Arachis’s MCP server [00:37:03]. Without extensive prompting, Claude piped commands into the Linux sandbox to build the application [00:37:37]. The demo highlighted:
- Real-time collaboration enabled by Arachis’s networking setup [00:38:47].
- Snapshotting the current state of the application [00:37:51].
- Adding a new feature (dark mode) and verifying its functionality [00:37:58].
- Restoring to the previous snapshot, effectively undoing the dark mode addition, demonstrating the power of backtracking [00:38:20].
Ongoing Work
Current development efforts for Arachis focus on:
- Achieving sub-1-second boot times [00:39:14].
- Enhancing snapshot and persistence support, including a move to btrfs for incremental snapshots [00:39:24].
- Optimizing for bin-packing many sandboxes on a single server through dynamic memory and resource management (e.g., memory ballooning or hot-plugging) [00:39:35].