From: aidotengineer
Arachis is an open-source code execution and computer use sandboxing service for AI agents [00:00:04]. Developed by solo founder Abhishek, it is built at the intersection of operating systems, sandboxes, and AI agents [00:00:48]. The project aims to be the next big unlock in intelligence by providing a secure and flexible environment for AI agents [00:00:12].
Why AI Sandboxes are Needed
AI sandboxes are crucial for several reasons:
- Tool Calling: The latest models, such as o3, leverage tool calling (like search or code execution) during inference to generate smarter replies to user queries [00:00:59]. These tool calls necessitate AI sandboxes for execution [00:01:07].
- Reinforcement Learning: During the training phase of reinforcement learning, sandboxes are needed to run reward functions at scale [00:01:12].
- Enhanced Agent Capabilities: A full Linux sandbox significantly extends the capabilities of AI agents [00:01:21]. For example, during code generation, agents can debug entire applications using Linux commands like ps and lsof [00:01:25]. This allows them to backtrack, replan, and work towards a goal more effectively [00:01:32].
- Security: Agent-generated code, like any code from GitHub or Stack Overflow, can be buggy or malicious [00:01:40]. Running such code on a host or production server without isolation risks root access, data exposure, and potential system compromise [00:01:46]. Sandboxes provide the necessary lockdown [00:01:53].
Features of Arachis
Arachis provides a secure, fully customizable, and self-hosted solution for spawning and managing AI sandboxes [00:02:42]. Key features include:
MicroVM-based Secure Code Execution
Arachis utilizes microVMs as its runtime environment to ensure security [00:03:22]. This prevents malicious or buggy code generated by AI agents from gaining root access and compromising user or client data [00:03:38].
Speed
Speed is paramount for AI sandboxes [00:03:48]. Arachis boasts fast boot times, currently under 7 seconds, which is significantly faster than a traditional VM (40 seconds on macOS) [00:03:55]. There is ongoing work to reduce this to under one second [00:04:02]. Snapshots are also very fast, taking single-digit seconds and continuously improving [00:04:08].
Port Forwarding
Arachis handles all necessary port forwarding, allowing easy access to code execution or browser use via a public URL and port, without manual configuration of IP tables or firewalls [00:04:15].
Easy Computer Use Workflows
Chrome is pre-installed in the sandbox, along with a VNC server, enabling easy graphical user interface (GUI) access for agent workflows [00:04:31].
Backtracking via Snapshot and Restore
Arachis supports snapshot and restore functionality, allowing agents to checkpoint their progress [00:04:48]. If multi-step workflows fail, agents can restore an old snapshot instead of starting from scratch [00:04:57]. This leads to more reliable, higher-order complex task execution by agents [00:05:01].
Dead Simple and Ubiquitous API
Arachis offers a Python API, a Golang client, an MCP server, and an OpenAPI-compatible YAML file [00:05:10]. This allows for client generation in any desired language [00:05:18].
Configurable with Docker Tooling
Users can customize the binaries and packages installed in the sandbox using existing Docker commands and a Dockerfile [00:05:24]. This provides complete control over the sandbox environment [00:05:33].
Architecture of Arachis
The high-level architecture of Arachis involves a REST server that spawns and manages microVM sandboxes [00:05:40]. Each sandbox runs a VNC server and a code server, with port forwarding exposing the VNC server for GUI access via a VNC client [00:05:47].
Arachis is tied to Linux because the microVM technology it uses relies on /dev/kvm, the Linux virtualization device [00:06:19].
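Because of this dependency, a host can be checked for KVM support before Arachis is installed. The following is a minimal sketch, assuming only that the device node lives at the standard path /dev/kvm; the function name kvm_available is illustrative and not part of Arachis.

```python
import os


def kvm_available(device: str = "/dev/kvm") -> bool:
    """Return True if the KVM device node exists and the current user
    can open it for reading and writing (a VMM needs both)."""
    return os.path.exists(device) and os.access(device, os.R_OK | os.W_OK)


if __name__ == "__main__":
    print("KVM usable on this host:", kvm_available())
```

A False result usually means the host is not Linux, hardware virtualization is disabled, or the current user lacks permission on the device.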
Linux Sandboxing Background
Arachis’s choice of microVMs is rooted in various Linux sandboxing techniques:
Linux Execution Model
On Linux, a thread is the smallest unit of execution, represented by a task_struct in the kernel’s scheduler run queue [00:08:21]. A process is a logical grouping of one or more threads that share resources such as the page table [00:08:42]. The kernel has privileged access to hardware, so user code must issue system calls, which switch the CPU into kernel mode, to perform privileged operations [00:08:56].
Containers
Containers address the problem of packaging an application’s dependencies with its core logic, allowing arbitrary user code to run on a machine [00:10:13]. Technically, a container is a collection of namespaces (e.g., process, mount, network) that abstract different resources, giving the processes inside a bounded view of their own resources [00:10:32]. While the host can peek into a child container’s namespace, the container cannot look upwards into the host’s namespace [00:11:08]. Cgroups are used alongside namespaces to limit resource usage (e.g., memory, CPU percentage) [00:11:41].
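On a Linux host, the namespaces a process belongs to are visible as symlinks under /proc/<pid>/ns. The sketch below lists them for the current process; the helper name list_namespaces is illustrative. Two processes in the same container report identical identifiers, while a process in a different container reports different ones.

```python
import os


def list_namespaces(pid: str = "self") -> dict[str, str]:
    """Map each namespace type (e.g. 'pid', 'mnt', 'net') to its identifier.

    Reads the symlinks under /proc/<pid>/ns. Returns an empty dict when
    that directory cannot be read (non-Linux host, unknown pid, or
    missing permissions).
    """
    ns_dir = f"/proc/{pid}/ns"
    namespaces: dict[str, str] = {}
    try:
        entries = os.listdir(ns_dir)
    except OSError:
        return namespaces
    for name in sorted(entries):
        try:
            # Each link target looks like "net:[4026531840]".
            namespaces[name] = os.readlink(os.path.join(ns_dir, name))
        except OSError:
            continue
    return namespaces


if __name__ == "__main__":
    for ns_type, ident in list_namespaces().items():
        print(f"{ns_type:10s} {ident}")
```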
Container Security Flaws: Containers run as native processes on top of the host kernel [00:12:20]. A kernel vulnerability allows a malicious or buggy process within a container to attack the kernel, gain root access, and compromise the entire system [00:12:33].
Mitigation Techniques for Containers: To reduce the attack surface, containers can be “jailed” by restricting Linux capabilities (caps) and system calls they can invoke [00:13:19]. Syscalls can be filtered using seccomp [00:14:01]. Libraries like minijail can help in jailing and sandboxing containers [00:14:19]. However, sandboxing and jailing still have limits and can be bypassed [00:14:31].
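To make syscall filtering concrete, the sketch below builds a deny-by-default seccomp profile in the JSON format accepted by Docker. This format is an assumption chosen for illustration; the talk does not say which tooling was used, and the syscall list here is far too small for a real workload.

```python
import json


def build_seccomp_allowlist(allowed_syscalls):
    """Build a seccomp profile that rejects every syscall not listed."""
    return {
        # Any syscall without a matching rule fails with an error code.
        "defaultAction": "SCMP_ACT_ERRNO",
        "syscalls": [
            {
                "names": sorted(set(allowed_syscalls)),
                "action": "SCMP_ACT_ALLOW",
            }
        ],
    }


if __name__ == "__main__":
    # Illustrative list only; a real process needs many more syscalls.
    profile = build_seccomp_allowlist(["read", "write", "exit_group", "read"])
    print(json.dumps(profile, indent=2))
```

An allowlist is generally preferred over a denylist here, because a newly added kernel syscall is blocked by default instead of silently allowed.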
Virtualization
Virtualization provides another primitive for running untrusted code [00:14:49]. Each Virtual Machine (VM) has its own guest user space and guest kernel, offering greater isolation compared to containers [00:14:53]. This results in a smaller attack surface to the host kernel [00:15:10].
Linux Virtualization (KVM): VMs are spawned and managed by a Virtual Machine Monitor (VMM), such as QEMU, crosvm, or Firecracker [00:15:47]. The VMM communicates with /dev/kvm, a Linux kernel device that exposes the processor’s virtualization support, to start VMs and grant access to privileged resources [00:16:01]. When a VM needs to access host resources (disk, network), it performs a “VM exit” to the host. The VMM services the request through the host kernel and returns the response to the guest with a “VM resume” [00:16:56]. Minimizing VM exits and resumes is critical for performance [00:17:22].
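The exit/resume cycle can be pictured with a toy model. The sketch below is not the KVM API: it uses a Python generator as a stand-in guest, where each yield plays the role of a VM exit carrying a request and each send plays the role of a VM resume carrying the response. All names are illustrative.

```python
def guest():
    """Toy guest: every `yield` models a VM exit with an I/O request."""
    data = yield ("disk_read", "block-7")
    status = yield ("net_send", data.upper())
    return status


def run_vmm(guest_gen, handlers):
    """Toy VMM loop: service each exit on the host, then resume the guest.

    Returns the guest's final value and the number of exits handled.
    """
    exits = 0
    try:
        request = next(guest_gen)  # run the guest until its first exit
        while True:
            kind, payload = request
            response = handlers[kind](payload)  # host-side work
            exits += 1
            request = guest_gen.send(response)  # resume the guest
    except StopIteration as done:
        return done.value, exits


handlers = {
    "disk_read": lambda block: f"contents of {block}",
    "net_send": lambda data: f"sent {len(data)} bytes",
}

if __name__ == "__main__":
    print(run_vmm(guest(), handlers))
```

Every pass through the loop is a round trip between guest and host, which is why real VMMs try to batch I/O and keep the number of exits low.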
MicroVMs
MicroVMs differ from traditional VMs in several ways:
- Security-First Design: Pioneered by the crosvm project at Chrome OS, microVM VMMs are often written in memory-safe languages like Rust, reducing vulnerabilities from memory safety bugs in emulated devices [00:18:38]. They also jail each emulated device separately, so that a compromise of one device (e.g., block) cannot spread to others (e.g., network) [00:19:11].
- Lightweight and Fast Boot: The “micro” in microVM refers to the VMM process itself [00:19:41]. Unlike older VMMs such as QEMU, which support many architectures and emulated devices, microVM VMMs (like crosvm, Firecracker, Cloud Hypervisor) support only one or two architectures (Intel, ARM) and the major devices [00:19:54]. This drastically reduces code paths at boot, leading to blazing-fast boot times and lower memory consumption at runtime [00:20:17].
Why Arachis Chose MicroVMs
Arachis selected microVMs as its final execution environment for AI sandboxes due to:
- Security: Essential for coding agents that may handle multi-tenant environments with LLM-generated code accessing sensitive client data [00:21:02].
- Fast Boot Times: Supports the need for quick tool calls and code generation [00:21:38].
- Snapshotting: MicroVMs enable fast snapshotting by simply dumping the entire guest memory, a process not as straightforward with containers or gVisor [00:21:43].
VMM Selection: Arachis specifically chose Cloud Hypervisor as its microVM VMM [00:23:04].
- Cloud Hypervisor vs. Firecracker: While Firecracker (underpinning AWS Lambda) has a fleshed-out REST API and better jailing, Cloud Hypervisor is a more general-purpose enterprise VMM [00:22:27].
- Benefits of Cloud Hypervisor: At the time of choice, it offered hot-plugging of devices (adding/removing RAM at runtime), GPU support, and snapshot support [00:22:44]. Furthermore, it is not controlled by a single company, fostering a more collaborative software project [00:22:58].
- gVisor Alternative: gVisor is another option, closer to a container in performance but with slightly better security. It allows easier GPU access but still carries risks of host kernel attacks [00:23:12].
Detailed Architecture Components
Storage / File System
To protect the root file system (rootFS) from untrusted code, Arachis uses OverlayFS: a read-only base layer holding the rootFS is shared between all sandboxes [00:25:03]. On top of this, each sandbox receives its own read-write layer where new files are created [00:25:13]. When a sandbox is snapshotted, only this read-write layer is persisted, optimizing storage and backup [00:25:31].
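The layering can be illustrated with a toy in-memory model. This is not how OverlayFS is implemented and not Arachis code; the class and variable names are made up for the example. Reads fall through to the shared base, writes land only in the sandbox’s private layer, and a snapshot captures just that private layer.

```python
from types import MappingProxyType


class ToyOverlay:
    """Toy overlay: shared read-only lower layer + private upper layer."""

    def __init__(self, lower):
        self._lower = lower  # shared between sandboxes, never written
        self._upper = {}     # this sandbox's read-write layer

    def read(self, path):
        # The upper layer shadows the lower one.
        if path in self._upper:
            return self._upper[path]
        return self._lower[path]  # KeyError if the file does not exist

    def write(self, path, data):
        # Writes never touch the shared base layer.
        self._upper[path] = data

    def snapshot(self):
        # Only the per-sandbox layer needs to be persisted.
        return dict(self._upper)


# A read-only view guards the shared base against accidental mutation.
BASE = MappingProxyType({
    "/etc/os-release": "Ubuntu 22.04",
    "/usr/bin/python3": "<interpreter>",
})

if __name__ == "__main__":
    a, b = ToyOverlay(BASE), ToyOverlay(BASE)
    a.write("/etc/os-release", "patched")
    a.write("/tmp/out.txt", "hello")
    print(a.read("/etc/os-release"), "|", b.read("/etc/os-release"))
    print(a.snapshot(), b.snapshot())
```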
Networking
Each Arachis sandbox runs in a virtual machine with its own isolated networking setup [00:27:09]. This includes:
- Tap Device: Each sandbox gets a unique virtual networking interface [00:27:19].
- Linux Bridge: All tap devices are connected to a Linux bridge on the host server [00:27:32].
- Port Forwarding: Arachis automatically forwards ports from the host to the code server or VNC server within the sandbox, simplifying access [00:27:44]. This involves setting up bridge devices and complex firewall rules using Linux iptables commands [00:28:06].
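The kind of host-side setup described above can be sketched as a list of standard ip and iptables invocations. The source does not show the actual rules Arachis installs, so the device names, addresses, and ports below are hypothetical, and the list is deliberately incomplete (for instance, it omits FORWARD rules and cleanup). The function only builds the commands; it does not run them.

```python
import shlex


def sandbox_network_commands(tap, bridge, host_port, guest_ip, guest_port):
    """Return argv lists that would wire one sandbox into the host network."""
    return [
        # Create the sandbox's tap device.
        ["ip", "tuntap", "add", "dev", tap, "mode", "tap"],
        # Attach the tap device to the shared Linux bridge.
        ["ip", "link", "set", "dev", tap, "master", bridge],
        ["ip", "link", "set", "dev", tap, "up"],
        # Forward a host TCP port to a service inside the guest.
        ["iptables", "-t", "nat", "-A", "PREROUTING", "-p", "tcp",
         "--dport", str(host_port), "-j", "DNAT",
         "--to-destination", f"{guest_ip}:{guest_port}"],
    ]


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    for argv in sandbox_network_commands("tap0", "br0", 3000, "10.20.1.2", 5901):
        print(shlex.join(argv))
```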
Customization
Arachis sandboxes are customizable via Docker tooling [00:29:29]. The default Dockerfile is based on Ubuntu 22.04 and includes standard packages for agents, such as Chrome (booted via systemd), NodeJS, npm, and Python [00:29:46]. Users can modify this Dockerfile to include any desired binaries or packages [00:30:16].
Code Execution Server
Arachis is bundled with a code execution server running inside the sandbox [00:31:20].
- Files API: Allows uploading and downloading files to and from the sandbox [00:31:27].
- Command API: Takes a command, executes it, and returns the output or error in JSON format [00:31:39].
- The fact that this server runs within a secure guest VM increases confidence in exposing such functionality, unlike running it directly on the host OS [00:31:51].
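The core of a command endpoint like this can be sketched in a few lines. The JSON field names (output, error, exit_code) and the function name are assumptions for illustration; the source states only that the output or error is returned as JSON.

```python
import json
import subprocess
import sys


def run_command(argv, timeout=10.0):
    """Run a command and return a JSON string describing the result."""
    try:
        proc = subprocess.run(
            argv, capture_output=True, text=True, timeout=timeout
        )
        result = {
            "output": proc.stdout,
            "error": proc.stderr,
            "exit_code": proc.returncode,
        }
    except (OSError, subprocess.TimeoutExpired) as exc:
        # The command could not be started or ran past the timeout.
        result = {"output": "", "error": str(exc), "exit_code": None}
    return json.dumps(result)


if __name__ == "__main__":
    print(run_command([sys.executable, "-c", "print('hello from sandbox')"]))
```

Exposing a function like this on a host would amount to remote code execution; inside a guest VM the blast radius is limited to the sandbox.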
Snapshotting
Snapshotting allows agents to save the entire running state of a sandbox, including guest memory and the read-write layer of the file system [00:33:30]. Any created files, spawned processes, or even open GUI windows are restored exactly as they were [00:33:39].
Snapshotting Process:
- Pause VM: The VM is paused by calling the VMM’s pause API [00:34:44].
- Dump Guest Memory: The snapshot API is called to dump the guest’s memory [00:34:50].
- Persist Read-Write OverlayFS: The thin read-write overlay file system is manually persisted to save all files created by the agent [00:34:57].
- Resume VM: The VM is resumed, allowing the sandbox to continue its operations from where it was paused [00:35:08].
This process enables agents to backtrack to a good snapshot if they fail, replan, and continue their workflow, leading to more reliable results [00:34:01].
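The four steps can be sketched as one function. RecordingVMM is a hypothetical stand-in that only records calls; it is not the Cloud Hypervisor client, and the method names are illustrative. Wrapping the middle steps in try/finally is a design choice of this sketch, made so that a failed snapshot does not leave the sandbox paused.

```python
class RecordingVMM:
    """Hypothetical stand-in for a VMM client; it only records calls."""

    def __init__(self):
        self.calls = []

    def pause(self):
        self.calls.append("pause")

    def snapshot_memory(self, dest_dir):
        self.calls.append(f"snapshot_memory:{dest_dir}")

    def resume(self):
        self.calls.append("resume")


def snapshot_sandbox(vmm, persist_rw_layer, dest_dir):
    """Pause, dump guest memory, persist the read-write layer, resume."""
    vmm.pause()
    try:
        vmm.snapshot_memory(dest_dir)
        persist_rw_layer(dest_dir)
    finally:
        # Always resume, even if dumping or persisting failed.
        vmm.resume()


if __name__ == "__main__":
    vmm = RecordingVMM()
    snapshot_sandbox(
        vmm, lambda d: vmm.calls.append(f"persist_rw_layer:{d}"), "/snapshots/s1"
    )
    print(vmm.calls)
```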
How to Use Arachis
Arachis provides a straightforward Python SDK for interaction [00:35:54]. After self-hosting Arachis on their own infrastructure, users can pip install the arashis package [00:35:57].
Using the SDK:
- Instantiate a sandbox manager with the Arachis server’s IP [00:36:01].
- List running VMs and their metadata (IP, ports) using list_all [00:36:09].
- Start a sandbox with start_sandbox [00:36:17].
- Run commands and retrieve output/errors [00:36:19].
- Snapshot a VM with a simple snapshot call, providing a snapshot ID [00:36:26].
- Destroy a VM when done [00:36:30].
- Restore a checkpoint by calling restore with the VM name and snapshot ID [00:36:35].
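The sketch below shows how an agent loop might combine these calls to backtrack after a failed step. It runs against FakeSandboxManager, an in-memory stand-in written for this example, not the real SDK. The method names list_all, start_sandbox, snapshot, restore, and destroy come from the list above, but their signatures, the run method, and the state helper are assumptions.

```python
class FakeSandboxManager:
    """In-memory stand-in for the SDK's sandbox manager (not the real one)."""

    def __init__(self):
        self._vms = {}        # VM name -> list of applied commands
        self._snapshots = {}  # (VM name, snapshot ID) -> saved state

    def start_sandbox(self, name):
        self._vms[name] = []

    def list_all(self):
        return sorted(self._vms)

    def run(self, name, cmd):  # hypothetical method name
        if "fail" in cmd:
            # Simulate a step that breaks halfway and leaves debris behind.
            self._vms[name].append(f"partial:{cmd}")
            raise RuntimeError(f"command failed: {cmd}")
        self._vms[name].append(cmd)

    def snapshot(self, name, snapshot_id):
        self._snapshots[(name, snapshot_id)] = list(self._vms[name])

    def restore(self, name, snapshot_id):
        self._vms[name] = list(self._snapshots[(name, snapshot_id)])

    def destroy(self, name):
        del self._vms[name]

    def state(self, name):  # helper for this example only
        return list(self._vms[name])


def run_with_backtracking(manager, name, steps):
    """Checkpoint before each step; on failure, restore and move on.

    Returns the list of steps that failed and were rolled back.
    """
    failed = []
    for i, cmd in enumerate(steps):
        snap_id = f"before-step-{i}"
        manager.snapshot(name, snap_id)
        try:
            manager.run(name, cmd)
        except RuntimeError:
            manager.restore(name, snap_id)  # discard the partial work
            failed.append(cmd)
    return failed


if __name__ == "__main__":
    mgr = FakeSandboxManager()
    mgr.start_sandbox("agent-vm")
    print(run_with_backtracking(mgr, "agent-vm",
                                ["install deps", "fail build", "run tests"]))
    print(mgr.state("agent-vm"))
    mgr.destroy("agent-vm")
```

A real agent would typically replan after a rollback instead of skipping the failed step; the skip keeps the example short.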
Demo Example
A demonstration shows Claude Desktop using Arachis via its MCP server to create a Google Docs clone with built-in collaboration [00:37:04]. Claude pipes commands directly into the Arachis sandbox [00:37:41]. The demo highlights Arachis’s networking setup, enabling real-time collaborative features [00:37:48]. The sandbox is snapshotted, and then a new feature (dark mode) is added and verified [00:37:51]. The ability to restore to the previous snapshot without dark mode demonstrates the backtracking functionality [00:38:19].
Ongoing Work
Current development focuses on:
- Achieving sub-1-second boot times [00:39:14].
- Enhancing snapshot and persistence support, potentially by moving to btrfs for incremental snapshots [00:39:24].
- Improving sandbox bin-packing on a single server through dynamic memory and resource management, such as ballooning or hot-plugging/removal of memory at runtime [00:39:35].