From: aidotengineer

Working with OCaml in AI projects, particularly with large language models (LLMs), presents unique challenges due to the language’s obscurity and the specific development environment at Jane Street [00:00:53]. John Kzi from Jane Street’s AI Assistant team highlights several key difficulties [00:00:19].

OCaml as a Development Platform

OCaml is described as a functional, powerful, but incredibly obscure language, primarily built in France [00:01:08]. Its most common applications include theorem proving, formal verification, and writing programming languages [00:01:17]. At Jane Street, OCaml is used for nearly everything [00:01:28]:

  • Web applications are written in OCaml and transpiled to JavaScript using js_of_ocaml [00:01:32] (a minimal example follows this list).
  • Vim plugins are written in OCaml using VCaml [00:01:47].
  • FPGA code is written in OCaml using Hardcaml instead of Verilog [00:01:58].
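
For readers unfamiliar with the first of these toolchains, here is a minimal sketch of what browser-bound OCaml can look like, assuming the open-source js_of_ocaml package and its ppx syntax extension (illustrative only, not Jane Street code):

```ocaml
(* Plain OCaml that the js_of_ocaml compiler translates to JavaScript.
   Assumes the js_of_ocaml and js_of_ocaml-ppx opam packages. *)
open Js_of_ocaml

let () =
  (* Runs in the browser once the generated script is loaded. *)
  Dom_html.window##alert (Js.string "Hello from OCaml")
```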

This pervasive use of OCaml creates significant challenges for adopting off-the-shelf AI tooling [00:00:55].

According to John Kzi, the difficulties stem from a few core issues [00:02:10]:

Model Limitations with OCaml

  • Data Scarcity: LLMs are generally not very good at OCaml [00:02:14]. This is not the fault of AI labs but a byproduct of the limited amount of OCaml data available for training [00:02:20]. The amount of OCaml code within Jane Street likely exceeds the total combined amount existing outside its walls [00:02:26].

Jane Street’s Unique Development Environment

The internal development choices at Jane Street, influenced by OCaml, further complicate AI integration [00:02:37]:

  • Custom Tooling: The company has built its own build systems, distributed build environment, and a code review system called Iron [00:02:42].
  • Monorepo and Version Control: All software is developed in a single giant monorepo, stored in Mercurial rather than Git [00:02:52].
  • Editor Usage: A significant portion of the firm (67%) uses Emacs rather than more common editors like VS Code [00:03:02].

Ambitious Integration Goals

Jane Street aims to apply LLMs to various parts of their development flow, such as resolving merge conflicts, building feature descriptions, or identifying code reviewers [00:03:20]. This requires deep integration without being hampered by system boundaries [00:03:34].

Challenges in Training Custom Models

Training custom models for OCaml is expensive, time-consuming, and prone to errors [00:04:07]. Initial attempts to replicate Meta’s CodeCompose project, which fine-tuned a model for Hack (a language that, like OCaml, is used primarily at a single company), proved difficult [00:04:24].

Data Collection and Shaping

  • Need for Specific Training Data: Fine-tuning a model for OCaml requires showing it many examples shaped like the questions it will later be asked [00:05:16]. The goal was to generate multi-file diffs (e.g., spanning test, .ml, and .mli files) from a natural language prompt, with the diffs applying cleanly and type-checking [00:05:30].
  • Inadequacy of Existing Data:
    • Features/Pull Requests: Features contain human descriptions and code diffs, but those descriptions differ significantly from short, in-editor prompts (“fix that error”) [00:07:01]. They are also often too large (500-1000 lines), requiring automated ways to break them into smaller components [00:07:20].
    • Commits: Commits at Jane Street are primarily used as checkpoints without descriptions and are not isolated changes [00:07:56].
  • Workspace Snapshotting: The solution adopted for data collection is to snapshot developer workstations (e.g., every 20 seconds) along with their build status [00:08:17]. A “green to red to green” pattern often indicates an isolated change; capturing the build error at the “red” state together with the “red to green” diff yields valuable training data for recovering from mistakes (sketched below) [00:08:36]. LLMs are then used to generate detailed descriptions from these diffs, which are filtered down to human-like prompt length [00:09:07].
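
A minimal sketch of this “green to red to green” mining step, assuming each snapshot records a build status and the workspace diff since the previous capture; every type and function name here is hypothetical:

```ocaml
(* Hypothetical shape of a mined snapshot: the build status plus the
   workspace diff since the previous capture (taken every ~20 seconds). *)
type build_status = Green | Red of string (* Red carries the build error *)
type snapshot = { status : build_status; diff : string }

(* A recovery example: the error seen while red, paired with the diff
   that took the workspace back to green. *)
type repair_example = { error : string; fix_diff : string }

let mine_repairs (snaps : snapshot list) : repair_example list =
  (* Collect the diffs applied from the first red state until the
     build turns green again. *)
  let rec back_to_green diffs = function
    | ({ status = Green; _ } as g) :: rest ->
      Some (String.concat "\n" (List.rev (g.diff :: diffs)), rest)
    | ({ status = Red _; _ } as r) :: rest ->
      back_to_green (r.diff :: diffs) rest
    | [] -> None
  in
  (* Walk the stream looking for green -> red transitions. *)
  let rec go acc = function
    | { status = Green; _ } :: { status = Red error; _ } :: rest -> (
      match back_to_green [] rest with
      | Some (fix_diff, rest') -> go ({ error; fix_diff } :: acc) rest'
      | None -> List.rev acc)
    | _ :: rest -> go acc rest
    | [] -> List.rev acc
  in
  go [] snaps
```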

Reinforcement Learning and Evaluation

  • Defining “Good Code”: For OCaml, “good code” means it parses correctly [00:09:47], type-checks (due to static typing) [00:09:59], compiles, and passes tests [00:10:15].
  • Code Evaluation Service (CES): To align the model with this definition of good code, a Code Evaluation Service (CES) was built [00:10:37]. CES is a fast build service that pre-warms builds, applies diffs from the model, and reports whether the build status turns red or green (a minimal sketch follows this list) [00:10:43]. This iterative loop teaches the model to write code that compiles and passes tests [00:11:07], and the same setup is used to evaluate model performance [00:11:20].
  • Ensuring Meaningful Evaluations: Training can lead to “catastrophic but hilarious results” if evaluations are not meaningful [00:11:38]. An example given was a code review model that, after months of training, responded with “I’ll do it tomorrow,” as it was trained on human examples including such phrases [00:12:12]. This highlights the importance of robust evaluation to prevent wasted time and money [00:12:24].
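
A minimal sketch of a CES-style check, under the assumption that the service reduces to “apply the model’s diff, rebuild, run the tests, report red or green”; all names below are hypothetical stand-ins, not Jane Street’s actual API:

```ocaml
(* Sketch of a CES-style check against a pre-warmed build tree. *)
type verdict = Green | Red of string (* Red carries the first failure *)

(* Stubbed stages of the "good code" chain: parse/type-check/compile,
   then tests. The real service runs these against pre-warmed builds. *)
let compiles (_code : string) : (unit, string) result = Ok ()
let tests_pass (_code : string) : (unit, string) result = Ok ()

let evaluate ~(workspace : string) ~(model_diff : string) : verdict =
  (* Naive "apply": the real service patches a pre-warmed build tree. *)
  let patched = workspace ^ "\n" ^ model_diff in
  match compiles patched with
  | Error e -> Red ("build failed: " ^ e)
  | Ok () -> (
    match tests_pass patched with
    | Error e -> Red ("tests failed: " ^ e)
    | Ok () -> Green)

(* During training the model is rewarded for Green verdicts; the same
   loop doubles as an evaluation harness. *)
let reward = function Green -> 1.0 | Red _ -> 0.0
```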

Editor Integrations

Exposing these OCaml-aware models to developers requires robust editor integrations. Key considerations for building them included the following (a hypothetical sketch follows the list) [00:12:42]:

  1. Code Reusability: Avoiding rewriting context and prompting strategies for each of the three supported editors: Neovim, VS Code, and Emacs [00:12:48].
  2. Flexibility: The ability to swap models or prompting strategies easily, anticipating the need for fine-tuned models [00:13:02].
  3. Metrics Collection: Gathering real-world data on latency and diff application success to ensure the diffs are meaningful to users [00:13:17].
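
One way to picture these three requirements together is a single core library with a swappable backend signature and built-in metrics. The sketch below is a hypothetical reading, not Jane Street’s actual design; it assumes the standard unix library for timing:

```ocaml
(* A hypothetical core signature shared by all three editor layers. *)
module type Model_backend = sig
  val name : string

  (* Prompt construction lives behind the signature, so prompting
     strategies and models can be swapped without touching editors. *)
  val prompt : context:string -> request:string -> string
  val complete : prompt:string -> string
end

(* Per-request metrics: latency and whether the diff applied cleanly,
   so acceptance can be compared across backends in the real world. *)
type metrics = { backend : string; latency_s : float; diff_applied : bool }

let request_diff (module M : Model_backend) ~context ~request =
  let start = Unix.gettimeofday () in
  let out = M.complete ~prompt:(M.prompt ~context ~request) in
  let latency_s = Unix.gettimeofday () -. start in
  (* diff_applied is stubbed; a real layer would try the patch. *)
  (out, { backend = M.name; latency_s; diff_applied = true })
```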

AI Development Environment (AID) Architecture

To address these requirements, Jane Street built AID (AI Development Environment), which runs as a sidecar application on developers’ machines [00:13:34]:

  • AID handles prompt and context construction, and build status integration [00:13:44].
  • Thin layers are written on top of AID for each editor [00:13:49].
  • This architecture allows changes to AID without requiring editor restarts, ensuring all developers get the most recent copy [00:14:00].
  • Examples include a visual sidebar integration in VS Code and a markdown buffer integration in Emacs for diff requests [00:14:15].
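
A sketch of the thin-layer idea: the editor plugin only forwards a request to the local AID sidecar and renders the reply, so all prompting and context logic can change behind a fixed wire protocol. The port and the newline-delimited protocol below are invented for illustration:

```ocaml
(* Sketch of a thin editor layer talking to a local AID sidecar. *)
let aid_port = 8377 (* hypothetical local port the sidecar listens on *)

let ask_aid (request : string) : string =
  let addr = Unix.(ADDR_INET (inet_addr_loopback, aid_port)) in
  let ic, oc = Unix.open_connection addr in
  output_string oc (request ^ "\n");
  flush oc;
  let reply = input_line ic in
  Unix.shutdown_connection ic;
  reply

(* Because prompting, context building, and model choice live in the
   sidecar, shipping a new AID binary updates every editor at once;
   this plugin code never needs to change or restart. *)
let () =
  if Array.length Sys.argv > 1 then print_endline (ask_aid Sys.argv.(1))
```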

AID’s pluggable architecture allows for swapping in new models, changing context building, adding support for new editors, and integrating domain-specific tools [00:14:58]. It also facilitates A/B testing different approaches to determine which yields higher acceptance rates [00:15:28]. This investment pays off over time, adapting to frequent changes in LLMs from a single point [00:15:39].
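
As a toy illustration of the A/B mechanism, one could deterministically assign each developer to a variant and compare acceptance rates from logged events (all names here are hypothetical):

```ocaml
(* Toy A/B assignment: hash each developer to a stable variant so the
   same person always sees the same prompting strategy. *)
type variant = A | B

let assign (user : string) : variant =
  if Hashtbl.hash user mod 2 = 0 then A else B

(* Acceptance rate per variant from logged (variant, accepted) events. *)
let acceptance_rate (events : (variant * bool) list) (v : variant) : float =
  let shown, accepted =
    List.fold_left
      (fun (s, a) (v', ok) ->
        if v' = v then (s + 1, if ok then a + 1 else a) else (s, a))
      (0, 0) events
  in
  if shown = 0 then 0.0 else float_of_int accepted /. float_of_int shown
```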

Conclusion

The challenges of building effective AI agents and tools for OCaml at Jane Street stem from the language’s obscurity and the company’s highly customized development environment. By building custom models, purpose-built data collection, and flexible evaluation and integration services such as CES and AID, Jane Street addresses these unique hurdles and lays a strong foundation for continued AI development in OCaml [00:16:16].