From: aidotengineer
This article details the partnership between Sourcegraph and Booking.com to develop software development agents aimed at automating toil and demonstrating measurable ROI within large enterprise codebases [00:00:38]. The collaboration addresses common challenges faced by large companies in adopting AI and proving its impact [00:01:11].
The Challenge of Large Enterprise Codebases
Booking.com, as one of the world’s largest online travel agencies with over 3,000 developers [00:02:28], faces significant challenges with its codebase. The company processes over 250 million merge requests and 2.5 million CI jobs annually [00:02:43].
A core issue stems from its data-driven, experimentation-obsessed culture [00:02:53]. Experiments, often in the form of A/B tests, lead to feature flags and dead code accumulating in the codebase over time [00:03:08]. This results in:
- An extremely bloated codebase [00:03:28].
- Increased cycle times and longer debugging periods [00:03:56].
- Developers spending over 90% of their time on “toil” [00:04:07].
- Frustration among developers due to the difficulty of working with the code [00:04:25].
“I’ve seen the best developer minds of my generation destroyed by decade long dead feature flag migrations.” [00:04:34]
This situation diverts brilliant engineering minds from working on new features and user problems to maintaining legacy cruft [00:05:07].
Sourcegraph’s Contribution
Sourcegraph’s mission is to make building software at scale tractable [00:05:21]. Their key products include:
- Code Search: A “Google for your code” that helps developers understand codebases [00:05:29]. Booking.com adopted Code Search over two years ago with great success [00:06:15].
- Cody: An AI coding assistant that is context-aware and tuned for large, messy codebases [00:05:40]. Booking.com began experimenting with Cody a year ago, leveraging its integration with Sourcegraph Search [00:06:34].
The unifying theme of Sourcegraph’s products is to accelerate the developer inner loop, augment human creativity, and automate toil from the outer loop of the software development life cycle [00:05:57].
Booking.com’s AI Journey and Measurable Impact
Booking.com’s journey with GenAI innovation and AI agents involved a rapid evolution over a year [00:07:06]:
Early Stages (January)
- Initial rollout of Cody to all 3,000 developers [00:07:13].
- Challenges included a single LLM choice, token limits, and a lack of perceived value from some users [00:07:32].
- A partnership with Sourcegraph enabled the removal of guardrails, allowing multiple LLMs per developer. This proved crucial, as different LLMs demonstrated expertise in different code contexts (e.g., legacy code vs. new service development) [00:07:44].
Evolution and Training (July)
- Developer training became critical, transforming initial skeptics into “daily users” [00:08:17].
- Shifted away from "hours saved", a metric that limited research suggests is statistically unreliable, toward more robust metrics [00:08:34].
Defining KPIs and Impact (October-November)
- New KPIs defined in October [00:09:19]:
  - Lead Time for Change:
    - Short-term: Time to review Merge Requests (MRs). Daily Cody users ship 30%+ more MRs, and their MRs are lighter (less code) [00:10:40].
  - Quality:
    - Mid-term: Predicting and eliminating vulnerabilities in the codebase [00:11:00].
    - Long-term: Increasing test coverage, especially for legacy code during re-platforming efforts [00:11:15].
  - Codebase Insights:
    - Mid-term: Tracking unused code, lingering feature flags, and non-performant code [00:11:26].
  - Modernization:
    - Long-term: Reducing the time to re-platform the codebase from years to months [00:11:37].
- November Findings: Traces showed developers using Cody daily were 30%+ faster [00:09:26].
- An API layer was created for Cody, enabling integration with existing tools like Slack and Jira and extending its utility beyond the IDE [00:09:46].
Specific AI Agent Use Cases
The shift from coding assistants to AI agents arose from engineers’ desire to compose LLM calls into longer chain automations [00:12:00]. A joint hackathon was crucial in building initial agents [00:12:29].
1. GraphQL Schema Agent
- Problem: Booking.com’s GraphQL API schema is over a million tokens long, making it impossible to fit into current LLM context windows, leading to hallucinations if naively integrated [00:12:48].
- Solution: An agent that searches the large schema, identifies relevant nodes, and agentically reasons which parent nodes to pull in. This process generates coherent and accurate responses [00:13:05].
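The search-then-expand loop described above can be sketched in Python. This is a minimal illustration, not Sourcegraph's implementation: the `SchemaNode` shape, the keyword search stand-in, and the parent-expansion heuristic are all assumptions, since the talk describes the approach only at a high level.

```python
from dataclasses import dataclass, field

@dataclass
class SchemaNode:
    name: str
    definition: str                              # raw SDL snippet for this type
    parents: list = field(default_factory=list)  # names of types referencing this one

def relevant_subset(schema: dict, query: str, max_tokens: int = 4000) -> str:
    """Select only the schema nodes needed for `query`, plus the parent
    types required to keep the fragment coherent."""
    # 1. Search the full schema (a keyword match stands in for real retrieval).
    hits = [n for n in schema.values() if query.lower() in n.definition.lower()]
    # 2. Agentic expansion: pull in parent nodes so the fragment stays connected.
    selected, frontier = {}, list(hits)
    while frontier:
        node = frontier.pop()
        if node.name in selected:
            continue
        selected[node.name] = node
        frontier.extend(schema[p] for p in node.parents if p in schema)
    # 3. Emit a compact fragment that fits the LLM context window.
    fragment = "\n\n".join(n.definition for n in selected.values())
    return fragment[: max_tokens * 4]  # rough chars-per-token budget
```

The key design point is step 2: rather than truncating the million-token schema arbitrarily, the agent walks the reference graph so the LLM always sees a self-consistent slice, which is what suppresses hallucinated fields.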
2. Automated Code Migration Agent
- Problem: Migrating legacy functions, some with over 10,000 lines of code, for re-platforming efforts was a massive, time-consuming task [00:14:02]. Developers spent months just understanding the problem size [00:15:06].
- Solution:
- Leveraged Code Search and structured meta-prompts [00:14:16].
- Employed a “divide and conquer” approach for smaller code bits [00:14:21].
- Crucial: Pairing with experts significantly accelerated the process. During a two-day hackathon, they precisely defined the problem size and identified call sites, work that had previously taken months [00:15:15]. This highlights that a lack of knowledge about working with LLMs was a major impediment [00:14:47].
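The divide-and-conquer step above can be sketched as follows. The chunking heuristic and the injected `call_llm` function are placeholders; the talk does not specify how the 10,000-line functions were actually split or which meta-prompts were used.

```python
def chunk_function(source: str, max_lines: int = 200) -> list:
    """Split a huge legacy function into reviewable chunks,
    preferring blank-line boundaries once a chunk is large enough."""
    chunks, current = [], []
    for line in source.splitlines():
        current.append(line)
        if len(current) >= max_lines and not line.strip():
            chunks.append("\n".join(current))
            current = []
    if current:
        chunks.append("\n".join(current))
    return chunks

def migrate(source: str, call_llm, meta_prompt: str) -> str:
    """Divide and conquer: run the migration meta-prompt on each chunk,
    then stitch the translated chunks back together."""
    migrated = [call_llm(f"{meta_prompt}\n\n```\n{c}\n```")
                for c in chunk_function(source)]
    return "\n".join(migrated)
```

Keeping each chunk well under the context window is what makes the per-chunk LLM calls reliable; the expensive expert knowledge goes into the meta-prompt, which is reused across every chunk.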
3. Customizable AI Code Review Agent
- Problem: While many off-the-shelf AI code review tools exist, they often lack the customization needed for enterprise-specific rules, guidelines, and a long tail of organizational requirements [00:16:11].
- Solution: An interface for productizing the process of building a review agent tailored to a specific team or organization [00:16:32].
- Users define rules in a simple flat file format [00:16:41].
- The agent consumes these rules and selectively applies relevant ones to modified files in a Pull Request [00:16:50].
- It posts precise, targeted comments, optimizing for precision over recall to avoid noise [00:17:03].
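The rule-selection step above can be sketched as follows, assuming a hypothetical `<glob> :: <instruction>` flat-file format; the talk only says that rules live in a simple flat file, so the syntax and helper names here are invented.

```python
import fnmatch

def load_rules(text: str) -> list:
    """Parse a flat rules file: one `<glob> :: <instruction>` per line,
    with `#` comments and blank lines ignored."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        pattern, _, instruction = line.partition("::")
        rules.append((pattern.strip(), instruction.strip()))
    return rules

def rules_for_file(rules: list, path: str) -> list:
    """Select only the rules whose glob matches a changed file.
    Fewer, targeted checks per file keep precision high and comment noise low."""
    return [inst for pat, inst in rules if fnmatch.fnmatch(path, pat)]
```

Filtering rules per file before the LLM ever sees them is one concrete way to bias the agent toward precision over recall: the model is never asked to apply a payments rule to documentation.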
The Future: Self-Healing Services and Declarative Coding
The goal is to move towards self-healing services [00:17:50]. This involves:
- Anticipating CI pipeline errors and shifting them left to the IDE [00:17:38].
- Presenting errors with immediate fixes [00:17:45].
- Leveraging developer-created prompt libraries to automate questions to the server and extract knowledge from the codebase [00:17:58].
This approach has the potential to solve the “mythical man-month” problem, where successful software becomes a victim of its own success due to accruing technical debt and losing code cohesion with more contributors [00:18:46]. With declarative coding, senior engineers and architects can define constraints and rules that must hold throughout the codebase [00:19:07]. These rules can then be enforced automatically at code review time and within the editor, regardless of whether the code is written by a human or AI [00:19:19]. This also helps with compliance rules and other non-feature-related developer toil [00:19:28].
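One way such declarative, codebase-wide rules could be enforced mechanically is sketched below. The constraint names and regex checks are invented for illustration; the point is that the same rule table can run in the editor and again at review time, independent of who (or what) wrote the code.

```python
import re

# Declarative constraints: (name, pattern, remediation message).
# These example rules are hypothetical, echoing the toil described in the talk.
CONSTRAINTS = [
    ("no-dead-feature-flags", re.compile(r"feature_flag\(['\"]\w+_2019['\"]\)"),
     "remove feature flags from retired 2019 experiments"),
    ("no-print-debugging", re.compile(r"^\s*print\("),
     "use the structured logger instead of print"),
]

def check(source: str) -> list:
    """Return (rule, message, line_no) violations found in `source`."""
    violations = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for name, pattern, message in CONSTRAINTS:
            if pattern.search(line):
                violations.append((name, message, lineno))
    return violations
```

Because the rules are data rather than tribal knowledge, a senior engineer writes each constraint once and every contributor, human or AI, gets the same targeted feedback at the moment the violation is introduced.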
Key Takeaway: Education is Paramount
The most important factor in the success of AI agent adoption and demonstrating value has been education [00:19:52]. By educating developers and providing hands-on workshops and hackathons, they become passionate daily users, leading to the observed 30%+ increase in speed [00:20:01].