From: aidotengineer
Introduction
The partnership between Sourcegraph, a company building developer tools for large codebases, and Booking.com, a leading online travel agency, aims to build software development agents to automate toil within Booking.com’s operations [00:00:40]. A key focus of this collaboration is demonstrating the real Return on Investment (ROI) and impact of these AI tools, addressing a common challenge many companies face when adopting AI [00:01:11].
The Challenge at Booking.com
Booking.com’s team is dedicated to clearing the path for their developers to do their best work [00:02:02]. The company operates at a vast scale, serving 1.5 million room nights and employing over 3,000 developers [00:02:21]. Technically, they handle over 250 million merge requests and 2.5 million CI jobs annually [00:02:43].
Booking.com’s culture is deeply data-driven, with an obsession for experimentation, primarily through A/B tests [00:02:53]. That dedication has a cost: as new features ship, old experiment flags and dead code accumulate, and over decades the codebase has become “extremely bloated” [00:03:08]. The bloat increases cycle times [00:03:54] and leaves developers spending over 90% of their time on toil such as debugging and maintenance [00:04:02]. The aim is to free developers from these mundane tasks so they can focus on creating new features and solving user problems [00:05:07].
Sourcegraph’s Solutions
Sourcegraph’s mission is to make building software at scale tractable [00:05:21]. Their products include:
- Code Search: Described as a “Google for your code,” it allows developers to quickly find and understand code [00:05:29].
- Large-scale Refactoring and Code Migration Tools [00:05:36].
- Cody: An AI coding assistant that is context-aware and tuned for large, messy codebases [00:05:40].
- AI Agents: Tools built to automate toil out of the software development lifecycle [00:05:51].
The unifying theme of Sourcegraph’s tools is to accelerate the developer inner loop, augment human creativity, and automate as much “BS” out of the outer loop as possible [00:05:57].
Booking.com’s Journey with AI
Booking.com began using Sourcegraph Search over two years ago with great success, improving the ability to navigate their bloated codebase [00:06:15].
In January of the previous year, Booking.com started experimenting with Cody, which leveraged Sourcegraph Search for context [00:06:37]. Initially, all 3,000 developers were given access [00:07:13]. Early feedback showed some developers didn’t see value, which intrigued the team [00:07:25].
Key Milestones and Learnings:
- Early Challenges: Limited by a single LLM choice and token limits [00:07:33].
- Partnership and Flexibility: Sourcegraph enabled Booking.com to remove guardrails and offer each developer a choice of LLMs, recognizing that different models suit different tasks, such as excavating old code versus developing new features [00:07:44].
- Training and Adoption: By July, training developers had become crucial. Those who received training went on to use Cody every day, becoming “daily users” [00:08:14].
- Measuring Impact:
  - The initial metric, “time saved,” turned out not to be statistically meaningful and rested on limited research [00:08:34].
  - By October, new, statistically relevant KPIs had been defined [00:09:19].
  - By November, traces showed that daily Cody users were 30% faster [00:09:29].
- An API layer was created for Cody, allowing integration with tools like Slack and Jira and extending its use beyond the IDE [00:09:46].
Defined KPIs for Measuring AI Impact
Booking.com established four key performance indicators to measure AI impact within a year:
- Lead Time for Change [00:10:21]
  - Short Term (Metrics observed): Developers using Cody daily shipped 30% more Merge Requests (MRs) [00:10:40]. These MRs also contained less code, though the implications are still being analyzed [00:10:50].
- Quality [00:10:22]
  - Mid-Term (Aspirations): Predicting new vulnerabilities or identifying lingering ones by providing codebase context [00:11:00]. Increasing test coverage, especially for legacy code during re-platforming [00:11:15].
- Codebase Insights [00:10:22]
  - Mid-Term (Aspirations): Tracking unused parts of the codebase, lingering feature flags, and non-performant code [00:11:26].
- Modernize (Re-platforming Time) [00:10:26]
  - Long-Term Goal: Reduce the time to re-platform their codebase from years to months [00:11:37].
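The Lead Time for Change figure above is a relative comparison between two cohorts. As a minimal sketch of that arithmetic, the snippet below computes the relative uplift in merged MRs per developer. The function name `relative_uplift` and the per-developer counts are hypothetical and chosen only for illustration; they are not data from the talk, and the talk does not describe how the comparison was actually computed.

```python
from statistics import mean


def relative_uplift(treated, baseline):
    """Relative difference between cohort means, e.g. 0.30 for +30%."""
    base = mean(baseline)
    if base == 0:
        raise ValueError("baseline mean must be non-zero")
    return (mean(treated) - base) / base


# Hypothetical merged-MR counts per developer over the same period.
daily_users = [13, 12, 14, 13]
other_devs = [10, 9, 11, 10]

uplift = relative_uplift(daily_users, other_devs)
print(f"Daily users merged {uplift:.0%} more MRs per developer")
```

A comparison like this only shows a difference in averages. Deciding whether the difference is statistically relevant, as the team required of its KPIs, would need a proper test on real data.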
AI Agents in Action
As developers started using AI coding assistants, they also began experimenting with the underlying APIs, which led to a desire to build longer-chain automations, or “agents” [00:11:50].
A joint hackathon between Sourcegraph and Booking.com yielded significant breakthroughs:
- GraphQL Agent:
  - This agent generates GraphQL queries for Booking.com’s massive GraphQL API, which is over a million tokens long and doesn’t fit into a single LLM context window [00:12:48].
  - The system uses Sourcegraph Search to find relevant nodes within the schema and then agentically figures out which ones are relevant, pulling in necessary parent nodes [00:13:05].
  - This approach significantly reduced hallucinations and led to far better results compared to naive implementations [00:13:40].
- Automated Code Migration Agent:
  - This agent targets legacy functions with over 10,000 lines of code to accelerate re-platforming efforts [00:14:04].
  - It leverages Sourcegraph Search, structured meta-prompts, and the concept of dividing the codebase into smaller, conquerable bits [00:14:16].
  - Pairing with experts to provide knowledge on effective LLM prompting was crucial [00:14:30].
  - A problem that had previously taken developers months to understand and size was defined, and began to yield “low-hanging fruit,” within two days during the hackathon [00:15:06].
- AI Code Review Agent:
  - While many startups offer AI code review, Booking.com found that enterprise code review is highly specific to the organization, with a long tail of internal rules and guidelines [00:16:17]. Off-the-shelf tools often lack sufficient customizability [00:16:29].
  - The interface that was built lets organizations define a set of rules for their code in a simple flat file format [00:16:41].
  - The agent consumes these rules, applies the relevant ones to the files modified in a Pull Request (PR), and selectively posts highly precise comments tuned to those rules [00:16:48].
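To make the GraphQL agent's retrieval step from the list above more concrete, here is a minimal sketch of the idea: search a schema that is too large for one context window, then pull in the parent types needed to reach each hit from the root. Everything here is an assumption for illustration. The tiny `SCHEMA` dictionary, the keyword matching in `search_types` (a stand-in for Sourcegraph Search), and the helper names are hypothetical, and the agentic relevance filtering described in the talk is not modeled.

```python
from collections import deque

# Hypothetical miniature schema: type name -> {field name: field type}.
SCHEMA = {
    "Query": {"hotel": "Hotel", "user": "User"},
    "Hotel": {"name": "String", "rooms": "Room", "reviews": "Review"},
    "Room": {"price": "Price", "capacity": "Int"},
    "Price": {"amount": "Float", "currency": "String"},
    "Review": {"score": "Int", "text": "String"},
    "User": {"email": "String"},
}


def search_types(schema, keywords):
    """Stand-in for code search: types whose name or field names mention a keyword."""
    hits = set()
    for type_name, fields in schema.items():
        haystack = " ".join([type_name, *fields]).lower()
        if any(k.lower() in haystack for k in keywords):
            hits.add(type_name)
    return hits


def with_parents(schema, hits):
    """Add every type that can reach a hit, so the path from the root stays intact."""
    parents = {}
    for type_name, fields in schema.items():
        for field_type in fields.values():
            parents.setdefault(field_type, set()).add(type_name)
    selected, queue = set(hits), deque(hits)
    while queue:
        for parent in parents.get(queue.popleft(), ()):
            if parent not in selected:
                selected.add(parent)
                queue.append(parent)
    return selected


def schema_slice(schema, keywords):
    """Return only the part of the schema worth placing in the LLM prompt."""
    keep = with_parents(schema, search_types(schema, keywords))
    return {name: fields for name, fields in schema.items() if name in keep}


context = schema_slice(SCHEMA, ["price"])
print(sorted(context))  # ['Hotel', 'Price', 'Query', 'Room']
```

In this toy example, a question about prices keeps `Room` and `Price` plus the `Hotel` and `Query` types needed to reach them, and drops `Review` and `User`. The model then sees a connected slice of the schema rather than the whole thing.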
The Future of AI in Software Development
The goal is to move towards self-healing services by declaring rules for a service and shifting error anticipation and fixes left into the IDE [00:17:30]. In this vision, AI surfaces errors and proposed fixes directly within the development environment [00:17:40].
This approach has the potential to solve a long-standing problem in software development: the inevitable accumulation of technical debt as successful software grows [00:18:32]. With declarative coding, senior engineers and architects can define constraints and rules that must hold across the codebase [00:19:07]. These rules can then be enforced during code review and directly within the editor, whether the code is written by humans or AI [00:19:19]. This can also apply to compliance rules and other non-feature-adding tasks [00:19:28].
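As a minimal sketch of what declaring such rules in a flat file and selecting the ones relevant to a change might look like, consider the snippet below. The `<glob> | <rule text>` format, the example rules, and the function names are all hypothetical; the talk does not specify the actual file format. The step where an LLM checks the changed code against the selected rules and posts comments is deliberately left out.

```python
from fnmatch import fnmatchcase

# Hypothetical flat-file format: "<glob> | <rule text>", one rule per line.
RULES_TEXT = """
# comments and blank lines are ignored
*.py | Do not add new experiment flags without an expiry date.
services/payments/* | Every public function needs a unit test.
*.sql | Migrations must be reversible.
"""


def parse_rules(text):
    """Parse the flat file into (glob pattern, rule text) pairs."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        pattern, rule = (part.strip() for part in line.split("|", 1))
        rules.append((pattern, rule))
    return rules


def rules_for_changes(rules, changed_files):
    """Map each changed file to the rules whose glob matches its path."""
    applicable = {}
    for path in changed_files:
        matched = [rule for pattern, rule in rules if fnmatchcase(path, pattern)]
        if matched:
            applicable[path] = matched
    return applicable


changed = ["services/payments/refund.py", "README.md"]
selected = rules_for_changes(parse_rules(RULES_TEXT), changed)
for path, matched in selected.items():
    print(path, "->", matched)
```

Here only `services/payments/refund.py` picks up rules (the Python rule and the payments rule), while `README.md` matches nothing and would receive no review comments. Because the same rule set is independent of who wrote the change, it could in principle back both a review bot and an editor integration.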
The Importance of Education
The most important lesson from this year-long collaboration is the power of education [00:19:52]. Through hands-on workshops and hackathons, developers became passionate about AI tools and turned into daily users [00:19:54]. This direct experience led to the validated increase of more than 30% in development speed [00:20:13].