From: aidotengineer
Many large companies face the challenge of proving the return on investment (ROI) and measurable impact of their AI adoption initiatives [00:00:50]. Often, initial AI purchases are driven by FOMO (fear of missing out) or executive mandates, which later prompts finance departments to ask for tangible benefits [00:01:09]. Booking.com, a major online travel agency with over 3,000 developers and a highly data-driven culture, has been at the forefront of addressing this challenge by systematically measuring the impact of AI solutions on its software development processes [00:01:24].
The Challenge: A Bloated Codebase
Booking.com’s core mission is to make experiencing the world easier, while its developer experience team aims to clear paths for developers to do their best work [00:01:57]. The company’s success is rooted in extensive experimentation, primarily through A/B tests [00:02:53]. Over decades, this approach produced a massive, bloated codebase riddled with persistent feature flags and dead code, resulting in longer cycle times and developers spending over 90% of their time on “toil” rather than on innovative features [00:03:08], [00:03:54]. This environment highlighted the need to free developers’ minds from legacy cruft so they can focus on new features and user problems [00:05:07].
Sourcegraph, a partner in this endeavor, aims to make building software at scale tractable, accelerating the developer inner loop, augmenting human creativity, and automating “BS” from the outer loop [00:05:21]. Their tools include Code Search (a “Google for your code”), large-scale refactoring tools, and Cody, an AI coding assistant [00:05:29].
Evolving AI Adoption and Metrics
Booking.com’s journey with AI began with adopting Sourcegraph’s Code Search over two years ago, which significantly improved the ability to navigate their large codebase [00:06:15]. About a year later, in January, they started experimenting with Cody, leveraging its context-awareness derived from Code Search [00:06:34].
Initially, all 3,000 developers were given access to Cody, but usage varied, with some stopping due to perceived lack of value [00:07:13]. Early limitations included relying on a single LLM and token limits [00:07:32]. Working with Sourcegraph, Booking.com removed guardrails, enabling multiple LLMs per developer, recognizing that different LLMs had “expertise” better suited for tasks like excavating bloated codebases versus developing new services [00:07:44].
The Shift to Robust KPIs
The initial metric of “hours saved” proved statistically irrelevant and was considered “semi-BS” [00:08:34]. By October, Booking.com had defined new, measurable KPIs (Key Performance Indicators) and metrics that could demonstrate results within a year [00:10:04]:
- Lead Time for Change:
  - Short-term metric: Time to review Merge Requests (MRs) [00:10:40] (see the measurement sketch after this list).
  - Observed Impact: Developers using Cody daily (12+ days a month) shipped 30%+ more MRs, and those MRs were “lighter,” containing less code [00:09:29], [00:10:43].
- Quality:
  - Mid-term metrics: Predicting new vulnerabilities based on historical data and increasing test coverage, especially for legacy code during re-platforming [00:11:00].
- Codebase Insights:
  - Long-term metrics: Tracking unused code, lingering feature flags, and non-performant code [00:11:26].
  - Ultimate Goal: Reducing the time to re-platform the codebase from years to months [00:11:40].
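To make the short-term metric concrete, here is a minimal sketch of how time-to-merge and per-developer MR throughput could be computed from merge-request records. The record fields (`author`, `created_at`, `merged_at`) and the in-memory list are illustrative assumptions; in practice the data would come from the code host’s API, and Booking.com’s actual pipeline is not described in the talk.

```python
from datetime import datetime
from statistics import median

# Illustrative MR records; in practice these would be pulled from the code host's API.
merge_requests = [
    {"author": "dev_a", "created_at": "2024-10-01T09:00:00", "merged_at": "2024-10-02T15:30:00"},
    {"author": "dev_b", "created_at": "2024-10-03T11:00:00", "merged_at": "2024-10-03T16:45:00"},
]

def hours_to_merge(mr: dict) -> float:
    """Time from MR creation to merge, in hours (a proxy for review lead time)."""
    created = datetime.fromisoformat(mr["created_at"])
    merged = datetime.fromisoformat(mr["merged_at"])
    return (merged - created).total_seconds() / 3600

def mrs_per_author(mrs: list[dict]) -> dict[str, int]:
    """MR throughput per developer over the period covered by `mrs`."""
    counts: dict[str, int] = {}
    for mr in mrs:
        counts[mr["author"]] = counts.get(mr["author"], 0) + 1
    return counts

print("median hours to merge:", median(hours_to_merge(mr) for mr in merge_requests))
print("MRs per author:", mrs_per_author(merge_requests))
```

Comparing these two numbers between the cohort of daily Cody users (12+ active days a month) and the rest of the developer population is the kind of comparison behind the 30%+ figure cited above.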
Automating Toil with AI Agents
A crucial part of Booking.com’s success lies in building AI agents to automate toil in the software development lifecycle [00:05:51]. Developers started customizing prompts and chaining them into longer automations, which led to the creation of these agents [00:12:00].
Key agents developed through joint hackathons with Sourcegraph include:
- GraphQL Query Generator: This agent addresses the challenge of a massive GraphQL API whose schema (over a million tokens long) doesn’t fit into standard LLM context windows [00:12:46]. It searches the schema, identifies relevant nodes, walks up the schema tree to pull in parent nodes, and then generates coherent GraphQL queries, avoiding the “garbage” output of naive approaches [00:13:03] (see the schema-walk sketch after this list).
- Automated Code Migration: Aimed at accelerating re-platforming efforts, this agent tackles legacy functions with over 10,000 lines of code [00:14:02]. By leveraging Code Search, structured meta-prompts, and a divide-and-conquer split of the codebase into smaller pieces, it lets developers quickly define and understand the scope of a migration problem [00:14:16] (see the chunking sketch after this list). A problem that previously took developers months to size could be scoped within two days using this agent [00:15:06].
- Tailored Code Review Agent: Recognizing that off-the-shelf AI code review tools lack the customization needed for organization-specific rules and guidelines, Sourcegraph built an interface that productizes building a team- and organization-tailored review agent [00:16:11]. The agent consumes rules defined in a simple flat file, applies only the relevant ones to the files modified in a PR, and selectively posts precise comments to avoid noise [00:16:40] (see the rules-file sketch after this list).
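To illustrate the schema-walk idea, here is a minimal sketch in Python using the graphql-core package: it finds a field matching a keyword and then nests it inside all of its parent nodes so the generated query is coherent. The tiny SDL, the keyword-matching heuristic, and the field names are illustrative assumptions, not Booking.com’s schema or the agent’s actual implementation.

```python
# Sketch of the schema-walking idea: locate a field relevant to a keyword,
# then build the query from the root down so every parent node is included.
from graphql import build_schema, get_named_type, GraphQLObjectType

SDL = """
type Query { trip: Trip }
type Trip { reservations: [Reservation] }
type Reservation { id: ID, hotelName: String, checkIn: String }
"""

schema = build_schema(SDL)

def find_path(keyword: str):
    """Breadth-first search from the Query root to a field whose name matches
    `keyword`, returning the chain of parent fields that leads to it."""
    queue = [(schema.query_type, [])]
    seen = set()
    while queue:
        type_, path = queue.pop(0)
        if not isinstance(type_, GraphQLObjectType) or type_.name in seen:
            continue
        seen.add(type_.name)
        for name, field in type_.fields.items():
            if keyword.lower() in name.lower():
                return path + [name]
            queue.append((get_named_type(field.type), path + [name]))
    return None

def build_query(path, leaf_fields=("id",)):
    """Nest the matched field inside all of its parents to form a valid query."""
    body = " ".join(leaf_fields)
    for name in reversed(path):
        body = f"{name} {{ {body} }}"
    return f"query {{ {body} }}"

path = find_path("reservation")              # -> ['trip', 'reservations']
print(build_query(path, ("id", "hotelName")))
# query { trip { reservations { id hotelName } } }
```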
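The divide-and-conquer step of the migration agent can be sketched as follows: an oversized legacy file is split into context-window-sized chunks, and each chunk is wrapped in a structured meta-prompt. The chunk size, the prompt wording, and the commented-out LLM call are assumptions for illustration only.

```python
# Sketch of "divide the codebase into smaller, conquerable bits": split an
# oversized legacy source file into chunks and wrap each one in a meta-prompt.
MAX_CHUNK_LINES = 400  # assumed budget so each chunk fits the model's context window

META_PROMPT = """You are migrating legacy code to the new platform.
Rules: keep behaviour identical, target the new service API, flag anything ambiguous.

Chunk {index}/{total} of `{path}` (lines {start}-{end}):
{code}
"""

def chunk_source(source: str, max_lines: int = MAX_CHUNK_LINES):
    """Yield (start_line, end_line, code) pieces of a file too large to migrate in one pass."""
    lines = source.splitlines()
    for start in range(0, len(lines), max_lines):
        end = min(start + max_lines, len(lines))
        yield start + 1, end, "\n".join(lines[start:end])

def build_prompts(path: str, source: str):
    chunks = list(chunk_source(source))
    return [
        META_PROMPT.format(index=i + 1, total=len(chunks), path=path,
                           start=start, end=end, code=code)
        for i, (start, end, code) in enumerate(chunks)
    ]

# Usage: feed each prompt to the coding assistant and collect the proposed rewrites.
# for prompt in build_prompts("legacy/search_handler.pl", open("legacy/search_handler.pl").read()):
#     proposed = call_llm(prompt)   # hypothetical LLM call, not a real API
```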
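A minimal sketch of the rules-file approach, assuming a "glob | guidance" line format: only the rules whose glob matches a modified file are applied, and only high-confidence findings are posted, which keeps the review comments precise and low-noise. The rule format and the check_with_llm/post_comment stubs are hypothetical, not Sourcegraph’s actual interface.

```python
from fnmatch import fnmatch

# Rules live in a simple flat file: one "<file glob> | <guidance>" pair per line.
RULES_FILE = """\
payments/*.py | Never log raw card numbers; use the masking helper.
*_test.py     | New tests must not call external services directly.
*.py          | Every feature flag needs an owner and a removal date.
"""

def load_rules(text):
    rules = []
    for line in text.strip().splitlines():
        pattern, guidance = (part.strip() for part in line.split("|", 1))
        rules.append((pattern, guidance))
    return rules

def check_with_llm(path, diff, guidance):
    """Hypothetical LLM check: returns {'confidence': float, 'message': str} or None."""
    return None  # stubbed out in this sketch

def post_comment(path, message):
    """Hypothetical call to the code host's review-comment API."""
    print(f"review comment on {path}: {message}")

def review(changed_files, diffs):
    rules = load_rules(RULES_FILE)
    for path in changed_files:
        # Apply only the rules whose glob matches this modified file.
        for pattern, guidance in rules:
            if not fnmatch(path, pattern):
                continue
            finding = check_with_llm(path, diffs.get(path, ""), guidance)
            # Post selectively: drop low-confidence findings to avoid noisy reviews.
            if finding and finding["confidence"] >= 0.8:
                post_comment(path, finding["message"])

review(["payments/charge.py", "search/ranking_test.py"], diffs={})
```

Keeping the rules in a flat file that lives alongside the code means each team can evolve its own guidelines through normal code review.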
The Future: Self-Healing Services and Declarative Coding
Looking forward, the vision is to create self-healing services where errors are anticipated and fixed directly within the IDE, effectively “shifting left” current CI pipeline errors [00:17:30]. This involves using the context of the codebase and developer-created prompts to automate questions to the server, extracting knowledge and implementing fixes [00:17:54].
This approach leverages “declarative coding” to address the perennial problem of software becoming a “victim of its own success,” where accumulating technical debt from feature requests and bug reports leads to a loss of code cohesion and standards [00:18:46]. With declarative coding, senior engineers, architects, and leaders can define constraints and rules that are automatically enforced throughout the codebase at review time and within the editor, for code written by humans or AI [00:19:07]. This includes compliance rules and other non-feature-facing requirements that often consume significant developer time [00:19:28].
The Importance of Education
A critical factor in Booking.com’s success has been comprehensive developer education [00:19:52]. Many developers initially stopped using Cody due to a “pure lack of knowledge” of how to work effectively with LLMs and craft appropriate prompts [00:14:47]. Through workshops and hackathons, Booking.com showed developers the value of the tools and gave them room to experiment. This hands-on approach turned initial skeptics into “daily users” and gave the team the data to defend the observed 30%+ increase in development speed [00:19:56]. For any company embarking on AI adoption, “education” is the primary takeaway [00:20:19].