From: aidotengineer
Beyang Liu, CTO and co-founder of Sourcegraph, and Bruno Passos, Head of Product for developer experience at booking.com, discuss their partnership in building software development agents to automate toil at booking.com, aiming for measurable ROI and impact [00:00:38].
A common challenge for large companies adopting AI is measuring its ROI and impact, a question booking.com is proactively addressing [00:01:11].
Booking.com’s Context and Challenges
Booking.com aims to make travel easier for everyone and, internally, to clear paths for its developers to do their best work [00:01:57]. As one of the largest online travel agencies, it books approximately 1.5 million room nights per day and employs over 3,000 developers [00:02:16]. The company handles over 250,000 merge requests and 2.5 million CI jobs annually [00:02:43].
Booking.com is extremely data-driven; its success is rooted in experimentation and an obsession with data [00:02:53]. Decades of experimentation, conducted primarily through A/B tests, have left a significant accumulation of experiment flags and dead code in the codebase [00:03:08]. This bloat increases cycle times, to the point where developers spend over 90% of their time on “toil” [00:03:54]. Developer surveys confirm that the codebase is becoming increasingly difficult to work with [00:04:18].
Sourcegraph’s Solutions
Sourcegraph’s mission is to make building software at scale tractable [00:05:21]. Its products include:
- Code Search – Described as “Google for your code,” it allows human developers to find and understand code [00:05:29].
- Large-scale refactoring and code migration tools [00:05:36].
- Cody – An AI coding assistant, context-aware and tuned for large, messy codebases [00:05:40].
- Agents – Tools built to automate toil out of the software development life cycle [00:05:51].
The unifying theme across Sourcegraph’s products is to accelerate the developer inner loop, augment human creativity, and automate as much “BS” out of the outer loop as possible [00:05:57].
Partnership and Evolution of AI Adoption
Booking.com began using Sourcegraph Search over two years ago, finding it highly successful for navigating their large codebase [00:06:15]. Around a year ago, they started experimenting with Cody, leveraging its integration with Sourcegraph Search for context [00:06:34]. The next step is building agents using both Cody and Sourcegraph Search [00:06:57].
Timeline of AI Tool Adoption and Measurement
The journey of adopting AI tools at booking.com has been rapid:
- January (Last Year): Cody was made available to all 3,000 developers [00:07:10]. Initial challenges included limited LLM choice and token limits [00:07:32]. Partnering with Sourcegraph, they removed these guardrails so each developer could choose among multiple LLMs, which proved crucial because different LLMs excel in different code contexts (e.g., excavating old code vs. developing new features) [00:07:44].
- July: Developer training on Cody began. This proved “incredibly important”: developers who initially saw no value in the tool became daily users after training [00:08:14].
- January-October: The initial success metric, “hours saved,” turned out to be statistically irrelevant and based on limited research [00:08:34], prompting a search for more statistically meaningful metrics [00:09:10].
- October: New KPIs were defined to measure the impact of GenAI within a one-year timeframe [00:09:19].
- November: Usage traces showed that developers using Cody regularly (12+ days a month) were 30% faster [00:09:29]. An API layer was created in front of Cody, enabling creative integrations with tools like Slack and Jira and extending its functionality beyond the IDE [00:09:46] (see the sketch below).
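The talk does not describe the API layer’s interface, so the following Python sketch is purely illustrative: it assumes a hypothetical internal chat endpoint (`CODY_API_URL`) and a standard Slack incoming webhook, and simply relays a codebase question and its answer into a channel.

```python
import os
import requests

# Hypothetical internal endpoint; booking.com's actual API layer is not public.
CODY_API_URL = "https://cody-gateway.internal.example.com/v1/chat"
SLACK_WEBHOOK_URL = os.environ["SLACK_WEBHOOK_URL"]

def ask_cody(question: str) -> str:
    """Send a question to the internal API layer in front of Cody; return the answer."""
    resp = requests.post(
        CODY_API_URL,
        json={"messages": [{"role": "user", "content": question}]},
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()["answer"]  # assumed response shape

def post_to_slack(text: str) -> None:
    """Relay the answer into a Slack channel via an incoming webhook."""
    requests.post(SLACK_WEBHOOK_URL, json={"text": text}, timeout=10).raise_for_status()

if __name__ == "__main__":
    post_to_slack(ask_cody("Which services call the payments API?"))
```

The same wrapper pattern extends naturally to Jira bots or CI hooks, which is what pulling Cody’s functionality out of the IDE amounts to in practice.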
Key Performance Indicators (KPIs) for AI Adoption
Booking.com defined four core KPIs for measuring GenAI impact over a year:
- Lead time for change [00:10:17]:
- Short-term: Time to review merge requests (MRs). Daily Cody users ship 30% more MRs than non-users, and their MRs contain less code [00:10:40]. (A measurement sketch follows this list.)
- Quality [00:10:21]:
- Mid-term: Reduce vulnerabilities by providing LLMs with past vulnerability context from the codebase to predict new ones [00:11:00].
- Mid-term: Increase test coverage, especially for legacy code, to ensure new platforms pass existing tests [00:11:15].
- Codebase insights [00:10:21]:
- Mid-term: Track unused code parts, lingering feature flags, and non-performant code [00:11:26].
- Modernization [00:10:26]:
- Long-term: Reduce the time to re-platform their codebase from years to months [00:11:37].
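As a rough illustration of how “lead time for change” might be compared across cohorts, here is a hedged Python sketch. The MR records, the 12-days-a-month cohort definition, and the field names are assumptions, not booking.com’s actual pipeline; real data would come from the code host’s API.

```python
from datetime import datetime
from statistics import median

# Hypothetical MR records; real ones would be pulled from the code host's API.
merge_requests = [
    {"author": "alice", "created": "2024-10-01T09:00", "merged": "2024-10-01T15:00"},
    {"author": "alice", "created": "2024-10-02T10:00", "merged": "2024-10-02T14:00"},
    {"author": "bob",   "created": "2024-10-01T09:00", "merged": "2024-10-03T09:00"},
]
daily_cody_users = {"alice"}  # assumed cohort: active in Cody 12+ days a month

def lead_time_hours(mr: dict) -> float:
    """Hours from MR creation to merge, the short-term lead-time signal."""
    fmt = "%Y-%m-%dT%H:%M"
    delta = datetime.strptime(mr["merged"], fmt) - datetime.strptime(mr["created"], fmt)
    return delta.total_seconds() / 3600

by_cohort: dict[str, list[float]] = {"cody": [], "other": []}
for mr in merge_requests:
    cohort = "cody" if mr["author"] in daily_cody_users else "other"
    by_cohort[cohort].append(lead_time_hours(mr))

for cohort, times in by_cohort.items():
    print(f"{cohort}: median lead time {median(times):.1f}h over {len(times)} MRs")
```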
Building and Utilizing Software Agents
Engineers using Cody began to experiment with the underlying APIs, leading to a desire to compose calls into longer chain automations, now called “agents” [00:11:50]. Initial pitfalls included managing expectations about LLM capabilities [00:12:12]. A joint hackathon was held to build these agents [00:12:27].
Examples of AI Agents Developed
- GraphQL Query Generator [00:13:37]:
- Booking.com has a massive GraphQL API schema (over a million tokens) that doesn’t fit into standard LLM context windows [00:12:48].
- The agent uses Sourcegraph Search to find relevant nodes within the schema and then agentically walks up the tree to pull in relevant parent nodes [00:13:05].
- This lets the LLM reason about which parts of the schema to use, generating coherent responses where naive approaches would produce “garbage” due to hallucinations [00:13:22] (see the sketch below).
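A minimal sketch of the “walk up the tree” idea: given schema nodes matched by code search, pull in each node’s ancestors so the LLM sees a coherent, self-contained sub-schema. The parent map and SDL fragments below are illustrative stand-ins; the real agent derives them from booking.com’s million-token schema via Sourcegraph Search.

```python
parent_of = {                      # child type -> parent type in the schema tree
    "RoomAvailability": "Hotel",
    "Hotel": "SearchResult",
    "SearchResult": "Query",
}
schema_text = {                    # SDL fragment per node (illustrative)
    "Query": "type Query { search(city: String!): SearchResult }",
    "SearchResult": "type SearchResult { hotels: [Hotel] }",
    "Hotel": "type Hotel { name: String, rooms: [RoomAvailability] }",
    "RoomAvailability": "type RoomAvailability { date: String, price: Float }",
}

def relevant_subschema(matched: set[str]) -> str:
    """Keep matched nodes plus all ancestors, so the fragment stands on its own."""
    keep: set[str] = set()
    for node in matched:
        while node is not None and node not in keep:
            keep.add(node)
            node = parent_of.get(node)
    # Emit in a stable order so prompts are reproducible.
    return "\n".join(schema_text[n] for n in schema_text if n in keep)

print(relevant_subschema({"RoomAvailability"}))
```

Because only the relevant branch reaches the model, the prompt stays well under the context limit while remaining syntactically coherent.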
- Automated Code Migration Agent [00:14:02]:
- Targets legacy functions with over 10,000 lines of code to accelerate re-platforming efforts [00:14:07].
- Combines Sourcegraph Search, structured meta-prompts, and dividing the code into smaller, manageable pieces [00:14:16] (see the chunking sketch below).
- In-person pairing with experts was crucial, as developers often lacked knowledge on how to effectively use LLMs and craft prompts [00:14:30].
- Developers had spent months trying to scope the problem, yet within two days at the hackathon the agent could identify and analyze call sites, surfacing the low-hanging fruit [00:15:05].
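A sketch of the divide-and-conquer step: slice an oversized legacy function into chunks that fit a context window and wrap each chunk in a structured meta-prompt. The chunk size and prompt template are assumptions for illustration, not booking.com’s actual values.

```python
# Assumed chunk size; tune to the target model's context window.
CHUNK_LINES = 400

META_PROMPT = """You are migrating legacy code to the new platform.
Global context:
{context}

Translate ONLY the fragment below, preserving behavior. Fragment {i}/{n}:
{chunk}
"""

def chunked_prompts(source: str, context: str) -> list[str]:
    """Split a huge function into line-based chunks, one meta-prompt per chunk."""
    lines = source.splitlines()
    chunks = [
        "\n".join(lines[i : i + CHUNK_LINES])
        for i in range(0, len(lines), CHUNK_LINES)
    ]
    return [
        META_PROMPT.format(context=context, i=idx + 1, n=len(chunks), chunk=chunk)
        for idx, chunk in enumerate(chunks)
    ]
```

In the actual agent, the `context` string would carry call-site information gathered via Sourcegraph Search, so each fragment is translated with awareness of how it is used elsewhere.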
- Code Review Agent [00:15:52]:
- Addresses the universal practice of code review in enterprises [00:15:56].
- Unlike generic AI code review tools, this agent is highly customizable to an organization’s specific rules and guidelines [00:16:17].
- Users define rules in a simple flat-file format, which the agent consumes [00:16:41] (a hypothetical example appears below).
- The agent applies relevant rules to modified files in a PR and selectively posts comments, optimizing for precision to avoid noise [00:16:50].
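The talk does not specify the flat file’s schema, so the following YAML is a hypothetical illustration of what organization-specific review rules might look like; every field name here is invented.

```yaml
# Hypothetical review-rules file consumed by the code review agent.
rules:
  - id: no-raw-sql
    paths: ["services/**/*.py"]
    description: Use the query builder instead of raw SQL strings.
    severity: error
  - id: experiment-flag-ttl
    paths: ["**/*"]
    description: New experiment flags must reference a cleanup ticket.
    severity: warning
```

Keeping rules in a plain file means teams can review and version their review policy like any other code, and the agent only comments when a rule matches a modified file, preserving the precision goal described above.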
Future Vision: Self-Healing Services and Declarative Coding
The goal is to develop self-healing services by shifting error detection left into the IDE, providing instant fixes based on service rules and prompts [00:17:30]. This leverages developer-created prompt libraries to automate inquiries to the server and extract knowledge from the codebase [00:17:58].
This approach has the potential to address a long-standing problem in software development: successful software becomes a victim of its own success as technical debt accumulates and cohesion erodes [00:18:32]. With declarative coding, senior engineers and architects can define constraints and rules that are automatically enforced throughout the codebase, both during code review and within the editor, for code written by humans or AI [00:19:07]. The same mechanism can enforce compliance rules that never directly surface as end-user features [00:19:28].
The Importance of Education
A key takeaway from this past year of partnership is the critical role of education [00:19:50]. Hand-holding entire business units, showing them the value of the tools, and letting them experiment through workshops and hackathons (even ones as short as two days) turns them into passionate daily users [00:19:52]. This education is vital to achieving the 30%-plus increase in speed [00:20:13].