From: aidotengineer
Maintaining large and complex codebases presents significant challenges for organizations, impacting developer productivity, efficiency, and the overall quality of software development [00:00:20].
Causes of Codebase Bloat
At companies like Booking.com, a major contributor to codebase bloat is their obsession with experimentation and being data-driven [03:00:00] [03:03:00]. As new features are brought to users, experiments and feature flags are added to the codebase [03:08:00]. These experiment flags or dead code often stay in the codebase for decades, leaving it “extremely bloated” [03:21:00] [03:27:00].
Impact on Development
The consequences of a bloated codebase are substantial:
- Increased Cycle Times: As the codebase grows, software development cycle times become longer [03:54:00] [03:56:00].
- High Toil Percentage: Developers may spend over 90% of their time on “toil,” meaning debugging and working on the codebase rather than on innovative features [04:02:00] [04:07:00].
- Decreased Developer Satisfaction: Surveys reveal that it becomes increasingly difficult for developers to perform their work [04:18:00] [04:29:00].
- Misallocation of Talent: Brilliant engineers may be “destroyed by decade-long dead feature flag migrations,” instead of focusing on new features and user problems [04:34:00] [04:38:00] [05:07:00].
- Erosion of Standards: Over time, as more contributors are added to the codebase, cohesion, vision, and established coding standards are lost [01:57:00] [01:59:00].
This phenomenon, described as successful software becoming “a victim of its own success,” is due to the accumulation of technical debt as businesses prioritize new features and bug fixes to remain competitive [01:46:00] [01:57:00].
Solutions and Impact Measurement
Companies like Sourcegraph aim to make building software at scale “tractable” [05:21:00]. Solutions include:
- Code Search Tools: Tools like Sourcegraph’s Code Search help developers navigate and understand large, messy codebases [05:29:00] [06:15:00].
- AI Coding Assistants: AI coding assistants like Cody, which are context-aware and tuned for large codebases, can accelerate the developer inner loop and augment human creativity [05:40:00] [05:47:00] [06:02:00].
- Software Development Agents: The primary focus is on agents that automate “toil” out of the software development lifecycle [00:43:00] [05:51:00] [06:06:00]. These agents aim to automate large-scale refactoring and code migrations [05:36:00] [05:38:00]. Examples of such agents include:
  - GraphQL Query Generation: An agent that searches massive GraphQL schemas (over a million tokens) to find relevant nodes and generate coherent queries, overcoming context window limitations and hallucinations common with naive LLM usage [01:30:00] [01:42:00].
  - Automated Code Migration: Agents designed to handle legacy functions with over 10,000 lines of code, speeding up replatforming efforts by dividing the codebase into smaller, conquerable bits [01:04:00] [01:10:00] [01:13:00] [01:21:00]. This drastically reduces the time needed to understand problem scope from months to days [01:15:00] [01:19:00].
  - AI-Powered Code Review: Customizable agents that consume organization-specific rules and guidelines (defined in a simple flat file format) to selectively post comments during code review, ensuring precision and reducing noise [01:59:00] [01:02:00] [01:07:00].
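The GraphQL query generation example above depends on narrowing a huge schema down to the few types that matter before anything is sent to a model. The talk does not describe how the agent does this, so the following is only a minimal sketch of the general idea, assuming plain keyword overlap as the ranking signal. The function names (`split_schema`, `relevant_types`) and the tiny example schema are hypothetical and exist only for illustration.

```python
import re

def split_schema(sdl: str) -> dict[str, str]:
    """Split GraphQL SDL text into {type_name: definition_text}.

    Assumes simple definitions without nested braces.
    """
    pattern = re.compile(r"\b(?:type|input|interface|enum)\s+(\w+)[^{]*\{[^}]*\}")
    return {m.group(1): m.group(0) for m in pattern.finditer(sdl)}

def tokenize(text: str) -> set[str]:
    """Lowercase words, splitting camelCase identifiers into their parts."""
    return {word.lower() for word in re.findall(r"[A-Za-z][a-z]*", text)}

def relevant_types(sdl: str, question: str, limit: int = 2) -> list[str]:
    """Rank type definitions by how many words they share with the question."""
    query_words = tokenize(question)
    scored = []
    for name, body in split_schema(sdl).items():
        overlap = len(query_words & tokenize(body))
        if overlap:
            scored.append((overlap, name))
    # Highest overlap first; ties broken alphabetically for stable output.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored[:limit]]

# Hypothetical miniature schema; a real one could be far too large for a prompt.
SCHEMA = """
type Hotel {
  id: ID!
  name: String!
  city: String!
  rooms: [Room!]!
}

type Room {
  id: ID!
  price: Float!
  capacity: Int!
}

type Invoice {
  id: ID!
  total: Float!
}
"""

selected = relevant_types(SCHEMA, "price of a room in a hotel by city")
print(selected)
```

Only the selected definitions would then be placed in the prompt, which keeps the request inside the context window and gives the model real field names to work from. A production agent would likely use a stronger retrieval signal than word overlap.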
Organizations are measuring the impact of these solutions using KPIs like:
- Lead Time for Change [01:17:00].
- Code Quality (e.g., reducing vulnerabilities, increasing test coverage, especially for legacy code) [01:21:00] [01:26:00].
- Codebase Insights (e.g., tracking unused code, lingering feature flags, and non-performant code) [01:28:00] [01:35:00].
The ultimate goal for Booking.com is to reduce the time to replatform their codebase from years to months [01:40:00] [01:45:00].
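The Codebase Insights KPI above mentions tracking lingering feature flags. As a rough sketch of what such a report could look like, the snippet below scans source text for flag checks and lists the flags that are older than a threshold. It assumes flags are read through a hypothetical `is_enabled("flag_name")` call and that a registry of flag creation dates is available; neither detail comes from the talk.

```python
import re
from datetime import date

# Hypothetical convention: flags are checked via is_enabled("flag_name").
FLAG_CALL = re.compile(r"is_enabled\(\s*[\"']([\w.-]+)[\"']\s*\)")

def find_flag_usages(sources: dict[str, str]) -> dict[str, list[str]]:
    """Map each flag name to the sorted list of files that reference it."""
    usages: dict[str, set[str]] = {}
    for path, text in sources.items():
        for flag in FLAG_CALL.findall(text):
            usages.setdefault(flag, set()).add(path)
    return {flag: sorted(paths) for flag, paths in usages.items()}

def lingering_flags(sources, created_on, today, max_age_days=180):
    """Return (flag, age_in_days, files) for flags older than max_age_days."""
    report = []
    for flag, paths in find_flag_usages(sources).items():
        created = created_on.get(flag)
        if created is None:
            continue  # flag is not in the registry, so its age is unknown
        age = (today - created).days
        if age > max_age_days:
            report.append((flag, age, paths))
    return sorted(report, key=lambda row: -row[1])  # oldest first

# Illustrative in-memory inputs; a real run would read files from the repository.
sources = {
    "checkout.py": 'if is_enabled("new_checkout"):\n    pass\n',
    "search.py": "if is_enabled('fast_search') and is_enabled(\"new_checkout\"):\n    pass\n",
}
created_on = {"new_checkout": date(2015, 3, 1), "fast_search": date(2024, 5, 1)}

report = lingering_flags(sources, created_on, today=date(2024, 6, 1))
for flag, age, files in report:
    print(f"{flag}: {age} days old, referenced in {', '.join(files)}")
```

Tracking the length of this report over time gives a simple number to watch as cleanup or migration agents remove old flags.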
The Future of Codebase Maintenance
Self-healing services and the ability to declare service rules are emerging as key areas [01:29:00] [01:33:00] [01:50:00]. By anticipating CI pipeline errors and shifting checks left into the IDE, tools can offer developers immediate fixes [01:38:00] [01:47:00]. Declarative coding allows senior engineers and architects to define constraints and rules once and enforce them throughout the codebase, both at review time and within the editor, whether the code is written by humans or AI [01:07:00] [01:09:00]. This addresses the long-standing problem of maintaining standards and cohesion in large, evolving software projects [01:05:00] [01:07:00].
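One way to picture the declarative approach is a set of rules kept as plain data, with a single checker that an editor plugin and a review bot could both call. The sketch below is an assumption-laden illustration, not the format used by any tool mentioned in the talk: the `Rule` fields, the two sample rules, and the `check` function are all hypothetical.

```python
import re
from dataclasses import dataclass
from fnmatch import fnmatchcase

@dataclass(frozen=True)
class Rule:
    name: str        # short identifier shown in findings
    applies_to: str  # glob matched against the file path
    forbidden: str   # regular expression that must not appear
    message: str     # guidance shown to the author

# Hypothetical rules an architect might declare once for the whole codebase.
RULES = [
    Rule("no-print", "*.py", r"\bprint\(",
         "Use the logging module instead of print()."),
    Rule("no-raw-sql", "services/*", r"\bexecute\(\s*f[\"']",
         "Build SQL with bound parameters, not f-strings."),
]

def check(path: str, text: str, rules=RULES) -> list[str]:
    """Return one finding per line that violates a rule applying to this path."""
    findings = []
    for rule in rules:
        if not fnmatchcase(path, rule.applies_to):
            continue
        pattern = re.compile(rule.forbidden)
        for number, line in enumerate(text.splitlines(), start=1):
            if pattern.search(line):
                findings.append(f"{path}:{number}: [{rule.name}] {rule.message}")
    return findings

# Example input: the same check could run on save in the editor or at review time.
code = 'print("debug")\ncursor.execute(f"SELECT * FROM t WHERE id={x}")\n'
findings = check("services/billing.py", code)
for finding in findings:
    print(finding)
```

Because the rules are data rather than code, the same list can be applied to changes from any author, human or AI, without changing the checker itself.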
A crucial factor in successful adoption and achieving positive impacts is education [01:52:00]. By educating developers through workshops and hackathons, companies can transform them into daily users who are passionate about the tools and contribute to measurable improvements in speed [01:54:00] [02:09:00].