From: aidotengineer
Diamond, an AI veteran with 15 years in the field, shares insights on building AI agents at DataDog, specifically focusing on the development of a “DevOps engineer who never sleeps” [00:00:22]. His background includes work at Microsoft Cortana, Amazon Alexa, Meta (PyTorch), and an AI startup focused on a DevOps assistant [00:00:54]. DataDog’s current endeavor is “Bits AI,” an AI assistant designed to help with DevOps challenges [00:01:04].
DataDog and AI: A Historical Perspective
DataDog functions as an observability and security platform for cloud applications, primarily focused on helping users observe what is happening in their systems and take action, making those systems safer and easier for DevOps teams to manage [00:01:22]. DataDog has been shipping AI since approximately 2015, integrated into features like proactive alerting, root cause analysis, impact analysis, and change tracking, though not always overtly presented as “AI products” [00:01:46].
The Current Era Shift in AI
A significant “era shift” is underway in AI, comparable to the advent of the microprocessor or the shift to SaaS [00:02:06]. This shift is characterized by:
- Bigger, smarter models [00:02:14]
- Reasoning capabilities [00:02:17]
- Multimodal AI [00:02:19]
- “Foundation model wars” [00:02:20]
- Intelligence becoming “too cheap to meter” [00:02:22]
This shift leads to rapid growth in AI products like Cursor and increased user expectations [00:02:30]. DataDog aims to leverage these advancements by moving up the stack, providing AI agents that use the DataDog platform on behalf of customers [00:02:53]. This requires work in agent development, evaluation, and new types of observability [00:03:06].
DataDog’s AI Agents in Beta
DataDog is currently developing several AI agents in private beta:
AI Software Engineer
This agent proactively observes and acts on errors, analyzes them, identifies causes, and proposes solutions [00:06:55].
- Capabilities: Generates code fixes, reduces on-call incidents, and can even create regression tests to prevent future issues [00:07:10].
- Integration: Offers options to create Pull Requests in GitHub or open diffs in VS Code for editing [00:07:32]. This significantly reduces the human time spent on manual coding and testing [00:07:38] (a minimal sketch of this flow appears below).
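This is not DataDog's implementation, but a minimal sketch of what an error-to-pull-request flow can look like, assuming a generic LLM callable (`llm`) and the public GitHub REST API; `propose_fix` and the prompt wording are illustrative.

```python
# Illustrative only — not DataDog's Bits AI code. Assumes a generic LLM
# callable and the public GitHub REST API ("create a pull request" endpoint).
import requests

GITHUB_API = "https://api.github.com"


def propose_fix(error_report: str, source_snippet: str, llm) -> str:
    """Ask the model for a unified diff that fixes the error and adds a regression test."""
    prompt = (
        "You are a software engineer. Given this error and the surrounding code, "
        "return a unified diff that fixes the root cause and adds a regression test.\n\n"
        f"Error:\n{error_report}\n\nCode:\n{source_snippet}"
    )
    return llm(prompt)  # `llm` is any callable that returns the model's text


def open_pull_request(owner: str, repo: str, token: str,
                      branch: str, title: str, body: str) -> dict:
    """Open a PR from `branch` into main so a human can review the proposed fix."""
    resp = requests.post(
        f"{GITHUB_API}/repos/{owner}/{repo}/pulls",
        headers={"Authorization": f"Bearer {token}",
                 "Accept": "application/vnd.github+json"},
        json={"title": title, "head": branch, "base": "main", "body": body},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()
```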
AI On-Call Engineer
Designed to handle 2 AM alerts and reduce the frequency of engineer pages [00:03:46].
- Workflow:
- Kicks off proactively when an alert occurs [00:04:04].
- Situationally orients itself by reading runbooks and grabbing alert context [00:04:07].
- Investigates by looking through logs, metrics, and traces, acting in a loop to understand the situation [00:04:16].
- Automatically runs investigations and provides summaries/information before a human even gets to their computer [00:04:26].
- Human-AI Collaboration: A new page allows for human-AI collaboration, enabling users to verify agent actions, learn from them, and build trust [00:04:47]. Users can see the reasoning behind hypotheses, what the agent found, and the steps taken from runbooks [00:05:05].
- Reasoning and Remediation: The agent develops hypotheses, reasons over them, tests ideas using tools (e.g., running queries against logs/metrics), and validates or invalidates each hypothesis [00:05:30] (a minimal sketch of this loop follows the list). If a root cause is found, it can suggest remediations like paging another team or scaling infrastructure [00:05:51]. It can also integrate with existing DataDog workflows [00:06:16].
- Post-Mortem Generation: After an incident is remediated, the agent can write a post-mortem, summarizing what occurred and the actions taken by both the agent and humans [00:06:25].
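Read as code, that workflow is essentially a hypothesize-and-test loop. The sketch below is an assumed shape, not Bits AI's implementation; `query_logs`, `query_metrics`, and `page_team` are hypothetical placeholders for real tool calls.

```python
# Illustrative hypothesize-and-test loop for an on-call agent. The tool
# functions are placeholders; a real agent would call logs/metrics/trace APIs.
from dataclasses import dataclass, field


def query_logs(query: str) -> str: ...       # placeholder for a log-search tool
def query_metrics(query: str) -> str: ...    # placeholder for a metrics-query tool
def page_team(team: str, summary: str): ...  # placeholder for a paging/remediation tool


@dataclass
class Hypothesis:
    description: str
    evidence: list = field(default_factory=list)
    confirmed: bool = False


def investigate(alert: dict, runbook: str, llm) -> list[Hypothesis]:
    """Orient on the alert, propose root causes, and test each one with tools."""
    context = f"Alert: {alert}\nRunbook:\n{runbook}"
    ideas = llm(f"{context}\nList likely root causes, one per line.").splitlines()
    hypotheses = [Hypothesis(description=i.strip()) for i in ideas if i.strip()]

    for hyp in hypotheses:
        # Test the idea against real telemetry instead of trusting the model.
        hyp.evidence = [query_logs(hyp.description), query_metrics(hyp.description)]
        verdict = llm(
            f"Given this evidence: {hyp.evidence}\n"
            f"Is '{hyp.description}' the root cause? Answer yes or no."
        )
        hyp.confirmed = verdict.strip().lower().startswith("yes")
        if hyp.confirmed:
            # Suggested remediation; a human can approve before it executes.
            page_team("owning-team", summary=hyp.description)
    return hypotheses
```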
Lessons Learned Building AI Agents
DataDog has identified several key learnings from developing these AI agents:
1. Scoping Tasks for Evaluation
Building quick demos is easy, but robust evaluation is harder [00:08:01].
- Define “Jobs to Be Done”: Clearly understand the step-by-step human workflow and how a human would evaluate it [00:08:33].
- Vertical, Task-Specific Agents: Focus on specific tasks rather than generalized agents [00:08:48].
- Measurable and Verifiable: Ensure each step is measurable and verifiable, as this is a common pain point [00:08:52].
- Domain Experts as Design Partners: Use domain experts for evaluation and verification, not for writing code or rules, due to the stochastic nature of AI models [00:09:10].
- Eval, Eval, Eval: Deeply consider evaluation from the start. This includes offline, online, and “living” evaluation sets with end-to-end measurements [00:09:31]. Instrumenting product usage is crucial for feedback [00:10:03] (a minimal offline-eval sketch follows this list).
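A minimal way to make each task “measurable and verifiable” is a small offline eval set scored with an explicit check per case. The sketch below assumes a JSON-lines file with `alert` and `expected_root_cause` fields; both the file name and fields are illustrative, not DataDog's harness.

```python
# Minimal offline-eval sketch: replay a fixed set of cases through the agent
# and score each one with a verifiable check. File and field names are assumed.
import json


def run_offline_eval(agent, path: str = "eval_cases.jsonl") -> float:
    """Return the fraction of cases where the agent surfaces the labeled root cause."""
    passed = total = 0
    with open(path) as f:
        for line in f:
            case = json.loads(line)        # e.g. {"alert": {...}, "expected_root_cause": "..."}
            result = agent(case["alert"])  # `agent` is any callable under test
            total += 1
            if case["expected_root_cause"].lower() in str(result).lower():
                passed += 1
    return passed / total if total else 0.0
```

Production traces captured through product instrumentation can then be labeled and folded back into the same case file, which is what keeps the evaluation set “living.”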
2. Building the Right Team
- Optimistic Generalists: While one or two ML experts are helpful, the core team should consist of optimistic generalists who can write code effectively and adapt quickly to ambiguity [00:10:14].
- UX/Frontend Importance: User experience and frontend development are crucial for effective collaboration with agents [00:10:28].
- AI-Augmented Teammates: Team members should be excited about being AI-augmented, exploring new capabilities, and adapting to a rapidly changing field [00:10:38].
3. The Changing User Experience (UX)
Traditional UX patterns are evolving, and developers must be comfortable with that change [00:11:03]. DataDog favors agents that function like human teammates rather than requiring numerous new pages or buttons [00:11:28].
4. Observability Matters
Even with agents, observability is paramount and should not be an afterthought [00:11:36].
- Debugging Complex Workflows: AI agent workflows are complex, requiring situational awareness for debugging [00:11:42].
- LLM Observability: DataDog’s “LLM Observability” view helps monitor LLM usage, providing a single pane of glass for diverse interactions, hosted models, and API usage [00:11:50].
- Agent Graph: For multi-step, complex agent calls (potentially hundreds of calls or decisions), a specialized “agent graph” provides a human-readable view to quickly identify errors [00:12:26] (see the tracing sketch below).
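One way to get that situational awareness is to wrap every model call and tool call in a span so a multi-step run renders as a trace. The sketch below uses the OpenTelemetry Python API as a generic stand-in; it is not DataDog's LLM Observability SDK or agent graph.

```python
# Sketch: one span per agent step, so a run with hundreds of calls is readable
# as a trace. Uses OpenTelemetry as a stand-in for a vendor SDK.
from opentelemetry import trace

tracer = trace.get_tracer("oncall-agent")


def traced_step(name: str, fn, **attrs):
    """Run one agent step inside a span, recording its inputs and a result preview."""
    with tracer.start_as_current_span(name) as span:
        for key, value in attrs.items():
            span.set_attribute(f"agent.{key}", str(value))
        result = fn()
        span.set_attribute("agent.result_preview", str(result)[:200])
        return result


# Usage: each hypothesis test becomes its own node in the resulting trace.
# traced_step("test_hypothesis",
#             lambda: query_logs("error rate spike on checkout-service"),
#             hypothesis="bad deploy", model="gpt-4o")
```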
The “Bitter Lesson” and Agents as Users
A key insight, referred to as the “agent or application layer bitter lesson,” suggests that general methods leveraging new, off-the-shelf models are ultimately the most effective [00:13:16]. Fine-tuning specific models can be quickly surpassed by general advancements in foundation models [00:13:26]. The ability to easily swap and try out new models is crucial [00:13:45].
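A practical consequence is keeping the model behind a thin, provider-agnostic interface so a newer off-the-shelf model is a one-line swap. The adapter below is an assumed pattern (shown with the OpenAI Python client), not a specific DataDog abstraction.

```python
# Sketch: a provider-agnostic model interface so swapping in a newer model is
# a configuration change, not a fine-tuning project. The adapter is illustrative.
from typing import Protocol


class ChatModel(Protocol):
    def complete(self, prompt: str) -> str: ...


class OpenAIChat:
    """Adapter over one provider; an AnthropicChat adapter would look the same."""

    def __init__(self, model: str = "gpt-4o"):
        from openai import OpenAI   # imported here so other adapters stay optional
        self._client = OpenAI()
        self._model = model

    def complete(self, prompt: str) -> str:
        resp = self._client.chat.completions.create(
            model=self._model,
            messages=[{"role": "user", "content": prompt}],
        )
        return resp.choices[0].message.content


def build_agent(model: ChatModel):
    """The agent depends only on ChatModel, so any adapter can be dropped in."""
    return lambda task: model.complete(f"Task: {task}")
```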
Diamond also emphasizes the future where AI agents will become users of platforms like DataDog [00:14:01]. It’s estimated that agents could surpass humans as users within the next five years [00:14:07]. Therefore, product development should consider not just human users, but also how third-party agents like Claude might directly use the platform, requiring clear API documentation and context [00:14:21].
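Making a platform usable by a third-party agent mostly means describing each capability in a form the agent can read. The tool definition below is purely illustrative, written in the JSON-schema style that common LLM tool-use APIs (including Claude's) accept; the endpoint name and fields are assumptions, not a published DataDog API.

```python
# Illustrative tool description an agent could consume; the name and fields are
# assumptions, not DataDog's published agent-facing API.
search_logs_tool = {
    "name": "search_logs",
    "description": (
        "Search application logs over a time window. Returns matching log lines "
        "with timestamps and service names. Prefer narrow queries and short windows."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "query":   {"type": "string", "description": "Log search query"},
            "from_ts": {"type": "string", "description": "ISO-8601 start time"},
            "to_ts":   {"type": "string", "description": "ISO-8601 end time"},
        },
        "required": ["query", "from_ts", "to_ts"],
    },
}
```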
Future Outlook for AI in DevOps
The future of AI in DevOps is expected to be “weird and fun,” with accelerating AI advancements [00:14:50].
- DevSecOps Agents For Hire: DataDog aims to offer teams of DevSecOps agents for hire that can directly use their platform and handle tasks like on-call duties [00:14:56].
- Agents as Customers: Companies building SRE, coding, and other types of agents will increasingly use platforms like DataDog as customers [00:15:10].
- Accelerated Innovation: Small companies will be able to leverage automated developers (like Cursor or Devin) to bring ideas to life, and agents for operations and security, enabling an order of magnitude more ideas to reach the real world [00:15:25].