Building and Improving AI Agents

From: aidotengineer

Sierra is a conversational AI platform designed for businesses [00:00:30]. While initially known for chat and customer service, Sierra is expanding its offerings to include phone interactions, sales, subscription management, and product recommendations [00:00:53]. The company focuses on building and improving AI agents as a core product, recognizing that every agent is a product that requires a robust development and operations platform [00:09:06].

Historical Context of AI Development

The speaker reflects on the rapid pace of AI development, noting how recent milestones (2019-2023) are often perceived as “ancient history” [00:01:45]. His own experience dates back to 2016, a period he refers to as the “AI caves” [00:01:55].

In 2016, working at Google, the focus was on computer vision tasks like distinguishing between Chihuahuas and blueberry muffins, or dogs and bagels [00:02:15]. This work laid the foundation for Google Lens, which in its infancy was primarily good at identifying plants [00:03:06]. Initial testing of these early models often felt like a “slot machine” due to inconsistent accuracy [00:03:48].

Evolution of Google Lens

Today, Google Lens has evolved significantly, allowing users to:

Search and shop what they see on various platforms like Google Images and YouTube [00:04:11].
Translate non-Latin character sets [00:04:16].
Solve math homework [00:04:26].
Still identify flowers [00:04:38].

The speaker attributes this progress to consistent, step-by-step iteration over a decade, which underscores the need for a process similar to the software development life cycle: one that allows continuous improvement without regressions [00:04:43].

Software Eating the World

The presentation references Marc Andreessen’s 2011 essay, “Why Software Is Eating the World,” to set the stage for the growth of software-driven businesses [00:06:04]. An example of a successful software-centric startup from that era is Chubbies, known for its online presence and customer-focused approach [00:06:42].

Partnering with AI Agents: The Chubbies Example

Chubbies recognized that by 2025 businesses would increasingly need an AI agent to represent them and assist customers [00:07:31]. They partnered with Sierra to create their AI agent, “Duncan Smothers” [00:07:44].

Duncan Smothers, available on the Chubbies website, is designed to be highly capable and engaging, handling various customer inquiries:

Sizing and Fit: Empathetically helps customers with sizing questions and offers product recommendations [00:08:11].
Inventory Tracking: Informs customers about stock availability and helps them choose new items [00:08:27].
Package Tracking and Refunds: Provides multiple tracking numbers for orders and can issue refunds [00:08:37].

These examples demonstrate autonomous actions taken by the agent, leading to improved customer support: more customers helped, more quickly, and with higher satisfaction [00:08:49].

Sierra’s approach involves dedicated agent engineering and product management teams who work closely with customers like Chubbies to ensure the best results [00:09:32].

The Agent Development Life Cycle (ADLC)

Sierra has developed its own process, the Agent Development Life Cycle (ADLC), for building and improving AI agents [00:12:12]. While it borrows concepts from the traditional software development life cycle, it addresses the unique challenges in developing AI agents with Large Language Models (LLMs).

Challenges with LLMs

Traditional software is deterministic, fast, cheap, rigid, and governed by strict logic. In contrast, LLMs can be:

Non-deterministic [00:11:51].
Slow [00:11:53].
Expensive to run [00:11:53].
Flexible, creative, and capable of reasoning [00:11:55].

The ADLC is designed to leverage the strengths of LLMs while integrating traditional software where beneficial [00:12:02].
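The split between LLM strengths and traditional software can be illustrated with a minimal sketch. It is not Sierra's implementation: the `Order` type, the 30-day refund window, and the `stub_phrase` function (a stand-in for an LLM call) are all hypothetical. The idea is that a strict business rule stays in deterministic code, while the model is only asked to phrase the outcome.

```python
from dataclasses import dataclass
from typing import Callable

REFUND_WINDOW_DAYS = 30  # hypothetical policy, chosen only for illustration


@dataclass
class Order:
    order_id: str
    days_since_delivery: int


def refund_allowed(order: Order) -> bool:
    """Deterministic rule: same input always gives the same decision."""
    return order.days_since_delivery <= REFUND_WINDOW_DAYS


def handle_refund_request(order: Order, phrase: Callable[[str], str]) -> str:
    """Decide in code, then let the (possibly non-deterministic) model word it."""
    decision = "approved" if refund_allowed(order) else "declined"
    facts = f"Refund {decision} for order {order.order_id}."
    return phrase(facts)


def stub_phrase(facts: str) -> str:
    """Stand-in for an LLM call; a real model would vary the wording."""
    return f"Thanks for reaching out! {facts}"


if __name__ == "__main__":
    print(handle_refund_request(Order("A1", 10), stub_phrase))
    print(handle_refund_request(Order("B2", 45), stub_phrase))
```

Keeping the decision outside the model means the refund outcome can be unit-tested exactly, even when the wording changes from run to run.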

A key aspect of ADLC is iterative refinement with customers in production [00:13:35]. This includes:

Sierra’s Experience Manager: Allows customers to review every conversation, monitor agent performance in real-time, and provide feedback [00:12:49].
Issue Reporting and Testing: When an issue arises (e.g., incorrect inventory information), an issue is filed, a test is created to reproduce it, and a new release ships once the test passes [00:13:15]. As a result, each agent steadily accumulates tests, from a handful at launch to hundreds or thousands over time [00:13:27].
“Delight Budget”: Agents are empowered to go “above and beyond” for customers, such as arranging for products to be delivered from a retail location if unavailable online [00:13:42].

Initially, these processes were manual, but with advancements in AI, Sierra is increasingly able to apply AI to each part of the ADLC, accelerating improvements [00:14:00].
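The issue-to-test-to-release loop described above can be sketched roughly as follows. This is an illustrative outline under stated assumptions, not Sierra's tooling: `AgentSuite`, `RegressionTest`, the two toy agents, and the `INVENTORY` table are all hypothetical names invented for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, List

Agent = Callable[[str], str]  # an agent maps a user message to a reply

INVENTORY = {"blue shorts": 0}  # hypothetical stock levels


@dataclass
class RegressionTest:
    name: str
    user_message: str
    check: Callable[[str], bool]  # returns True if the reply is acceptable


@dataclass
class AgentSuite:
    tests: List[RegressionTest] = field(default_factory=list)

    def file_issue(self, name: str, user_message: str,
                   check: Callable[[str], bool]) -> None:
        """Every reported issue becomes a permanent regression test."""
        self.tests.append(RegressionTest(name, user_message, check))

    def failing(self, agent: Agent) -> List[str]:
        return [t.name for t in self.tests
                if not t.check(agent(t.user_message))]

    def can_release(self, agent: Agent) -> bool:
        """A release is allowed only when every accumulated test passes."""
        return not self.failing(agent)


def agent_v1(message: str) -> str:
    # Buggy version: claims availability without checking inventory.
    return "Yes, that item is in stock."


def agent_v2(message: str) -> str:
    # Fixed version: consults the inventory table before answering.
    for item, count in INVENTORY.items():
        if item in message.lower():
            if count > 0:
                return "Yes, that item is in stock."
            return "Sorry, that item is out of stock."
    return "Let me check on that for you."


if __name__ == "__main__":
    suite = AgentSuite()
    suite.file_issue(
        "inventory-out-of-stock",
        "Do you have the blue shorts?",
        lambda reply: "out of stock" in reply.lower(),
    )
    print(suite.failing(agent_v1))
    print(suite.can_release(agent_v2))
```

Because tests are only ever added, the suite grows with each reported issue, which matches the progression from a handful of tests at launch to many more over time.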

Scalability and External Factors

The ADLC becomes more effective with larger customer bases, especially those handling tens of millions of requests [00:14:26]. Changes impacting the ADLC come from various sources:

Agent performance issues [00:14:45].
Model upgrades [00:14:54].
New paradigms like reasoning models [00:14:57].
Multimodality [00:14:59].

Reasoning models act as a “force multiplier,” enabling more effective application of AI to development, testing, and QA steps within the ADLC [00:15:06].

Building for Voice AI Agents

Sierra launched its voice capabilities in October, with large customers like SiriusXM benefiting from the ability to answer customer calls immediately [00:15:31].

Sierra conceptualizes voice capabilities similarly to responsive web design: under the hood, it’s the same agent code and platform, but it’s “responsive” to the channel (e.g., chat, phone) and modality (e.g., text, voice) of interaction [00:16:13]. Customization for layout, phrasing, and parallelized requests for lower latency are still possible [00:16:29].
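The responsive-design analogy can be made concrete with a small sketch. It assumes a hypothetical structure (the `Channel` enum, `AgentReply`, `decide_reply`, and `render` are invented for illustration and are not Sierra's API): one shared function decides what to say, and a thin rendering layer adapts that decision to chat or phone.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Channel(Enum):
    CHAT = "chat"
    PHONE = "phone"


@dataclass
class AgentReply:
    summary: str
    tracking_numbers: List[str]


def decide_reply(order_id: str) -> AgentReply:
    """Shared agent logic, identical for every channel (hard-coded here)."""
    return AgentReply(
        summary=f"Order {order_id} has shipped.",
        tracking_numbers=["TRK1", "TRK2"],
    )


def render(reply: AgentReply, channel: Channel) -> str:
    """Adapt the same reply to the channel, like a responsive layout."""
    if channel is Channel.CHAT:
        # Text can show a scannable list.
        lines = [reply.summary] + [f"- {n}" for n in reply.tracking_numbers]
        return "\n".join(lines)
    # Voice avoids reading raw identifiers unless asked.
    count = len(reply.tracking_numbers)
    return (f"{reply.summary} It has {count} tracking numbers. "
            "I can read them out if you like.")


if __name__ == "__main__":
    reply = decide_reply("42")
    print(render(reply, Channel.CHAT))
    print(render(reply, Channel.PHONE))
```

Only `render` knows about the channel, so adding a new modality would not require touching the decision logic.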

Empathy in AI Design

Building reliable AI agents with LLMs is complex because LLMs, with their unpredictability, slowness, and mathematical limitations, remind us of ourselves [00:16:51]. This gives designers an opportunity to develop empathy by putting themselves in the “shoes of the robot” to build better experiences [00:17:03]. The goal is to create robust AI agents that can process complex inputs and experiences much as humans do [00:18:01].
