Building single-purpose AI agents

From: aidotengineer

Elmer Thomas, Principal Developer Educator at Twilio, presented at the AI Engineer World’s Fair on how their small documentation team leverages AI agents to enhance workflows rather than replace staff [00:00:19]. The talk, titled “The Robots Are Coming for Your Job, and That’s Okay,” aims to demonstrate why AI should be embraced for smarter work [00:00:11].

Identifying Pain Points in Documentation Workflows

The Twilio docs team faced significant challenges with a high volume of Jira tickets [00:01:09]. Three main issues were identified [00:01:12]:

Error-prone first drafts: Over 100 product teams submitted drafts, often leading to errors [00:01:12].
Time-consuming grooming: Tasks like style checks, alt text generation, and SEO optimization were time sinks [00:01:17].
Hallucination risk: Allowing large language models (LLMs) to operate unchecked posed a risk of generating incorrect information [00:01:22].

To combat burnout and improve efficiency, the team sought leverage [00:01:26].

Architecture: Six Single-Purpose AI Agents

Instead of developing one large, monolithic bot, Twilio built six single-purpose AI agents accessible via a Next.js frontend [00:01:28]. Each agent is designed to tackle a specific, repetitive, and well-scoped job, allowing human team members to focus on judgment and clarity [00:02:10].

The rule of thumb for selecting tasks for AI helpers is to pick those that are repeatable, high-volume, and low-creativity [00:02:16].

The agents developed include:

Automated Editor: Fixes grammar, formatting, and accuracy in documentation [00:01:37].
Image Alt Text Generator: Provides instant accessibility by generating alt text for images [00:01:43].
Jargon Simplifier: Translates technical developer language into plain English [00:01:48].
SEO Metadata Generator: Creates title and description metadata, ensuring character count compliance [00:01:53].
Docs Outline Builder: Recommends navigation and document structure (coming soon) [00:01:58].
Slack Bot: Assists in triaging help channel requests [00:02:05].
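
The character-count compliance mentioned for the SEO Metadata Generator is the kind of rule that can be checked deterministically after the model responds. The following is a minimal sketch, not Twilio's implementation: the names `checkSeoMetadata`, `TITLE_MAX`, and `DESCRIPTION_MAX` are hypothetical, and the limits of 60 and 160 characters are assumed values, since the talk does not state the ones Twilio uses.

```typescript
// Assumed limits for illustration only; the talk does not give Twilio's values.
const TITLE_MAX = 60;
const DESCRIPTION_MAX = 160;

interface SeoMetadata {
  title: string;
  description: string;
}

interface SeoCheck {
  ok: boolean;
  problems: string[];
}

// Checks generated metadata against the length limits and reports every problem found.
function checkSeoMetadata(meta: SeoMetadata): SeoCheck {
  const problems: string[] = [];
  const title = meta.title.trim();
  const description = meta.description.trim();

  if (title.length === 0) {
    problems.push("title is empty");
  }
  if (title.length > TITLE_MAX) {
    problems.push(`title is ${title.length} characters (max ${TITLE_MAX})`);
  }
  if (description.length === 0) {
    problems.push("description is empty");
  }
  if (description.length > DESCRIPTION_MAX) {
    problems.push(
      `description is ${description.length} characters (max ${DESCRIPTION_MAX})`
    );
  }

  return { ok: problems.length === 0, problems };
}
```

A check like this could run on every generated result, so that output exceeding the limits is regenerated or flagged instead of reaching a reviewer.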

Agent Workflow and Human Oversight

The general workflow for each request involves a Next.js UI feeding into a custom GPT-4o agent [00:02:31]. The appropriate model is chosen for each specific job [00:02:39].

Key aspects of the architecture include:

Custom GPT: Incorporates Twilio’s style guide and rubric, which are retrieved from an Airtable for easy collaboration [00:02:44].
Validation Layer: Includes Vale linting and CI/CD tests [00:02:56].
GitHub PRs: Codeowner reviews are integrated, making it easier to scrutinize agent-suggested changes [00:03:03].
Human Approval: A human merges changes only once they are confirmed correct, often after product and engineering reviews [00:03:12].

This layered approach significantly reduces hallucinations without slowing down the process [00:03:29].
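
The layering described above can be sketched as a gate: automated checks run first, and a change that passes them still waits for human approval before it can merge. This is an illustrative sketch under stated assumptions, not the team's actual pipeline. All names here (`validate`, `canMerge`, `noPlaceholderText`, `noEmptyLinks`) are hypothetical, and the two checks are simple stand-ins for a real prose linter and CI tests.

```typescript
interface Finding {
  check: string;
  message: string;
}

// A check inspects the revised text and returns zero or more findings.
type Check = (text: string) => Finding[];

interface Proposal {
  file: string;
  revisedText: string;
}

type Decision =
  | { status: "rejected"; findings: Finding[] }
  | { status: "awaiting_review"; proposal: Proposal };

// Stand-in check: flags leftover placeholder text.
const noPlaceholderText: Check = (text) =>
  /\b(TODO|TBD|lorem ipsum)\b/i.test(text)
    ? [{ check: "placeholder", message: "placeholder text found" }]
    : [];

// Stand-in check: flags Markdown links with an empty target, such as [text]().
const noEmptyLinks: Check = (text) =>
  /\]\(\s*\)/.test(text)
    ? [{ check: "empty-link", message: "link with empty target found" }]
    : [];

// Layer 1: automated validation. Any finding rejects the proposal.
function validate(proposal: Proposal, checks: Check[]): Decision {
  const findings: Finding[] = [];
  for (const check of checks) {
    findings.push(...check(proposal.revisedText));
  }
  return findings.length > 0
    ? { status: "rejected", findings }
    : { status: "awaiting_review", proposal };
}

// Layer 2: human approval. Every required reviewer must have approved.
function canMerge(
  decision: Decision,
  approvals: string[],
  requiredReviewers: string[]
): boolean {
  return (
    decision.status === "awaiting_review" &&
    requiredReviewers.every((reviewer) => approvals.indexOf(reviewer) !== -1)
  );
}
```

The point of the design is that neither layer alone is sufficient: a proposal rejected by the automated checks can never merge, and a proposal that passes them still cannot merge until each required reviewer has approved.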

Live Agent Walkthrough

Maria Bermudez, a lead developer of the AI Docs Buddy, demonstrated the agents [00:03:44].

Automated Editor: Allows users to load MDX files or plug in a live URL [00:04:04]. It uses the GPT-4o model for consistent application of the style guide and rubric [00:04:19]. The tool shows a diff of changes and provides a detailed explanation of original text, revised text, and the specific style guide/rubric items that triggered the changes [00:04:34].
SEO Metadata Generator: Generates meta titles and meta descriptions that stay within character limits [00:05:33].
Alt Text Generator: Offers the option to plug in a live URL or process multiple pages simultaneously, quickly generating alt text in the required format [00:05:55].
Jargon Simplifier: Helps simplify complex technical text, useful for writing and reviewing pull requests [00:06:28]. It provides a diff and a revised text tab for easy copying and application of edits [00:06:56].
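
The Automated Editor's explanations pair each change with the style guide or rubric item that triggered it. One way to model that output is shown below as a rough sketch. The `EditExplanation` shape, the `groupByRule` helper, and the rule identifiers in the usage example are all hypothetical, since the talk does not describe the tool's data format.

```typescript
// One suggested edit, with the rule(s) that triggered it.
interface EditExplanation {
  original: string;
  revised: string;
  ruleIds: string[]; // hypothetical identifiers for style guide or rubric items
}

// Groups edits by rule so a reviewer can see which rules fire most often.
function groupByRule(
  edits: EditExplanation[]
): Record<string, EditExplanation[]> {
  const grouped: Record<string, EditExplanation[]> = {};
  for (const edit of edits) {
    for (const ruleId of edit.ruleIds) {
      const bucket = grouped[ruleId] ?? [];
      bucket.push(edit);
      grouped[ruleId] = bucket;
    }
  }
  return grouped;
}

// Usage with made-up edits and rule names.
const sampleEdits: EditExplanation[] = [
  {
    original: "The message is sent by the API.",
    revised: "The API sends the message.",
    ruleIds: ["voice.active"],
  },
  {
    original: "Utilize the endpoint.",
    revised: "Use the endpoint.",
    ruleIds: ["word-choice.simple"],
  },
];
const sampleGroups = groupByRule(sampleEdits);
```

Keeping the triggering rule attached to each edit lets a reviewer judge a suggestion against the rule itself, and a rule that produces many rejected edits becomes a candidate for prompt tuning.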

The team is currently working on enabling agents to communicate with each other [00:07:13].

Guard Rails: Mitigating Risks in AI Development

To ensure quality and mitigate risks, Twilio implemented several guard rails [00:07:23]:

Hallucinations: Mitigated using Vale linting and CI tests, combined with reviews by multiple human stakeholders [00:07:29].
Bias: Addressed through dataset tests and prompt audits [00:07:40].
Stakeholder Misalignment: Prevented via weekly (or more frequent) PR reviews and Slack feedback loops with product managers and engineering teams [00:07:46].

These feedback cycles enable continuous prompt tuning, rather than relying on the model to remain perfect [00:08:03].

Playbook for Building AI Agents

The Twilio team suggests a three-step playbook for other teams to adopt when developing and optimizing AI agents [00:08:11]:

Identify one pain point that significantly impacts throughput [00:08:14].
Pick a single task that is repeatable and rule-based [00:08:17].
Loop with users at least weekly: ship, measure, and refine [00:08:22].

By stacking small wins, teams can significantly increase their velocity [00:08:29].
