From: aidotengineer

Elmer Thomas, Principal Developer Educator at Twilio, presents a case study on how their small documentation team leverages AI to enhance workflows rather than replace human workers [00:00:00], [00:00:19]. The goal is to embrace AI in order to “work smarter” [00:00:15].

Initial Pain Points in Documentation [00:01:05]

The tiny documentation team faced a significant influx of Jira tickets, leading to three primary challenges:

  1. Error-prone first drafts from over 100 product teams [00:01:12].
  2. Time-consuming grooming tasks, such as style checks, alt text generation, and SEO optimization [00:01:17].
  3. Hallucination risk if AI was left unchecked [00:01:21].

The team sought “leverage, not burnout” to address these issues [00:01:26].

AI Agent Architecture and Implementation Strategy [00:01:28]

Instead of a single “megabot,” Twilio’s docs team built six single-purpose AI agents operating behind a simple Next.js frontend [00:01:28], [00:01:31], [00:02:31]. This approach is a practical example of effective AI implementation and AI-driven workflow automation.
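The talk does not show implementation details, but the “one narrow agent per task behind a Next.js frontend” idea could look roughly like the following route handler. The route path, agent names, and the `AGENTS`/`runAgent` helpers are hypothetical stand-ins, not Twilio’s actual code.

```typescript
// app/api/agents/[agent]/route.ts — hypothetical dispatch to one single-purpose agent.
import { NextRequest, NextResponse } from "next/server";
import { AGENTS, runAgent } from "@/lib/agents"; // illustrative registry (sketched after the agent list below)

export async function POST(
  req: NextRequest,
  { params }: { params: { agent: string } }
) {
  const agentName = params.agent; // e.g. "editor", "alt-text", "seo"
  if (!(agentName in AGENTS)) {
    return NextResponse.json({ error: `Unknown agent: ${agentName}` }, { status: 404 });
  }
  const { content } = await req.json(); // MDX/Markdown source or a live docs URL
  return NextResponse.json({ output: await runAgent(agentName, content) });
}
```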

Specific AI Agents Built [00:01:31]

Each agent is designed to tackle a repetitive, well-scoped job, allowing humans to focus on judgment and clarity [00:02:10].

  • Automated Editor: Fixes grammar, formatting, and accuracy [00:01:37].
  • Image Alt Text Generator: Provides instant accessibility wins [00:01:41].
  • Jargon Simplifier: Translates developer terminology into plain English [00:01:48].
  • SEO Metadata Generator: Creates title and description metadata while adhering to character limits [00:01:53].
  • Docs Outline Builder (Coming Soon): Recommends navigation and structure [00:01:58].
  • Slack Backbot: Helps triage help channel requests [00:02:05].
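The route handler above imports a hypothetical agent registry; a minimal sketch of it, assuming one narrow system prompt per agent and the OpenAI Node SDK, might look like this. The prompts paraphrase the job descriptions in the list and are not Twilio’s actual prompts.

```typescript
// lib/agents.ts — hypothetical registry: one narrow system prompt per single-purpose agent.
import OpenAI from "openai";

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

export const AGENTS: Record<string, string> = {
  "editor": "Fix grammar, formatting, and accuracy issues in the supplied docs page.",
  "alt-text": "Write concise, descriptive alt text for each image on the supplied docs page.",
  "jargon": "Translate developer terminology into plain English without changing technical meaning.",
  "seo": "Write an SEO meta title and description for the supplied docs page, within character limits.",
  "outline": "Recommend navigation and structure for the supplied set of docs.",
  "slack-backbot": "Triage this help-channel request and suggest the right doc or owner.",
};

export async function runAgent(agent: string, content: string): Promise<string> {
  const completion = await openai.chat.completions.create({
    model: "gpt-4o",
    messages: [
      { role: "system", content: AGENTS[agent] },
      { role: "user", content },
    ],
  });
  return completion.choices[0].message.content ?? "";
}
```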

Sweet Spot for AI Tasks [00:02:22]

The rule of thumb for selecting tasks for an AI helper is to pick those that are repetitive, rule-based, and well-scoped, leaving judgment and clarity to the humans.

Technical Flow and Guardrails [00:02:28]

The process for each request incorporates several layers of oversight:

  1. Next.js UI: Front-end interface [00:02:31].
  2. Custom GPT-4o Agent: Utilizes the appropriate model for the job [00:02:37]. The custom GPT includes Twilio’s style guide and rubric, retrieved from Airtable for easy collaboration [00:02:44].
  3. Validation Layer: Includes Vale linting and CI/CD tests [00:02:56].
  4. GitHub Pull Request (PR): Adds codeowner review, making it easier to scrutinize agent-suggested changes [00:03:03].
  5. Human Merge: A human merges the changes only when they are correct, typically after product and engineering reviews [00:03:12]. This layered approach significantly reduces hallucinations [00:03:27].
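Putting those layers together, a rough end-to-end sketch of one request might look like this. The OpenAI Node SDK, the Vale CLI, and Octokit are real tools, but the glue code, repository details, and branch name are assumptions layered on top of the flow described above.

```typescript
// Hypothetical glue for one request: model pass, Vale lint, then a PR that a human still reviews and merges.
import { execFile } from "node:child_process";
import { promisify } from "node:util";
import { writeFile } from "node:fs/promises";
import { Octokit } from "@octokit/rest";
import OpenAI from "openai";

const run = promisify(execFile);
const openai = new OpenAI();
const octokit = new Octokit({ auth: process.env.GITHUB_TOKEN });

// styleGuide: the style guide and rubric text, e.g. pulled from Airtable.
export async function processDoc(path: string, source: string, styleGuide: string) {
  // 1. Model pass: GPT-4o with the style guide and rubric in the system prompt.
  const completion = await openai.chat.completions.create({
    model: "gpt-4o",
    messages: [
      { role: "system", content: `Edit this docs page to follow the style guide:\n${styleGuide}` },
      { role: "user", content: source },
    ],
  });
  const revised = completion.choices[0].message.content ?? source;

  // 2. Validation layer: Vale exits non-zero on style errors, which rejects here
  //    and stops the flow before any PR is opened. CI tests add a second gate.
  await writeFile(path, revised);
  await run("vale", [path]);

  // 3. GitHub PR for codeowner review (committing `revised` to the branch is elided).
  await octokit.rest.pulls.create({
    owner: "twilio",          // placeholder owner/repo
    repo: "docs",
    title: `docs agent: suggested edits for ${path}`,
    head: "docs-agent/edits", // placeholder branch
    base: "main",
  });
  // 4. A human merges only after product and engineering review.
}
```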

Live Demo Overview [00:03:40]

Maria Bermudez, Lead Developer of the AI Docs Buddy, demonstrated the agents [00:03:44]. The system can autogenerate overview and release pages using Copilot [00:03:55].

Automated Editor [00:04:02]

This agent allows users to load an MDX or Markdown file, or plug in a live URL [00:04:04]. It uses the GPT-4o model, chosen for its consistency in applying Twilio’s style guide and rubric [00:04:19]. The output shows a diff of changes and a detailed list of explanations, including original text, revised text, and the specific style guide/rubric items addressed [00:04:34]. It’s acknowledged that it’s “not perfect,” sometimes missing things like SEO descriptions [00:04:51].
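A sketch of how the editor’s output could be structured to support that diff-plus-explanations view follows; the JSON field names and the use of JSON-mode output are assumptions, not the demo’s actual schema.

```typescript
// Hypothetical structured output for the Automated Editor.
import OpenAI from "openai";

const openai = new OpenAI();

interface EditorChange {
  original: string;    // text before the edit
  revised: string;     // suggested replacement
  ruleApplied: string; // style guide / rubric item the change addresses
}

interface EditorResult {
  revisedDocument: string;
  changes: EditorChange[];
}

export async function editDocument(source: string, styleGuide: string): Promise<EditorResult> {
  const completion = await openai.chat.completions.create({
    model: "gpt-4o", // chosen for consistency in applying the style guide and rubric
    response_format: { type: "json_object" }, // ask for machine-readable output for the diff view
    messages: [
      {
        role: "system",
        content:
          `Edit the page to follow this style guide and rubric:\n${styleGuide}\n` +
          `Respond with JSON: {"revisedDocument": string, "changes": [{"original", "revised", "ruleApplied"}]}.`,
      },
      { role: "user", content: source },
    ],
  });
  return JSON.parse(completion.choices[0].message.content ?? "{}") as EditorResult;
}
```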

SEO Metadata Generator [00:05:03]

This agent generates a meta title (which can be toggled off if not desired) and a meta description, accounting for character limitations [00:05:41].
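A minimal sketch of that generation plus an explicit length check is below; the 60- and 160-character limits are common SEO conventions used here for illustration, not figures from the talk.

```typescript
// Hypothetical SEO metadata generation with a post-check on character limits.
import OpenAI from "openai";

const openai = new OpenAI();
const TITLE_LIMIT = 60;        // assumed limits, not from the talk
const DESCRIPTION_LIMIT = 160;

export async function generateSeoMetadata(page: string, includeTitle = true) {
  const completion = await openai.chat.completions.create({
    model: "gpt-4o",
    response_format: { type: "json_object" },
    messages: [
      {
        role: "system",
        content:
          `Write SEO metadata for this docs page as JSON {"title", "description"}. ` +
          `Keep the title under ${TITLE_LIMIT} characters and the description under ${DESCRIPTION_LIMIT}.`,
      },
      { role: "user", content: page },
    ],
  });
  const meta = JSON.parse(completion.choices[0].message.content ?? "{}");
  // Enforce the limits in code rather than trusting the model to respect them.
  if ((meta.title ?? "").length > TITLE_LIMIT || (meta.description ?? "").length > DESCRIPTION_LIMIT) {
    throw new Error("Metadata exceeds character limits; regenerate or trim");
  }
  // The meta title can be toggled off when a page does not need one.
  return includeTitle ? meta : { description: meta.description };
}
```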

Image Alt Text Generator [00:05:53]

Similar to the editor, this agent can take a live URL or let users select multiple pages and generate alt text for all of their images. It produces alt text quickly, in the format required by Twilio’s docs platform [00:06:15].
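A rough sketch of the live-URL path, assuming GPT-4o image input via the chat completions API; the regex-based image extraction and the plain-string output are simplifications of whatever format Twilio’s docs platform actually requires.

```typescript
// Hypothetical alt-text generation for every image found on a live docs URL.
import OpenAI from "openai";

const openai = new OpenAI();

export async function generateAltText(pageUrl: string): Promise<Map<string, string>> {
  const html = await (await fetch(pageUrl)).text();
  // Naive extraction of image sources; a real implementation would parse the DOM.
  const imageUrls = [...html.matchAll(/<img[^>]+src="([^"]+)"/g)].map((m) => m[1]);

  const altText = new Map<string, string>();
  for (const url of imageUrls) {
    const completion = await openai.chat.completions.create({
      model: "gpt-4o",
      messages: [
        {
          role: "user",
          content: [
            { type: "text", text: "Write concise, descriptive alt text for this documentation image." },
            { type: "image_url", image_url: { url } },
          ],
        },
      ],
    });
    altText.set(url, completion.choices[0].message.content ?? "");
  }
  return altText;
}
```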

Jargon Simplifier [00:06:28]

Useful for both writing and reviewing pull requests, this agent simplifies complex text [00:06:33]. It provides a diff-like editor and a revised-text tab so the output can be quickly copied into a pull request comment or applied as a direct file edit [00:06:56].
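A small sketch of those two views, assuming the jsdiff (`diff`) npm package for the diff-style output; the prompt wording is illustrative.

```typescript
// Hypothetical jargon simplifier returning both a diff view and the revised text.
import OpenAI from "openai";
import { diffLines } from "diff";

const openai = new OpenAI();

export async function simplifyJargon(text: string) {
  const completion = await openai.chat.completions.create({
    model: "gpt-4o",
    messages: [
      { role: "system", content: "Rewrite developer jargon in plain English without changing the technical meaning." },
      { role: "user", content: text },
    ],
  });
  const revised = completion.choices[0].message.content ?? text;
  return {
    revised,                        // ready to paste as a PR comment or apply as a file edit
    diff: diffLines(text, revised), // [{ value, added?, removed? }, ...] for the diff-like view
  };
}
```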

The team is currently working on enabling agents to communicate with each other [00:07:13].

Guardrails and Risk Mitigation [00:07:23]

Effective AI implementation requires robust guardrails to ensure quality, addressing risks such as:

  • Hallucinations: Mitigated with tools like Vale lint and CI tests, combined with review by multiple human stakeholders [00:07:29]; a minimal CI-check sketch follows this list.
  • Bias: Addressed through dataset tests and prompt audits [00:07:40].
  • Stakeholder Misalignment: Managed through weekly PR reviews (sometimes compressed to days or hours) and Slack feedback loops, especially with product managers and engineering teams [00:07:46]. These feedback cycles allow for continuous prompt tuning [00:08:03].
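As one concrete form those dataset tests and CI checks could take, here is a minimal eval sketch using Vitest; the golden cases, assertions, and the `editDocument` import (from the editor sketch above) are illustrative, not Twilio’s actual test suite.

```typescript
// Hypothetical regression check: run the editor agent against a small golden
// dataset in CI and fail the build if a known-good expectation breaks.
import { describe, it, expect } from "vitest";
import { editDocument } from "./editor"; // hypothetical module from the editor sketch

const goldenCases = [
  {
    input: "Utilize the API endpoint to facilitate sending of an SMS.",
    mustInclude: "use",       // style guide assumption: prefer plain verbs
    mustNotInclude: "utilize",
  },
];

describe("automated editor evals", () => {
  for (const c of goldenCases) {
    it(`rewrites: ${c.input.slice(0, 40)}...`, async () => {
      const result = await editDocument(c.input, "Prefer plain verbs; avoid 'utilize'.");
      const revised = result.revisedDocument.toLowerCase();
      expect(revised).toContain(c.mustInclude);
      expect(revised).not.toContain(c.mustNotInclude);
    });
  }
});
```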

Playbook for AI Implementation [00:08:11]

Twilio shares a three-step playbook for teams looking to implement AI:

  1. Identify one pain that significantly impacts throughput [00:08:14].
  2. Pick a single task that is repeatable and rule-based [00:08:17].
  3. Loop with your users weekly, at minimum [00:08:22].

The process should be “Ship, measure, and refine” [00:08:27]. Stacking these small wins can significantly boost a team’s velocity [00:08:30]. This serves as a practical example of AI implementation and outlines steps to create effective evaluations for AI applications.