From: aidotengineer
Barry, speaking at the AI Engineer Summit, shares key insights for building effective AI agents, drawing on the blog post “Building Effective Agents,” which he co-wrote with Eric [00:00:28]. The post offers opinionated takes on what an agent is and isn’t, along with practical lessons [00:00:35]. The talk centers on three core ideas: don’t build agents for everything, keep it simple, and think like your agents [00:00:52].
The Evolution to Agentic Systems
The journey of AI development has progressed from simple features like summarization, classification, and extraction, which were once considered magic but are now commonplace [00:01:07]. As products matured, developers moved to orchestrating multiple model calls in predefined control flows, known as workflows, trading cost and latency for better performance [00:01:19]. This marked the beginning of agentic systems [00:01:37].
Today, models are more capable, leading to the rise of domain-specific agents in production [00:01:42]. Unlike workflows, agents can determine their own trajectory and operate almost independently based on environment feedback [00:01:52]. The broad trend is that as these systems gain more agency, they become more useful and capable, but also incur higher costs, latency, and greater consequences for errors [00:02:17].
Core Ideas for Building Agents
1. Don’t Build Agents for Everything
Agents are best suited for scaling complex and valuable tasks, not as a universal upgrade for every use case [00:02:31]. Workflows are often a more concrete and effective way to deliver value today [00:02:47].
Agent Building Checklist:
- Complexity of Your Task [00:02:59]: Agents thrive in ambiguous problem spaces [00:03:04]. If the decision tree for a task can be easily mapped out, it’s more cost-effective and provides more control to build that explicitly rather than using an agent [00:03:08].
- Value of Your Task [00:03:21]: The exploration involved in agentic behavior consumes many tokens, so the task must justify the cost [00:03:23]. For a high-volume customer support system budgeted at around 10 cents per task (roughly 30,000-50,000 tokens), workflows for the common scenarios are more practical [00:03:31]; a back-of-the-envelope version of this arithmetic appears after the checklist.
- Derisk Critical Capabilities [00:04:00]: Ensure there are no significant bottlenecks in the agent’s trajectory [00:04:04]. For a coding agent, this means ensuring it can write good code, debug, and recover from errors [00:04:09]. Bottlenecks can multiply costs and latency, suggesting a need to reduce scope or simplify the task [00:04:18].
- Cost of Error and Error Discovery [00:04:32]: If errors are high-stakes and difficult to discover, it becomes challenging to trust the agent with autonomous actions [00:04:34]. Mitigation strategies include limiting scope (e.g., read-only access) or incorporating more human-in-the-loop, though these can limit scalability [00:04:44].
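To make the cost criterion concrete, here is a minimal sketch of the token-budget arithmetic behind the customer-support example above. The per-token price is an illustrative assumption, not a quoted rate:

```python
# Back-of-the-envelope check: how many tokens does a per-task budget buy?
# The blended price below is an illustrative assumption, not a quoted rate.
PRICE_PER_MTOK_USD = 3.00  # assumed $/million tokens

def tokens_affordable(budget_usd: float, price_per_mtok: float = PRICE_PER_MTOK_USD) -> int:
    """Tokens a per-task budget buys at the assumed price."""
    return int(budget_usd / price_per_mtok * 1_000_000)

print(tokens_affordable(0.10))  # ~33,000 tokens on a 10-cent budget
```

At $2-3 per million tokens, a 10-cent budget lands in the 30,000-50,000 token range the talk cites, which exploratory agent behavior exhausts quickly.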
Example: Coding as a Great Agent Use Case
Coding is an ideal agent use case because:
- Going from a design document to a pull request is a complex and ambiguous task [00:05:03].
- Good code has significant value [00:05:11].
- Models like Claude are already proficient at many parts of the coding workflow [00:05:16].
- Coding output is easily verifiable through unit tests and CI [00:05:23].
2. Keep it Simple
Once a good use case for agents is found, the second core idea is to keep the design as simple as possible [00:05:41]. Agents are fundamentally models using tools in a loop [00:05:51]; a minimal sketch of that loop follows the component list below.
Agent Components:
- Environment: The system in which the agent operates [00:06:02].
- Tools: Provide an interface for the agent to take action and receive feedback [00:06:06].
- System Prompt: Defines the agent’s goals, constraints, and ideal behavior within the environment [00:06:11].
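As promised above, here is a minimal sketch of “a model using tools in a loop.” The `call_model` and `run_tool` helpers are hypothetical stubs standing in for a real model API and real tool implementations; the loop structure is the point:

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical stand-ins: swap in your real model API and tool implementations.
@dataclass
class Reply:
    text: str
    tool_call: Optional[dict] = None  # e.g. {"name": "run_tests", "args": {}}

def call_model(system: str, messages: list[dict], tools: dict[str, Callable]) -> Reply:
    # Stub: a real implementation would send the prompt and tool schemas
    # to the model and parse its response.
    return Reply(text="done")

def run_tool(tools: dict[str, Callable], tool_call: dict) -> str:
    # Execute the requested tool against the environment.
    return str(tools[tool_call["name"]](**tool_call.get("args", {})))

SYSTEM_PROMPT = "You are a coding agent..."  # goals, constraints, ideal behavior

def run_agent(task: str, tools: dict[str, Callable], max_steps: int = 20) -> str:
    messages = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        reply = call_model(SYSTEM_PROMPT, messages, tools)
        if reply.tool_call is None:               # model decided it is finished
            return reply.text
        result = run_tool(tools, reply.tool_call)  # act, then observe
        messages.append({"role": "assistant", "content": reply.text})
        messages.append({"role": "user", "content": f"tool result: {result}"})
    return "stopped: step limit reached"

print(run_agent("fix the failing test", tools={}))
```

Everything an agent does reduces to this loop; the environment, tools, and system prompt are the levers worth iterating on.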
Complexity introduced upfront significantly hinders iteration speed [00:06:27]. Iterating on these three basic components offers the highest ROI [00:06:31]. Many agent applications, such as coding agents, computer-use agents, and search agents, share nearly identical backbones and codebases, differing mainly in environment, tools, and prompts [00:06:41].
Optimizations, such as caching trajectories to reduce cost in coding, parallelizing tool calls in search to reduce latency, or presenting agent progress to build user trust, should come after the basic behaviors are established [00:07:33].
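One of those optimizations, parallelizing tool calls, is easy to picture. A minimal sketch using asyncio, where `fetch_page` is a hypothetical async search tool:

```python
import asyncio

# Sketch: run independent search-tool calls concurrently to cut latency.
# `fetch_page` is a hypothetical async tool; any awaitable tool call works.
async def fetch_page(url: str) -> str:
    await asyncio.sleep(0.1)  # stand-in for real network I/O
    return f"contents of {url}"

async def parallel_tool_calls(urls: list[str]) -> list[str]:
    # gather() overlaps the waits instead of summing them.
    return await asyncio.gather(*(fetch_page(u) for u in urls))

print(asyncio.run(parallel_tool_calls(["https://a.example", "https://b.example"])))
```

Because the calls are independent, total latency is roughly the slowest call rather than the sum of all calls.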
3. Think Like Your Agents
A common pitfall for builders is developing agents from their own human perspectives, leading to confusion when agents make mistakes [00:08:06]. It is recommended to put oneself in the agent’s context window [00:08:20].
Even though agents can exhibit sophisticated behavior, at each step the model is performing inference on a very limited context, typically 10k to 20k tokens, which represents everything it knows about the current state of the world [00:08:27]. Limiting one’s own understanding to that same context helps determine whether it is sufficient and coherent [00:08:43].
For example, a computer-use agent receives only a static screenshot and a poorly written description, then attempts actions without live visual feedback; it is like using a computer with your eyes closed for three to five seconds at a time [00:09:02]. Experiencing a task from this limited perspective reveals crucial missing information, such as the screen resolution needed for accurate clicking, or recommended actions that would avoid unnecessary exploration [00:10:04].
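One practical way to adopt this perspective is to replay exactly what the model saw at each step and read it end to end. A minimal sketch, assuming the trajectory is stored as lists of role/content messages; `count_tokens` is a crude stand-in for a real tokenizer:

```python
def count_tokens(text: str) -> int:
    # Crude stand-in: real code would use the model's tokenizer.
    return len(text) // 4

def replay_context(trajectory: list[list[dict]]) -> None:
    """Print the agent's full context at each step, as the model saw it."""
    for step, messages in enumerate(trajectory):
        total = sum(count_tokens(m["content"]) for m in messages)
        print(f"--- step {step}: ~{total} tokens in context ---")
        for m in messages:
            print(f"[{m['role']}] {m['content'][:200]}")
        # Ask: is this context sufficient and coherent to pick the next
        # action? If you cannot decide from it, neither can the model.
```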
Models like Claude can also assist in evaluating and improving agents [00:10:41]; a sketch of this pattern follows the list:
- Asking if system prompt instructions are ambiguous or make sense to the model [00:10:47].
- Checking if the agent understands how to use a tool based on its description, or if it needs more/fewer parameters [00:10:52].
- Analyzing the agent’s entire trajectory with the model to understand decision-making and identify areas for improvement [00:11:02].
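A minimal sketch of the second item, asking the model whether a tool description is usable, via the Anthropic Python SDK (the model name is an assumption; substitute a current one):

```python
import anthropic

# Sketch: ask the model itself whether a tool description is usable.
client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

tool_description = "search(query): returns results"  # description under review

response = client.messages.create(
    model="claude-3-5-sonnet-latest",  # assumed alias; substitute a current model
    max_tokens=500,
    messages=[{
        "role": "user",
        "content": (
            "You are reviewing a tool description for an agent. Is it clear "
            "how and when to use this tool? Which parameters are missing or "
            f"unnecessary?\n\n{tool_description}"
        ),
    }],
)
print(response.content[0].text)
```

The same pattern extends to the first and third items: paste in the system prompt or the full trajectory and ask the model where it would get confused.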
Future Considerations in Agent Engineering
Looking ahead, several open challenges remain in agent engineering:
- Budget-Aware Agents: Making agents more aware of cost and latency is crucial for production deployment [00:11:47]. How to define and enforce budgets (time, money, tokens) remains an open question [00:12:02]; a naive guard is sketched after this list.
- Self-Evolving Tools: Agents could design and improve their own tool ergonomics, becoming more general-purpose by adopting tools needed for specific use cases [00:12:13].
- Multi-Agent Collaboration: There is a strong conviction that multi-agent collaborations will become prevalent in production [00:12:38]. They offer benefits like parallelization, separation of concerns, and protection of the main agent’s context window [00:12:46]. A key open question is how these agents will communicate beyond rigid synchronous user-assistant interactions, enabling asynchronous communication and diverse roles [00:12:59].
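There is no settled answer on budget enforcement, but even a naive per-episode guard illustrates the shape of the problem. A sketch with illustrative thresholds; `step` is a hypothetical callable that runs one model call plus tool execution and reports (done, tokens_spent):

```python
import time

class BudgetExceeded(Exception):
    pass

def run_with_budget(step, max_seconds: float = 120.0, max_tokens: int = 50_000):
    """Run an agent step-by-step, aborting once the time or token budget is spent."""
    start, tokens_used = time.monotonic(), 0
    while True:
        done, tokens_spent = step()   # one model call + tool execution
        tokens_used += tokens_spent
        if done:
            return
        if time.monotonic() - start > max_seconds:
            raise BudgetExceeded("time budget exhausted")
        if tokens_used > max_tokens:
            raise BudgetExceeded("token budget exhausted")

# Usage sketch: three steps of 20k tokens each trip the 50k token cap.
steps = iter([(False, 20_000), (False, 20_000), (False, 20_000)])
try:
    run_with_budget(lambda: next(steps))
except BudgetExceeded as e:
    print(e)  # token budget exhausted
```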
Summary
To summarize the criteria for building effective agents:
- Don’t build agents for everything: Choose use cases wisely based on complexity, value, critical capabilities, and error costs [00:13:41].
- Keep it as simple as possible: Focus on iterating with the core components – environment, tools, and system prompt – before optimizing [00:13:46].
- Think like your agent: Gain their perspective by understanding their limited context window to improve their decision-making and job performance [00:13:51].