From: aidotengineer
Barry discusses how to build effective agents and the evolution of agentic systems and workflows [00:00:25]. This presentation builds on a blog post titled “Building Effective Agents” co-written with Eric [00:00:31].
Three Core Ideas for Building Effective Agents
The presentation focuses on three key ideas:
- Don’t build agents for everything [00:00:52].
- Keep it simple [00:00:55].
- Think like your agents [00:00:58].
Evolution of AI Systems
The progression of AI system development can be viewed in phases:
- Simple Features Initially, developers built simple features like summarization, classification, and extraction, which were considered “magic” a few years ago but are now commonplace [00:01:07].
- Workflows As products matured, more sophisticated approaches emerged, involving the orchestration of multiple model calls in predefined control flows [00:01:22]. These workflows trade additional cost and latency for better performance [00:01:32]. This is considered the beginning of agentic systems [00:01:39].
- Agents Current models are more capable, leading to the emergence of domain-specific agents in production [00:01:44]. Unlike workflows, agents can determine their own trajectory and operate almost independently based on environmental feedback [00:01:52].
- Future Phases Future agentic systems may involve single, more general-purpose agents or collaboration and delegation in multi-agent settings [00:02:04]. The broad trend is that as systems are given more agency, they become more useful and capable, but also increase costs, latency, and the consequences of errors [00:02:17].
When to Build an Agent
Agents are best suited for scaling complex and valuable tasks, not every use case [00:02:38]. Workflows remain a great way to deliver value [00:02:49].
Checklist for Building an Agent
Before building an agent, consider the following:
- Complexity of the Task Agents excel in ambiguous problem spaces [00:03:01]. If the decision tree is easily mapped, it’s more cost-effective to build it explicitly and optimize each node [00:03:08].
- Value of the Task The exploratory nature of agents consumes many tokens, so the task must justify the cost [00:03:21]. For high-volume, low-budget tasks (e.g., customer support), a workflow for common scenarios might be more appropriate [00:03:31].
- Critical Capabilities De-risk essential capabilities to avoid bottlenecks that multiply cost and latency [00:04:02]. For a coding agent, this means ensuring it can write, debug, and recover from errors [00:04:09]. If bottlenecks exist, reduce the scope and simplify the task [00:04:24].
- Cost of Error and Error Discovery High-stakes and hard-to-discover errors make it difficult to trust the agent with autonomy [00:04:32]. Mitigations like read-only access or human-in-the-loop involvement limit scalability [00:04:46].
Why Coding is a Great Use Case for Agents
Coding is an excellent use case for agents because [00:05:00]:
- The task of going from a design document to a pull request is highly ambiguous and complex [00:05:03].
- Good code has significant value [00:05:13].
- Models like Claude are proficient in many parts of the coding workflow [00:05:16].
- Code output is easily verifiable through unit tests and Continuous Integration (CI) [00:05:25].
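The verifiability point above can be made concrete: agent-written code can be checked automatically against tests, giving an objective feedback signal. A minimal sketch, where `generated_code` and `slugify` are hypothetical stand-ins for code an agent produced; in practice a CI pipeline would run a full test suite on every change:

```python
# Hypothetical agent output: a string of generated code.
generated_code = """
def slugify(title):
    return title.strip().lower().replace(" ", "-")
"""

# Load the agent-written function into a fresh namespace.
namespace = {}
exec(generated_code, namespace)
slugify = namespace["slugify"]

# Unit tests act as an objective pass/fail signal for the agent:
assert slugify("Hello World") == "hello-world"
assert slugify("  Leading Space") == "leading-space"
print("all checks passed")
```

This kind of automatic verification is what lets a coding agent debug and recover from its own errors without a human in the loop.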
Keeping Agents Simple
Agents fundamentally consist of models using tools in a loop [00:05:51].
Core Components of an Agent
Three components define an agent [00:05:57]:
- Environment The system in which the agent operates [00:06:02].
- Tools An interface for the agent to take action and receive feedback [00:06:06].
- System Prompt Defines the goals, constraints, and ideal behavior for the agent within its environment [00:06:11].
Keeping these components simple is crucial for iteration speed [00:06:27]. Iterating on these three basic components offers the highest ROI, with optimizations coming later [00:06:34]. Many different agent use cases share nearly identical backbones and code [00:06:44]. The primary design decisions are the set of tools provided and the prompt instructing the agent [00:07:04].
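The "models using tools in a loop" idea can be sketched in a few lines. This is a minimal illustration, not Anthropic's implementation: `call_model` is a hypothetical stub standing in for a real LLM API call, and the three components appear as the system prompt, the tool table, and the loop that feeds tool results back:

```python
SYSTEM_PROMPT = "You are a helpful agent. Use tools to answer the user."

def add(a: int, b: int) -> int:
    """Example tool: the agent's interface to its environment."""
    return a + b

TOOLS = {"add": add}

def call_model(messages):
    """Hypothetical stand-in for an LLM API call. It returns a canned
    tool call on the first turn and a final answer after seeing the
    tool result."""
    if not any(m["role"] == "tool" for m in messages):
        return {"tool": "add", "args": {"a": 2, "b": 3}}
    return {"answer": f"The result is {messages[-1]['content']}"}

def run_agent(user_input: str, max_steps: int = 5) -> str:
    messages = [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user_input},
    ]
    for _ in range(max_steps):               # the loop
        action = call_model(messages)
        if "answer" in action:               # model decides it is done
            return action["answer"]
        result = TOOLS[action["tool"]](**action["args"])   # take action
        messages.append({"role": "tool", "content": result})  # feedback
    return "step budget exhausted"

print(run_agent("What is 2 + 3?"))  # -> The result is 5
```

Swapping the stub for a real model call and enlarging the tool table is, structurally, all that distinguishes many production agents from this sketch.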
Optimizations
Once the basic components are established, various optimizations can be applied [00:07:33]:
- Cost Reduction For coding and computer use, caching trajectories can reduce costs [00:07:37].
- Latency Reduction For search, parallelizing tool calls can reduce latency [00:07:41].
- User Trust Presenting the agent’s progress transparently helps gain user trust [00:07:47].
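The latency optimization above, parallelizing independent tool calls, can be sketched with a thread pool. `search` is a hypothetical tool stub that sleeps to simulate network latency:

```python
from concurrent.futures import ThreadPoolExecutor
import time

def search(query: str) -> str:
    """Hypothetical search tool; the sleep simulates a network call."""
    time.sleep(0.1)
    return f"results for {query!r}"

queries = ["agent frameworks", "tool design", "prompt caching"]

start = time.perf_counter()
with ThreadPoolExecutor() as pool:
    # All three calls run concurrently instead of back-to-back.
    results = list(pool.map(search, queries))
elapsed = time.perf_counter() - start

print(results)
print(f"elapsed ~ {elapsed:.2f}s")  # roughly one call's latency, not three
```

Because the calls are I/O-bound, total latency stays near a single call's latency rather than growing with the number of tools invoked.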
Thinking Like Your Agent
It’s common for developers to misunderstand agent behavior because they develop from their own perspective [00:08:06]. To bridge this gap, developers should put themselves in the agent’s context window [00:08:20].
Agent’s Limited Context
Despite sophisticated behavior, an agent at each step is only running inference on a very limited set of context, typically 10-20k tokens [00:08:32]. Restricting your own view to exactly this context makes it much easier to judge whether it is sufficient and coherent for the task [00:08:43].
For example, a computer use agent might only receive a static screenshot and a poorly written description, then attempt an action without truly “seeing” the environment [00:09:04]. During inference or tool execution, the agent is effectively “blind,” only receiving a new screenshot after the action is complete [00:09:30]. This “blind phase” can be very impactful [00:09:48]. Performing a task from this limited perspective reveals crucial missing context, such as screen resolution, recommended actions, and limitations to avoid unnecessary exploration [00:10:08].
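One way to make this limitation tangible is to look at what actually survives into the agent's window at each step. A toy sketch, assuming a crude word-count tokenizer (a real system would use the model's tokenizer and smarter retention policies):

```python
MAX_CONTEXT_TOKENS = 20_000  # roughly the 10-20k window described above

def approx_tokens(text: str) -> int:
    # Crude approximation: one token per whitespace-separated word.
    return len(text.split())

def visible_context(history: list[str], budget: int = MAX_CONTEXT_TOKENS) -> list[str]:
    """Keep only the most recent messages that fit the budget --
    everything older is invisible to the agent at this step."""
    kept, used = [], 0
    for msg in reversed(history):
        cost = approx_tokens(msg)
        if used + cost > budget:
            break
        kept.append(msg)
        used += cost
    return list(reversed(kept))

# Three large messages; only the two most recent fit the window.
history = ["word " * 9_000, "word " * 9_000, "word " * 9_000]
window = visible_context(history)
print(len(window))  # -> 2
```

Anything that falls outside this window, such as screen resolution or earlier instructions, must be re-supplied explicitly or the agent simply does not know it.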
Using Models to Understand Agents
Since these systems speak human language, models can be used to gain insights into an agent’s perspective [00:10:41]:
- Ask the model if instructions in the system prompt are ambiguous [00:10:47].
- Query the model about its understanding and usage of tool descriptions [00:10:52].
- Provide the agent’s entire trajectory and ask it to explain its decisions or suggest ways to improve [00:11:02]. This method helps gain a closer perspective on how the agent perceives the world [00:11:17].
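The three queries above can be scripted directly. In this sketch `ask_model` is a hypothetical placeholder for a real LLM API call; the prompts are the substantive part:

```python
def ask_model(prompt: str) -> str:
    """Hypothetical LLM call; replace with a real API client."""
    return f"(model critique of: {prompt[:40]}...)"

system_prompt = "You are a coding agent. Fix the failing tests."
tool_description = "run_tests: executes the test suite and returns output"
trajectory = "step 1: read file\nstep 2: edited file\nstep 3: ran tests"

# 1. Check the system prompt for ambiguity.
ambiguity_check = ask_model(
    f"Is anything in this system prompt ambiguous or contradictory?\n{system_prompt}"
)
# 2. Check how the model interprets a tool description.
tool_check = ask_model(
    f"How would you interpret and use this tool?\n{tool_description}"
)
# 3. Ask the model to review a full trajectory.
trajectory_review = ask_model(
    f"Here is an agent's full trajectory. Explain its decisions and "
    f"suggest improvements.\n{trajectory}"
)
print(ambiguity_check)
```

Because the critic sees exactly the same text the agent does, its answers approximate the agent's own reading of the prompt and tools.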
Future Considerations for Agentic Systems
Several open questions and evolving trends are at the forefront of AI agent development:
- Budget-Aware Agents Agents offer far less control over cost and latency than workflows do [00:11:47]. Defining and enforcing budgets (time, money, tokens) is necessary for production deployment [00:11:56].
- Self-Evolving Tools Models are already used to iterate on tool descriptions, but this could generalize to a meta-tool where agents design and improve their own tool ergonomics [00:12:14]. This would make agents more general-purpose, as they could adapt tools as needed [00:12:30].
- Multi-Agent Collaborations It’s anticipated that multi-agent collaborations will be seen in production soon [00:12:40]. These setups offer benefits like parallelization, clear separation of concerns, and protection of main agent context windows through sub-agents [00:12:46]. A key challenge is how these agents will communicate, moving beyond rigid synchronous user-assistant interactions to asynchronous communication and enabling more diverse roles [00:12:59].
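The budget-aware idea above can be sketched as a wrapper around the agent loop. This is one possible shape, not an established API: `step` is a hypothetical callable standing in for one model-plus-tool iteration that returns `(result, tokens_used)`:

```python
import time

def run_with_budget(step, max_tokens: int = 50_000, max_seconds: float = 30.0):
    """Run agent steps until done or a budget is exhausted."""
    spent_tokens = 0
    deadline = time.monotonic() + max_seconds
    while True:
        if time.monotonic() > deadline:
            return "stopped: time budget exhausted"
        result, tokens_used = step()
        spent_tokens += tokens_used
        if spent_tokens > max_tokens:
            return "stopped: token budget exhausted"
        if result is not None:        # the agent produced a final answer
            return result

# Usage with a stub whose third step would finish, but whose second
# step already blows the token budget:
calls = iter([(None, 30_000), (None, 30_000), ("done", 10_000)])
print(run_with_budget(lambda: next(calls)))  # -> stopped: token budget exhausted
```

In production the same pattern extends naturally to dollar budgets by pricing `tokens_used` per model.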
Key Takeaways
To summarize the approach to developing AI agents and agentic workflows:
- Don’t build agents for everything [00:13:41].
- If building an agent, keep it as simple as possible for as long as possible [00:13:46].
- As you iterate, think like your agent to understand its perspective and help it perform its job [00:13:51].