From: redpointai
The future of AI agents points towards their deep embedding into daily products and business operations, moving beyond standalone applications like ChatGPT [01:36:00]. This shift is driven by the release of underlying models and APIs that allow for wider dispersion of agentic capabilities across the web [01:13:00].
Vision for Agent Interaction
In the next 5-10 years, consumers are expected to interact with agents not just in dedicated surfaces like ChatGPT, but also within existing browsers or through automated tasks at work [01:04:00]. These agents will automate tasks like clicking, filling out forms, and conducting research, becoming seamlessly embedded into everyday tools [01:31:00]. The API platform aims to disperse this technology to make it ubiquitous [01:44:00].
Building for an Agentic Future
Enterprises should actively build towards this agentic future by creating internal AI agents to solve real business problems [06:07:00].
Multi-Agent Architectures
Developers are already creating multi-agent swarms to address complex business problems, particularly in areas like customer support automation [05:15:00]. For example, in customer support, separate agents can specialize in refunds, billing, or shipping, while another handles escalation to a human [05:24:00]. The Agents SDK was released to facilitate the building of these multi-agent architectures [05:40:00].
Splitting tasks among multiple agents simplifies debugging and increases efficacy, as each agent can focus on a specific task with dedicated context [18:30:00]. This prevents the need to prompt engineer a single agent for a multitude of functions, reducing the “blast radius” of changes during development [18:49:00].
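The routing idea described above can be sketched in a few lines of Python. This is a minimal illustration only: it does not use the Agents SDK, and every name in it (`Agent`, `make_agent`, `SPECIALISTS`, `triage`) is hypothetical. The keyword match stands in for what would normally be a model-driven triage step.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Agent:
    name: str
    instructions: str  # narrow, task-specific prompt for this agent only
    handle: Callable[[str], str]


def make_agent(name: str, instructions: str) -> Agent:
    def handle(message: str) -> str:
        # Stand-in for a model call that would use `instructions` as context.
        return f"[{name}] handling: {message}"

    return Agent(name, instructions, handle)


# Hypothetical specialists, keyed by the keyword that routes to them.
SPECIALISTS: Dict[str, Agent] = {
    "refund": make_agent("refunds", "Resolve refund requests."),
    "billing": make_agent("billing", "Answer billing questions."),
    "shipping": make_agent("shipping", "Track and fix shipments."),
}
HUMAN = make_agent("human_escalation", "Hand the conversation to a person.")


def triage(message: str) -> Agent:
    """Pick a specialist; fall back to a human when nothing matches."""
    text = message.lower()
    for keyword, agent in SPECIALISTS.items():
        if keyword in text:
            return agent
    return HUMAN


if __name__ == "__main__":
    agent = triage("I want a refund for my order")
    print(agent.handle("I want a refund for my order"))
```

Because each specialist carries its own instructions, a prompt change to the refunds agent cannot alter how billing or shipping behave, which is the smaller "blast radius" the text refers to.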
Internal vs. External Exposure
While internal use cases are the immediate opportunity, exposing these agents to the public internet so that they can communicate externally is a sensible future development [05:51:00].
Challenges in Enterprise AI Deployment
- Grading and Task Generation: Productizing effective grading and task generation remains a significant challenge, which keeps these tools difficult for most enterprises to use [12:48:00].
- Orchestration Complexity: Making models work effectively requires significant effort in orchestration, meticulous trace observation, prompt engineering, and maintaining eval sets to prevent degradation [20:44:00]. This process is currently very difficult [20:57:00].
- Tool Constraints: Models are currently constrained in the number of tools they can manage effectively (typically fewer than 10-15) [08:07:00]. The next “unlock” is enabling agents to intelligently use hundreds of tools [08:10:00].
- Runtime Limitations: While agent runtimes are extending to minutes (e.g., Deep Research), extending them to hours or days will yield more powerful results [09:02:00].
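As one illustration of the eval-set maintenance mentioned under Orchestration Complexity, the sketch below compares a candidate agent against a baseline on a fixed set of cases and flags a drop in pass rate. It is a simplified assumption of how such a check might look, not a method described in the source: the agents are stand-in functions, the pass criterion is a plain substring match, and all names (`EVAL_SET`, `pass_rate`, `has_regressed`) are hypothetical.

```python
from typing import Callable, List, Tuple

# Each case pairs an input with a substring the answer must contain.
EvalCase = Tuple[str, str]

EVAL_SET: List[EvalCase] = [
    ("refund status for order 1", "refund"),
    ("update billing address", "billing"),
    ("track shipping for order 2", "shipping"),
]


def pass_rate(agent: Callable[[str], str], cases: List[EvalCase]) -> float:
    """Fraction of cases whose expected substring appears in the output."""
    passed = sum(1 for prompt, expected in cases if expected in agent(prompt))
    return passed / len(cases)


def has_regressed(
    baseline: Callable[[str], str],
    candidate: Callable[[str], str],
    cases: List[EvalCase],
    tolerance: float = 0.0,
) -> bool:
    """True when the candidate scores below the baseline by more than tolerance."""
    return pass_rate(candidate, cases) < pass_rate(baseline, cases) - tolerance


def baseline_agent(prompt: str) -> str:
    # Stand-in for the current production prompt/model.
    return f"Routing to: {prompt}"


def candidate_agent(prompt: str) -> str:
    # Stand-in for a prompt change that accidentally sends everything to billing.
    return "Routing to: billing"


if __name__ == "__main__":
    print("baseline:", pass_rate(baseline_agent, EVAL_SET))
    print("candidate:", pass_rate(candidate_agent, EVAL_SET))
    print("regressed:", has_regressed(baseline_agent, candidate_agent, EVAL_SET))
```

Running a check like this before each prompt or model change is one way to catch degradation early; real graders would usually be richer than a substring match.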
Recommendations for Enterprise Leadership
For CEOs of enterprises contemplating the agentic future, the recommendation is to:
- Start Exploring: Begin exploring frontier models and computer use models [36:42:00].
- Automate Internal Workflows: Identify and automate internal manual workflows using multi-agent architectures [36:49:00].
- Identify Manual Workflows for Tool Interfaces: Determine which manual workflows require a tool interface to become programmatic [37:05:00].
- Prioritize Employee Pain Points: Ask employees about their least favorite daily tasks and seek ways to automate them to increase productivity and satisfaction [38:15:00].
Advancements in AI Agents and Computer Use Models
Evolution of Agent Behavior
The approach to agent interaction has evolved from single-turn web searches to multi-turn reasoning processes [03:30:00]. Modern agents can get information, reconsider their stance, retrieve more data, and even open multiple web pages in parallel, utilizing a “chain of thought tool calling” mechanism [03:44:00]. This means the model figures out how to call multiple tools, and even backtrack or change its path if needed [07:32:00].
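The multi-turn behavior described above can be pictured as a loop in which each tool result feeds the next decision. The sketch below is a toy illustration under stated assumptions: `choose_next_tool` is a hypothetical stand-in for the model's reasoning, the two tools are fake lookups, and "backtracking" here simply means moving on to another tool when one returns nothing.

```python
from typing import Callable, Dict, List, Optional, Tuple

Tool = Callable[[str], Optional[str]]


def search_docs(query: str) -> Optional[str]:
    # Pretend the internal docs have no match for any query.
    return None


def search_web(query: str) -> Optional[str]:
    # Pretend web index containing a single page.
    pages = {"agents sdk": "The Agents SDK helps build multi-agent systems."}
    return pages.get(query.lower())


TOOLS: Dict[str, Tool] = {"search_docs": search_docs, "search_web": search_web}


def choose_next_tool(tried: List[str]) -> Optional[str]:
    """Stand-in for the model's reasoning: pick the first tool not yet tried."""
    for name in TOOLS:
        if name not in tried:
            return name
    return None


def run_agent(
    query: str, max_steps: int = 5
) -> Tuple[Optional[str], List[Tuple[str, Optional[str]]]]:
    """Call tools turn by turn; change path whenever a tool comes back empty."""
    trace: List[Tuple[str, Optional[str]]] = []
    tried: List[str] = []
    for _ in range(max_steps):
        tool_name = choose_next_tool(tried)
        if tool_name is None:
            break  # nothing left to try
        tried.append(tool_name)
        result = TOOLS[tool_name](query)
        trace.append((tool_name, result))
        if result is not None:
            return result, trace
    return None, trace


if __name__ == "__main__":
    answer, trace = run_agent("Agents SDK")
    print(answer)
    print([name for name, _ in trace])
```

The returned trace records every tool call and its result, which is the kind of record the orchestration work described earlier depends on.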
Reinforcement fine-tuning allows developers to define tasks and graders, enabling models to find the optimal tool-calling path for specific problems unique to their domain [09:35:00]. This process teaches the model how to “think” about a domain, similar to how university training shapes human thought [10:06:00].
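To make the "tasks and graders" idea concrete, here is a small sketch of what a task definition and a grader could look like. It is an assumption for illustration and not the format of any particular fine-tuning API: `Task`, `Rollout`, `grade`, and the 0.25 penalty per extra tool call are all hypothetical choices.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Task:
    prompt: str
    reference_answer: str
    max_tool_calls: int  # tool-call budget considered efficient for this task


@dataclass
class Rollout:
    tool_calls: List[str]  # names of the tools the model called, in order
    final_answer: str


def grade(task: Task, rollout: Rollout) -> float:
    """Score in [0, 1]: correctness first, then a penalty for wasted tool calls."""
    correct = (
        rollout.final_answer.strip().lower()
        == task.reference_answer.strip().lower()
    )
    if not correct:
        return 0.0
    extra_calls = max(0, len(rollout.tool_calls) - task.max_tool_calls)
    return max(0.0, 1.0 - 0.25 * extra_calls)


if __name__ == "__main__":
    task = Task("Which plan is the customer on?", "Pro", max_tool_calls=2)
    efficient = Rollout(["lookup_customer", "lookup_plan"], "Pro")
    wasteful = Rollout(["search", "search", "lookup_customer", "lookup_plan"], "pro")
    print(grade(task, efficient), grade(task, wasteful))
```

A grader shaped like this rewards both the right answer and a short tool-calling path, so training against it would push the model toward efficient paths for that domain.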
Computer Use Models
Computer use models have shown surprising versatility:
- Legacy Applications: Automating tasks in legacy applications that have no APIs, which is common in domains such as the medical field, where complex manual click sequences across multiple applications are prevalent [13:34:00].
- Novel Research: Examples include agents using Google Maps and Street View to research climate tech startups’ charging network expansion, despite Google Maps having an API [14:02:00]. This highlights the ability to automate complex visual and textual information processing where traditional APIs might be insufficient or too complex [15:03:00].
- Cybersecurity: Exploring vulnerabilities in websites and services [31:25:00].
The models work best in browser environments, but people are experimenting with them in diverse environments like iPhone screenshots and Android [30:37:00].
Future of AI Tools and Infrastructure
- API Design Agent: A desired agent is one that can design APIs, leveraging deep research into best practices and fine-tuning on preferred APIs [02:22:00].
- Tool Ecosystem: There is a need to build a robust tool ecosystem on top of foundational APIs like the Responses API, which focuses on multi-turn model interactions and tool calls [29:42:00].
- Verticalized AI Infrastructure: Opportunities exist for specialized AI infrastructure companies providing VMs tailored for specific verticals (e.g., coding AI startups requiring rapid VM spin-up/spin-down) [28:23:00].
- LLM Ops Companies: Companies helping developers manage prompts, billing, and usage across multiple models and providers (e.g., OpenRouter) are valuable [28:51:00].
Differentiators for Application Builders
The biggest differentiator for application builders in the long term will be their ability to orchestrate tools and data with multiple model calls, evaluate results quickly, and continuously improve their applications [41:02:00]. This includes leveraging reinforcement fine-tuning for tools within a chain of thought or chaining together multiple LLMs [41:17:00].
Under-explored Applications
Areas with significant untapped potential include:
- Scientific Research: A step change in the speed of scientific discovery is expected [41:41:00].
- Robotics: Poised for major advancements driven by these models [42:19:00].
- Travel Industry: The creation of a truly effective AI travel agent remains an under-explored, though frequently cited, application [42:54:00].
Outlook
Model progress in the coming year is expected to be even greater than last year's, driven by a feedback loop in which models teach developers how to improve them with better data [42:33:00]. The focus is on making the evaluation-to-production-to-fine-tuning loop much simpler and faster [35:01:00].