From: redpointai

Enterprises should prepare for an “agentic future,” where AI agents become deeply embedded into products used daily [00:01:36]. The most exciting development is the dispersion of underlying agentic models and APIs into a wider array of products across the web [00:01:13].

Current State of Multi-Agent Architecture

Developers are actively building multi-agent systems, often referred to as “swarms,” to address complex business problems [00:05:12]. OpenAI’s Agents SDK was released to make it easier for developers to build these solutions [00:05:14], [00:05:47].

A prominent example of this architecture is customer support automation [00:05:24], where different agents might specialize (a minimal sketch follows the list below):

  • One agent handles refunds [00:05:28].
  • Another manages billing and shipping information [00:05:30].
  • A third agent might decide whether to pull from an FAQ database or escalate to a human [00:05:33].
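
As a rough illustration of this triage-and-handoff pattern, here is a minimal sketch using the openai-agents Python SDK; the agent names, instructions, and the specific handoff wiring are illustrative rather than a prescribed design.

```python
# pip install openai-agents
from agents import Agent, Runner

# Specialist agents, each focused on a single task with only the context it needs.
refund_agent = Agent(
    name="Refund agent",
    instructions="Handle refund requests. Confirm the order ID before approving anything.",
)
billing_agent = Agent(
    name="Billing agent",
    instructions="Answer questions about billing and shipping information.",
)
faq_agent = Agent(
    name="FAQ agent",
    instructions="Answer from the FAQ. If you cannot, say the issue needs a human.",
)

# A triage agent decides which specialist should take over the conversation.
triage_agent = Agent(
    name="Triage agent",
    instructions="Route the customer to the right specialist based on their request.",
    handoffs=[refund_agent, billing_agent, faq_agent],
)

result = Runner.run_sync(triage_agent, "I was double charged for order 1234, can I get a refund?")
print(result.final_output)
```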

This multi-agent architecture is gaining significant popularity [00:05:40]. Splitting tasks across multiple agents simplifies the debugging process of the overall workflow [00:19:07], [00:19:15]. By allowing each agent to focus on a single task with all necessary context, their efficacy on those specific tasks increases dramatically [00:18:41], [00:18:47].

Recommendations for Enterprises

For companies and CEOs, the advice is to:

  • Begin by building AI agents internally to address immediate, real problems within the organization [00:06:09], [00:36:59].
  • Explore frontier models and computer use models [00:36:42].
  • Identify internal manual workflows that could benefit from a tool interface and start implementing it (see the tool sketch after this list) [00:37:05], [00:37:08].
  • Engage employees by asking what their least favorite day-to-day tasks are and then automate those tasks [00:38:15], [00:38:17]. This approach can boost productivity and employee satisfaction [00:38:21], [00:38:24].
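
As a sketch of what putting a tool interface on a manual workflow can look like, the snippet below wraps a hypothetical internal lookup (the get_order_status function and its fake data source are invented for illustration) as a tool an agent can call via the openai-agents SDK.

```python
from agents import Agent, Runner, function_tool

@function_tool
def get_order_status(order_id: str) -> str:
    """Look up the shipping status of an order in the internal system."""
    # Hypothetical stand-in for a call to an internal API or database.
    fake_db = {"1234": "shipped", "5678": "processing"}
    return fake_db.get(order_id, "unknown order")

ops_agent = Agent(
    name="Ops assistant",
    instructions="Answer internal questions about orders using the tools available.",
    tools=[get_order_status],
)

result = Runner.run_sync(ops_agent, "Where is order 1234?")
print(result.final_output)
```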

Challenges and Strategies in Enterprise AI Deployment

Tool Orchestration

The most critical thing for enterprises to master is agent and tool orchestration [00:19:45], [00:19:47]. The models’ current capabilities often exceed how most AI applications actually use them [00:19:55], [00:19:58]. There is substantial value to be extracted by building robust orchestration layers around these models [00:20:04], [00:20:07].
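
What an orchestration layer looks like varies widely, but one common shape is a thin loop that sends a request to a model, validates the output, and retries with feedback when validation fails. The sketch below assumes a hypothetical validate check and an illustrative model name; it is one possible pattern, not a prescribed implementation.

```python
from openai import OpenAI

client = OpenAI()

def validate(answer: str) -> str | None:
    """Return an error message if the answer fails a domain check, else None (hypothetical check)."""
    if "order" not in answer.lower():
        return "The answer must reference the customer's order."
    return None

def orchestrate(question: str, max_attempts: int = 3) -> str:
    feedback = ""
    for _ in range(max_attempts):
        response = client.responses.create(
            model="gpt-4o-mini",  # illustrative model name
            input=f"{question}\n{feedback}",
        )
        answer = response.output_text
        error = validate(answer)
        if error is None:
            return answer
        feedback = f"Previous attempt was rejected: {error} Please fix this."
    return answer  # give up after max_attempts and return the last attempt

print(orchestrate("Summarize the status of order 1234 for the customer."))
```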

Key challenges include:

Evaluation and Fine-tuning

One of the biggest problems remaining in the AI development stack is creating reliable evaluation (eval) sets and grading mechanisms for models [00:12:48], [00:12:51]. While techniques like reinforcement fine-tuning exist, productizing the creation of good tasks and graders for specific domains remains difficult [00:12:56], [00:13:01]. For domain-specific applications (e.g., medical, legal), it is possible to build custom graders that cross-reference a model’s output with known ground truth (e.g., medical textbooks) [00:11:10], [00:11:19].
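
As an illustration of a domain-specific grader, the sketch below scores a model’s answers against a small ground-truth set. The eval data and the exact-match criterion are placeholders; a real medical or legal grader would cross-reference authoritative sources and use more nuanced scoring.

```python
from openai import OpenAI

client = OpenAI()

# Hypothetical eval set: prompts paired with ground-truth answers from a trusted reference.
EVAL_SET = [
    {"prompt": "Which vitamin deficiency causes scurvy?", "truth": "vitamin c"},
    {"prompt": "What is the normal adult resting heart rate range in bpm?", "truth": "60 to 100"},
]

def grade(output: str, truth: str) -> float:
    """Toy grader: 1.0 if the ground-truth string appears in the output, else 0.0."""
    return 1.0 if truth.lower() in output.lower() else 0.0

def run_eval(model: str = "gpt-4o-mini") -> float:  # illustrative model name
    scores = []
    for case in EVAL_SET:
        response = client.responses.create(model=model, input=case["prompt"])
        scores.append(grade(response.output_text, case["truth"]))
    return sum(scores) / len(scores)

print(f"accuracy: {run_eval():.2f}")
```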

The ability to create these tasks and graders, and get the model to find the correct tool-calling path for a unique problem, is considered crucial [00:09:36], [00:09:41], [00:09:46]. This allows for training a model to “think” like an expert in a specific domain (e.g., a legal scholar or medical doctor) [00:10:09], [00:10:15]. The goal is to make this process of evaluating tasks and workflows much easier, ideally about 10 times simpler than it is today [00:35:50].

Exposure to the Public Internet

While companies are building internal agents, the question of when and how to expose these agents to the public internet for external communication remains an area of development [00:05:51], [00:05:54]. It is anticipated that this will naturally occur as it becomes clear that external communication with an agent provides a tangible benefit [00:06:15], [00:06:19].

Future Developments in AI Agent Frameworks

Evolution of Agent Capabilities

In 2024, agentic products typically involved clearly defined workflows with a small number of tools (fewer than 10 to 12) [00:07:05], [00:07:09]. In 2025, however, the shift is toward a “chain of thought” model, where the agent’s own reasoning is intelligent enough to direct the workflow itself, deciding which tools to call and in what order.

This represents a significant departure from deterministic workflow building [00:07:48], [00:07:50]. The next major step is to overcome the constraint of a limited number of tools, allowing agents to access hundreds of tools and intelligently select the right one [00:08:05], [00:08:10].
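
One common way to lift the tool-count constraint (a general pattern, not necessarily how OpenAI implements it) is to retrieve only a small, relevant subset of tool definitions per request, for example by embedding tool descriptions and ranking them against the user’s query. A minimal sketch with invented tool descriptions:

```python
from openai import OpenAI

client = OpenAI()

# Hypothetical catalog standing in for hundreds of tool definitions.
TOOLS = {
    "refund_order": "Issue a refund for a customer order.",
    "track_shipment": "Look up the shipping status of a package.",
    "update_billing_address": "Change the billing address on an account.",
    # ... hundreds more in a real system
}

def embed(texts: list[str]) -> list[list[float]]:
    response = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return [item.embedding for item in response.data]

def top_tools(query: str, k: int = 2) -> list[str]:
    """Rank tools by cosine similarity between the query and each tool description."""
    names = list(TOOLS)
    vectors = embed([query] + [TOOLS[n] for n in names])
    query_vec, tool_vecs = vectors[0], vectors[1:]

    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm = (sum(x * x for x in a) ** 0.5) * (sum(y * y for y in b) ** 0.5)
        return dot / norm

    ranked = sorted(names, key=lambda n: cosine(query_vec, tool_vecs[names.index(n)]), reverse=True)
    return ranked[:k]  # only these tool schemas get passed to the model

print(top_tools("Customer wants money back for a broken item"))
```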

Increase in Runtime and Context

Current model runtimes, such as those for deep research, are measured in minutes [00:09:00], but extending them to hours or days will yield more powerful results [00:09:03], [00:09:05]. This increased runtime, combined with the ability to handle a broader range of data, including data from the web (not just user-provided data), is a significant development [00:06:42], [00:06:46].

Computer Use Models

Computer use models are proving surprisingly versatile. Initially, they were envisioned for automating legacy applications that lack APIs [00:13:34], [00:13:38]. This has worked well in domains such as medical tasks that involve manually clicking through multiple applications [00:13:50], [00:13:53].

However, new use cases have emerged:

  • Google Maps Research: Companies like UniFi GTM have used it to research climate tech startups, such as checking if a company has expanded its charging network by having the agent open Google Maps, activate Street View, and navigate to locations [00:14:05], [00:14:08], [00:14:26].
  • Vision and Text Ingestion: Computer use models are well-suited for domains that don’t map to JSON or plain text on the web, requiring a combination of vision and text ingestion [00:14:57], [00:15:01], [00:15:03].
  • Cybersecurity: Startups are exploring computer use models for cybersecurity work, such as finding vulnerabilities in websites and other surfaces by having the agent “poke around” [00:31:27], [00:31:29].

The environment in which computer use tools are applied is fragmenting significantly, with models being tested on iPhone screenshots and Android, indicating a vast range of future possibilities [00:30:38], [00:30:40], [00:30:47].
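
Under the hood, these products typically run an act-observe loop: the model proposes a UI action, a harness executes it in a sandboxed browser or VM, captures a screenshot, and feeds it back. The outline below assumes OpenAI’s computer-use preview in the Responses API plus hypothetical execute_action and take_screenshot harness hooks; the exact field names are from memory of the preview docs and should be verified against the current API.

```python
import base64
from openai import OpenAI

client = OpenAI()

def execute_action(action) -> None:
    """Hypothetical harness hook: perform the click/type/scroll in a sandboxed browser or VM."""
    pass

def take_screenshot() -> bytes:
    """Hypothetical harness hook: capture the current screen as PNG bytes (placeholder)."""
    return b""

# Assumed shape of the computer-use tool definition (preview; may differ).
computer_tool = {"type": "computer_use_preview", "display_width": 1024,
                 "display_height": 768, "environment": "browser"}

response = client.responses.create(
    model="computer-use-preview",
    tools=[computer_tool],
    input="Open Google Maps, switch to Street View, and check for EV chargers near the office.",
    truncation="auto",
)

while True:
    calls = [item for item in response.output if item.type == "computer_call"]
    if not calls:
        break  # no more UI actions requested; the model is done
    call = calls[0]
    execute_action(call.action)
    screenshot = base64.b64encode(take_screenshot()).decode()
    response = client.responses.create(
        model="computer-use-preview",
        tools=[computer_tool],
        previous_response_id=response.id,
        # Screenshot payload shape assumed from the preview docs.
        input=[{"type": "computer_call_output", "call_id": call.call_id,
                "output": {"type": "computer_screenshot",
                           "image_url": f"data:image/png;base64,{screenshot}"}}],
        truncation="auto",
    )

print(response.output_text)
```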

The Future of AI Agents in Software Development

A significant differentiator for application builders long-term will be their ability to orchestrate tools and data with multiple model calls [00:41:02], [00:41:04]. This includes:

  • Using reinforcement fine-tuning to enable models to call tools within their chain of thought [00:41:17], [00:41:19].
  • Chaining together multiple LLMs effectively [00:41:22] (a toy chain is sketched after this list).
  • Rapidly evaluating and improving these systems [00:41:26].

This will be the most crucial skill moving forward [00:41:28].
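
As a toy example of chaining model calls, the sketch below uses a first, cheaper call to extract structure from a request and a second call to draft a reply from that structure; the model names and the prompt split are illustrative.

```python
from openai import OpenAI

client = OpenAI()

def handle_ticket(ticket: str) -> str:
    # Call 1: a small, fast model turns free text into a structured summary.
    extraction = client.responses.create(
        model="gpt-4o-mini",  # illustrative choice for the cheap extraction step
        input=f"Extract the product, the problem, and the customer's goal from this ticket:\n{ticket}",
    )
    # Call 2: a second model drafts the reply using only the structured summary.
    draft = client.responses.create(
        model="gpt-4o",  # illustrative choice for the higher-quality drafting step
        input=f"Write a short, polite support reply based on these facts:\n{extraction.output_text}",
    )
    return draft.output_text

print(handle_ticket("My espresso machine leaks water and I just want a replacement part."))
```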

In the coming months, efforts will focus on:

  • Building out the tools ecosystem on top of foundational blocks [00:29:43], [00:29:45].
  • Advancing the computer use VM space, especially enabling secure and reliable deployment of virtual machines in enterprise infrastructure [00:29:57], [00:30:00].
  • Developing smaller, faster models that are highly effective at tool use and can be easily fine-tuned for specific applications [00:32:53], [00:32:56], [00:33:02].

The overall goal is to make the “flywheel” from evaluation to production to fine-tuning and back again significantly simpler and faster [00:35:01], [00:35:03], [00:35:05].