From: redpointai
A recent discussion on Unsupervised Learning covered a wide range of topics pertinent to software developers and enterprises navigating the evolving landscape of AI agents and infrastructure [00:00:03]. Key themes included understanding agent capabilities, building for an “agentic future,” differentiating application builders, and identifying needs in AI infrastructure [00:00:12].
The Agentic Future
The long-term vision for AI agents involves their deep embedding into daily products, moving beyond dedicated surfaces like ChatGPT [00:01:04]. The goal is for agents to become ubiquitous, automating tasks and performing research across the web [00:01:20].
Evolution of Agent Interactions
Agent interaction with the web has evolved significantly, from single-turn searches to multi-step reasoning processes [00:03:27]. Products like Deep Research demonstrate models that can retrieve information, rethink their approach, and open multiple web pages in parallel to save time [00:03:44]. This chain-of-thought tool calling represents a major shift in how agents access and process information [00:04:00]. It is anticipated that web page extraction might eventually be handled by other agents, seamlessly embedded in a chain-of-thought process that spans both the public internet and private data and agents [00:04:15].
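The parallel page retrieval is where most of the wall-clock time is saved. A minimal sketch of that single step (the URLs and the `fetch_page` helper are illustrative, not any specific product's API):

```python
import asyncio
import aiohttp

async def fetch_page(session: aiohttp.ClientSession, url: str) -> str:
    """Retrieve one page's text; a real agent would extract and summarize it."""
    async with session.get(url) as resp:
        return await resp.text()

async def fetch_in_parallel(urls: list[str]) -> list[str]:
    """Open multiple candidate pages concurrently, as a deep-research step might."""
    async with aiohttp.ClientSession() as session:
        return await asyncio.gather(*(fetch_page(session, u) for u in urls))

# pages = asyncio.run(fetch_in_parallel(["https://example.com/a", "https://example.com/b"]))
```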
Building for Enterprises
Enterprises are already building multi-agent architectures to solve business problems, particularly in areas like customer support automation [00:05:15]. This often involves swarms of agents, each handling specific tasks (e.g., refunds, billing, escalating to a human) [00:05:28]. The Agents SDK was released to simplify the development of these multi-agent systems [00:05:47]. Companies are advised to build AI agents internally to address real problems before considering exposing them to the public internet [00:06:07].
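A hedged sketch of what such a support swarm might look like using the Agents SDK's handoff pattern; the agent names and routing instructions are invented for illustration, and the exact SDK surface (`Agent`, `Runner`, `handoffs`) should be checked against the current documentation:

```python
from agents import Agent, Runner

# Specialist agents, each scoped to one task.
refund_agent = Agent(
    name="Refund agent",
    instructions="Handle refund requests. Ask for the order ID before issuing a refund.",
)
billing_agent = Agent(
    name="Billing agent",
    instructions="Answer questions about invoices and charges.",
)
escalation_agent = Agent(
    name="Escalation agent",
    instructions="Collect details and hand the case to a human support rep.",
)

# A triage agent routes each request to the right specialist via handoffs.
triage_agent = Agent(
    name="Triage agent",
    instructions="Route the customer to the refund, billing, or escalation agent.",
    handoffs=[refund_agent, billing_agent, escalation_agent],
)

result = Runner.run_sync(triage_agent, "I was charged twice for order #1042.")
print(result.final_output)
```

Keeping each specialist narrowly scoped also keeps the blast radius of prompt changes small, a point revisited under orchestration below.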
Where Agents Work Today
In 2024, agentic products typically involved clearly defined workflows with a limited number of tools (fewer than 10-12) [00:07:02]. Examples included coding agents, customer support automation, and deep research projects [00:07:21]. In 2025, the shift is toward models performing complex reasoning within a chain of thought, enabling them to call multiple tools, backtrack, and try different paths without deterministic workflows [00:07:32].
Challenges with Many Tools and Runtime
A significant future unlock for agents is removing the constraint on the number of tools they can access, allowing them to intelligently select from hundreds of tools [00:08:05]. Additionally, increasing the available runtime for models from minutes to hours or days will yield more powerful results, as humans can work on tasks for extended periods using many tools [00:08:47].
Developing and Fine-tuning Agents
Reinforcement Fine-tuning and Domain Specificity
Reinforcement fine-tuning, involving tasks and graders, is crucial for developers to teach models to find the correct tool-calling paths for domain-specific problems [00:09:35]. This process steers the model’s chain of thought, essentially training it to think in the way a legal scholar or medical doctor would, leading to significant verticalization for these models [00:10:06].
Challenges in Grading and Evaluation
Providing effective tooling for grading and evaluation in domain-specific contexts like legal or healthcare remains a challenge [00:10:46]. While basic building blocks for flexible graders exist (e.g., cross-referencing model output with ground truth or executing code for mathematical correctness), productizing these tools for broad use is a major problem to be solved [00:11:00]. The difficulty lies in creating robust evaluations that go beyond simple string matching and capture complex domain performance [00:11:50].
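To make those building blocks concrete, here is a hand-rolled sketch of three graders of the kind a reinforcement fine-tuning job or eval harness might use. This is illustrative Python, not OpenAI's grader schema, and real legal or medical evals need much richer rubric logic:

```python
import math

def exact_match_grader(output: str, reference: str) -> float:
    """Simplest baseline: string match against ground truth."""
    return 1.0 if output.strip().lower() == reference.strip().lower() else 0.0

def numeric_grader(output: str, reference: float, tol: float = 1e-6) -> float:
    """Parse the model's answer and check mathematical correctness."""
    try:
        return 1.0 if math.isclose(float(output), reference, rel_tol=tol) else 0.0
    except ValueError:
        return 0.0

def rubric_grader(output: str, rubric: list[str]) -> float:
    """Partial credit: fraction of required rubric points the answer covers."""
    hits = sum(1 for point in rubric if point.lower() in output.lower())
    return hits / len(rubric) if rubric else 0.0
```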
Computer Use Models and Unexpected Applications
Computer use models have shown surprising versatility [00:13:27]. Initially envisioned for automating legacy applications without APIs (e.g., manual clicks across multiple medical apps) [00:13:36], they have also found applications in areas like Google Maps research, including using Street View to check for changes in charging networks [00:14:02]. These models are well-suited for domains that don't map to plain-text JSON, requiring a combination of vision and text ingestion [00:14:57]. Startups like Browserbase and Scrapybara are providing services for hosting virtual machines to make computer use models work effectively [00:16:20].
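At its core, a computer use agent is a screenshot-in, action-out loop running against a hosted VM or browser session. The sketch below is purely illustrative: `vm` stands in for a Browserbase- or Scrapybara-style session and `model_next_action` for a call to a computer use model; both are hypothetical placeholders rather than real SDK calls:

```python
from dataclasses import dataclass

@dataclass
class Action:
    kind: str          # "click", "type", "scroll", or "done"
    x: int = 0
    y: int = 0
    text: str = ""

def run_computer_task(vm, model_next_action, goal: str, max_steps: int = 50) -> None:
    """Screenshot-in, action-out loop against a hosted VM (hypothetical interfaces)."""
    for _ in range(max_steps):
        screenshot = vm.screenshot()                  # raw pixels from the hosted session
        action = model_next_action(goal, screenshot)  # vision+text model proposes one step
        if action.kind == "done":
            break
        elif action.kind == "click":
            vm.click(action.x, action.y)
        elif action.kind == "type":
            vm.type_text(action.text)
        elif action.kind == "scroll":
            vm.scroll(action.y)
```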
Strategic Approaches for Developers
Balancing Immediate Needs with Future Model Capabilities
Developers often build scaffolding around current model capabilities to get products to market, even if future models might obviate those steps [00:19:16]. Current models are often more capable than the ways most AI applications leverage them, making the orchestration of agents and tools paramount [00:19:52]. While waiting for models to improve might simplify workflows, actively building around today's models to make them work well is critical for AI startups and products [00:20:07].
Importance of Orchestration and Debugging
Meticulous orchestration, examining traces, effective prompt engineering, and maintaining eval sets to prevent prompt degradation are challenging but essential skills for developers [00:20:44]. Splitting tasks among multiple agents simplifies debugging, as changes to individual agents have a smaller “blast radius” compared to modifying a single, highly capable model with many instructions [00:21:08].
API Design Philosophy: APIs as Ladders
The design of APIs aims to provide significant power out-of-the-box, making simple tasks easy, while also offering deeper customization options for developers who want to invest more effort [00:22:05]. For example, file search is easy to use with defaults, but allows tweaking parameters like chunk size, metadata filtering, and re-ranker customization for more advanced use cases [00:22:16]. The goal is a quick start (e.g., four lines of curl code) with many optional parameters for fine-tuning [00:23:10].
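A sketch of that ladder in Python, from defaults to tuned retrieval. The vector store ID is a placeholder, and the tuning fields shown (`max_num_results`, metadata `filters`, `ranking_options`) are assumptions to verify against the current API reference; chunk size is typically configured when files are added to the vector store rather than per request:

```python
from openai import OpenAI

client = OpenAI()

question = "What does the onboarding guide say about security training?"

# Default usage: attach a vector store and rely on the built-in retrieval settings.
basic = client.responses.create(
    model="gpt-4o-mini",
    input=question,
    tools=[{"type": "file_search", "vector_store_ids": ["vs_abc123"]}],
)

# Tuned usage: cap result count, filter on file metadata, raise the relevance bar.
tuned = client.responses.create(
    model="gpt-4o-mini",
    input=question,
    tools=[{
        "type": "file_search",
        "vector_store_ids": ["vs_abc123"],
        "max_num_results": 5,
        "filters": {"type": "eq", "key": "department", "value": "security"},
        "ranking_options": {"score_threshold": 0.6},
    }],
)
print(tuned.output_text)
```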
Future API Customization
Future “knobs” for APIs include site filtering and granular location settings for web search [00:23:32]. The Responses API aims to incorporate features from the earlier Assistants API (like storing conversations and model configurations) as opt-in parameters, reducing the initial complexity for new users [00:24:00].
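A minimal sketch of that opt-in storage model, assuming `store` and `previous_response_id` parameters on the Responses API; verify the exact names against the current reference:

```python
from openai import OpenAI

client = OpenAI()

# First turn: opt in to server-side conversation storage.
first = client.responses.create(
    model="gpt-4o-mini",
    input="Summarize our Q3 support ticket trends.",
    store=True,  # opt-in persistence instead of always-on context storage
)

# Later turn: reference the stored response instead of resending the full context.
follow_up = client.responses.create(
    model="gpt-4o-mini",
    previous_response_id=first.id,
    input="Now break that down by product line.",
)
print(follow_up.output_text)
```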
Learnings from Previous APIs
The Assistants API’s success was in tool use, especially file search, demonstrating market fit for bringing custom data to models [00:25:16]. However, it was criticized for being too difficult to use, with no easy way to opt out of context storage [00:25:31]. The Responses API combines the multi-output and tool-use capabilities of the Assistants API with the ease of use of chat completions, allowing users to provide their own context on each turn [00:25:59].
The Responses API focuses on making multi-turn interactions with models effective, providing a foundation for models to call themselves and tools multiple times to reach a final answer [00:26:23].
The AI Infrastructure Landscape
OpenAI’s Role vs. Standalone Companies
OpenAI is building out-of-the-box tools in response to user demand for a “one-stop shop” for LLM functionalities like data search and internet search [00:27:41]. However, standalone AI infrastructure companies continue to build powerful, low-level, and infinitely flexible APIs, serving a large market [00:28:01]. There’s also a growing space for vertical-specific AI infrastructure (e.g., VMs for coding AI startups) and LLM operations companies that manage prompts, billing, and usage across multiple models and providers [00:28:23].
Key Challenges for Developers
Major remaining challenges for developers include:
- Tools Ecosystem: Building a robust tools ecosystem on top of foundational APIs [00:29:36].
- Computer Use VM Space: Secure and reliable deployment of virtual machines for computer use models within enterprise infrastructure [00:29:57]. The fragmentation across different environments (browsers, iPhone screenshots, Android, Ubuntu flavors) presents a significant challenge for the community to address [00:31:17].
Underexplored Applications
One particularly underexplored application area is scientific research [00:41:41]. The expectation was that AI models would revolutionize scientific discovery, but the interfaces are not yet optimal for academia [00:42:04]. Robotics is another area ripe for significant advancements with AI [00:42:19].
Recommendations for Businesses
For enterprise or consumer CEOs, the advice is to immediately start exploring frontier models and computer use models [00:36:40]. Experiment with internal workflows to build multi-agent architectures that automate tasks end-to-end [00:36:52]. Identify manual workflows that could benefit from a tool interface and work towards programmatic access [00:37:05]. This mirrors the “digital transformation” trend from the cloud era, focusing on automating applications [00:37:17]. Employees should be encouraged to identify their least favorite daily tasks to automate, as this will increase productivity and satisfaction [00:38:15].
Overhyped/Underhyped AI Aspects
Agents are considered both overhyped and underhyped [00:38:52]. They have gone through multiple hype cycles, but companies that successfully implement them to automate manual tasks or create deep research capabilities achieve significant results [00:39:03].
The power of reasoning models combined with tool use has been a significant mind-shift, enabling a move from deterministic workflows to fully agentic products that deliver powerful results [00:39:23]. Similarly, the power of fine-tuning to inject custom information into models and significantly move the needle for specific tasks is impressive [00:40:13].
The biggest differentiator for application builders long-term will be the ability to orchestrate tools, data, and multiple model calls effectively, rapidly evaluating and improving performance [00:41:02].
Model progress is expected to be more significant this year than last, driven by a feedback loop where models help researchers improve them with better data [00:42:33].
The AI travel agent remains a highly anticipated yet elusive product, despite being a common demo [00:42:57]. The travel industry is deeply entrenched, waiting for someone to truly “crack” the AI travel agent [00:42:58].