From: redpointai
OpenAI has introduced new tools and APIs designed to facilitate the development and deployment of advanced AI agents, reflecting a long-term vision of embedding these capabilities across various products and web surfaces [00:00:00]. This includes addressing needs in AI infrastructure and creating opportunities for startups [00:00:18].
Vision for Agents and Computer Use Models
The long-term vision for agents is their deep integration into everyday products rather than being confined to specific interfaces like ChatGPT [01:11:03]. This means agents will automate tasks within browsers and workplaces, performing actions like clicking, filling forms, and conducting research [01:21:40]. The goal of the API platform is to disperse agentic capabilities widely, making them ubiquitous [01:46:27]. Developers, who have superior domain knowledge, are expected to leverage these models to create a wide array of specialized products [01:59:04].
An example of a highly anticipated agent is an API-designing agent, which could automate the tedious process of defining parameters [02:22:15].
Evolution of Agent Interaction
The way agents interact with the web has evolved significantly [03:22:58]. Initially, an agent performed a single turn: deciding whether to search the web, retrieving information, and then synthesizing a response [03:27:00]. By 2025, the paradigm shifted toward “chain of thought” processes, where models continuously gather information, re-evaluate their stance, retrieve more data, and even open multiple web pages in parallel to save time [03:40:00]. This multi-step reasoning with tool calling is a major shift [04:00:00]. Future developments may see web-page extraction replaced by interactions with other agents via endpoints [04:15:00]. This chain-of-thought process, where tool calling happens seamlessly across the internet, private data, and private agents, is expected to become commonplace [04:34:00].
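The parallel-retrieval step can be sketched with a thread pool. `fetch_page` below is a stand-in for a real page download, not part of any OpenAI API:

```python
from concurrent.futures import ThreadPoolExecutor

def fetch_page(url: str) -> str:
    # Stand-in for a real HTTP fetch; a production agent would
    # download the page and extract its text here.
    return f"contents of {url}"

def gather_evidence(urls: list[str]) -> list[str]:
    # Open several pages in parallel rather than one at a time,
    # mirroring how newer agents parallelize retrieval.
    with ThreadPoolExecutor(max_workers=8) as pool:
        return list(pool.map(fetch_page, urls))

pages = gather_evidence(["https://a.example", "https://b.example"])
```

`pool.map` preserves input order, so the agent can attribute each result back to its source page.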
New Tools and Capabilities
OpenAI has released several tools to support the development of compound AI systems and agent capabilities:
Responses API
The Responses API is designed to facilitate multi-turn interactions with models, allowing a model to call itself and tools multiple times to reach a final answer [00:26:23]. It aims to combine the ease of use of chat completions with the advanced tool-use capabilities of the Assistants API [00:26:01].
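What “a model calls itself and tools multiple times” means mechanically can be sketched with a scripted stand-in for the model. `call_model`, `TOOLS`, and the message shapes below are illustrative, not the Responses API’s actual interface:

```python
def call_model(messages: list[dict]) -> dict:
    # Scripted stand-in for one model call: it first requests a tool,
    # then (once a tool result is present) produces a final answer.
    if not any(m["role"] == "tool" for m in messages):
        return {"tool_call": {"name": "get_weather", "args": {"city": "Paris"}}}
    return {"answer": "It is 18C in Paris."}

TOOLS = {"get_weather": lambda city: f"18C in {city}"}  # illustrative tool

def run(question: str) -> str:
    # The Responses API hides this loop behind a single request;
    # mechanically, the model and tools are invoked in turns until
    # the model emits a final answer.
    messages = [{"role": "user", "content": question}]
    while True:
        out = call_model(messages)
        if "answer" in out:
            return out["answer"]
        call = out["tool_call"]
        result = TOOLS[call["name"]](**call["args"])
        messages.append({"role": "tool", "content": result})

final = run("What is the weather in Paris?")
```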
Agents SDK
The Agents SDK was released to enable developers to create multi-agent architectures for solving complex business problems [05:12:00]. This architecture is popular for tasks like customer support automation, where different agents handle specific issues (e.g., refunds, billing, escalating to humans) [05:24:00]. This approach of splitting tasks among many agents simplifies debugging and improves efficacy by allowing each agent to focus on a specific task [01:10:59].
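The triage pattern described above can be sketched in plain Python. The function names and keyword routing are illustrative and do not use the Agents SDK’s real interfaces:

```python
def refunds_agent(ticket: str) -> str:
    return "refund issued"

def billing_agent(ticket: str) -> str:
    return "billing corrected"

def human_escalation(ticket: str) -> str:
    return "escalated to a human"

SPECIALISTS = {
    "refund": refunds_agent,
    "billing": billing_agent,
}

def triage(ticket: str) -> str:
    # A triage agent routes each ticket to a narrow specialist;
    # anything it cannot classify is handed off to a human. Keeping
    # each specialist small is what makes debugging tractable.
    for keyword, agent in SPECIALISTS.items():
        if keyword in ticket.lower():
            return agent(ticket)
    return human_escalation(ticket)
```

In the real SDK the routing decision is made by a model rather than keyword matching, but the handoff topology is the same.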
Computer Use Models
Computer use models automate tasks across applications, particularly beneficial for legacy systems without APIs [00:13:34]. Examples include automating manual medical tasks across multiple applications [00:13:50]. Unexpectedly, these models have also been used for complex research tasks, such as UniFi GTM using them to research charging networks on Google Maps, including Street View, despite Google Maps having an API [00:14:02]. This highlights their utility in domains where data doesn’t map to JSON or requires a combination of vision and text ingestion [00:14:57]. The potential for automating “anything” is significant [00:14:48]. The Arc browser’s “DIA” feature, which allows users to open a tab and give an instruction for the browser to perform tasks in the background, is a cool example of native integration [00:16:54].
File Search
File search is highlighted as an easy-to-use tool where developers can upload documents and integrate them with models using a vector store ID [00:22:16]. This capability allows models to search over user-provided data, which found significant market fit [00:25:21].
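As a rough mental model of what sits behind a vector store ID, here is a toy bag-of-words version. The hosted File Search uses learned dense embeddings and managed storage, so everything below is illustrative only:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; real vector stores use learned
    # dense embeddings.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class VectorStore:
    def __init__(self) -> None:
        self.docs: list[tuple[str, Counter]] = []

    def add(self, text: str) -> None:
        self.docs.append((text, embed(text)))

    def search(self, query: str, k: int = 1) -> list[str]:
        # Rank stored documents by similarity to the query.
        q = embed(query)
        ranked = sorted(self.docs, key=lambda d: cosine(q, d[1]), reverse=True)
        return [text for text, _ in ranked[:k]]

store = VectorStore()
store.add("the refund policy allows returns within 30 days")
store.add("shipping usually takes five business days")
top = store.search("refund returns")
```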
Challenges and Opportunities in AI Agent Development
Evaluation and Grading
A significant challenge remains in productizing effective grading and task generation for domain-specific AI models [00:12:40]. While methods like reinforcement fine-tuning with custom tasks and graders exist, making this process easy and accessible for all developers is a major hurdle [00:09:35]. The ability to “steer” a model’s chain of thought by teaching it how to approach specific domains (e.g., legal, medical) is a powerful concept [00:10:06]. OpenAI provides basic building blocks for graders, allowing developers to cross-reference model outputs with ground truth or execute code for mathematical correctness [00:11:40].
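The two grading styles mentioned, cross-referencing against ground truth and executing code for mathematical correctness, can be sketched as plain functions. These are illustrative, not OpenAI’s grader interface:

```python
def exact_match_grader(output: str, ground_truth: str) -> float:
    # Cross-reference the model's answer with a known-good answer.
    return 1.0 if output.strip().lower() == ground_truth.strip().lower() else 0.0

def numeric_grader(output: str, expression: str) -> float:
    # Execute code to check mathematical correctness: the model's
    # number must equal the evaluated reference expression.
    try:
        return 1.0 if float(output) == eval(expression, {"__builtins__": {}}) else 0.0
    except (ValueError, SyntaxError):
        return 0.0
```

Graders like these return a reward signal that reinforcement fine-tuning can optimize against.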
Tool Constraint and Runtime
In 2024, agentic products were typically limited to well-defined workflows with fewer than 10 tools [00:07:02]. The next major unlock is to remove this constraint, allowing agents to access and figure out the right tools from hundreds of options [00:08:05]. Another challenge is extending model runtime from minutes to hours or even days, which is expected to yield more powerful results [00:08:49].
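One way to let an agent "figure out the right tools from hundreds" is to pre-filter the registry per request. The keyword scoring below is a deliberately naive stand-in for embedding-based selection:

```python
def select_tools(query: str, registry: dict[str, str], limit: int = 5) -> list[str]:
    # Naive relevance filter: score each tool by word overlap between
    # its description and the query, then keep the top few. A real
    # system might use embeddings, but the shape of the problem is
    # the same: narrow hundreds of tools to a handful per request.
    words = set(query.lower().split())
    scored = [
        (len(words & set(desc.lower().split())), name)
        for name, desc in registry.items()
    ]
    scored.sort(reverse=True)
    return [name for score, name in scored[:limit] if score > 0]

registry = {
    "create_invoice": "create a new invoice for a customer",
    "send_email": "send an email message to a recipient",
    "get_weather": "look up the weather forecast for a city",
}
selected = select_tools("send the customer an email", registry, limit=2)
```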
API Design Philosophy: “APIs as Ladders”
OpenAI’s API design follows the “APIs as ladders” principle [00:21:52]. This means providing significant power out of the box for simple tasks, while also offering granular control and customization options as developers need more complexity [00:22:05]. An example is File Search, which is easy to use with defaults but allows users to tweak parameters like chunk size, metadata filtering, and re-rankers for more advanced use cases [00:22:32]. The aim is to make the quick start simple (e.g., four lines of curl code) while exposing many optional parameters [00:23:10]. Future “knobs” could include site filtering and granular location settings for web search [00:23:30]. Additionally, the Responses API allows users to opt into features like conversation storage (equivalent to the Assistants API’s threads object) and model configuration storage (assistant-type object) [00:24:00].
Strategic Advice for Developers and Enterprises
“The models are much further ahead than what most AI applications are making use of.” [00:19:52] “Building things around models to make them work really well is an extremely important thing that AI startups and products should be doing.” [00:20:07]
The biggest differentiator for application builders long-term will be their ability to orchestrate tools, data, and multiple model calls [00:41:02]. This involves quickly chaining together LLMs, evaluating performance, and iterating [00:41:22].
For enterprises and CEOs, the recommendation is to:
- Start exploring frontier models and computer use models [00:36:40].
- Identify internal workflows for automation using multi-agent architectures [00:36:49].
- Figure out which manual workflows require a tool interface [00:37:05].
- Focus on finding ways to automate applications and ensure programmatic access to tools [00:37:51].
- Ask employees about their least favorite tasks and automate them to boost productivity [00:38:15].
Future Outlook and Underexplored Applications
Model Progress
Model progress is expected to be even greater than last year due to feedback loops where models help improve themselves with better data [00:33:33]. There is a strong desire for smaller, faster models that are good at tool use, acting as “workhorse” or “supporting” models for quick classifications and guardrailing [00:32:53]. These smaller models are also more fine-tunable for specific use cases [00:33:24].
Areas of Focus
- Tools Ecosystem: Building a robust tools ecosystem on top of the foundational Responses API [00:29:42].
- Computer Use VM Space: Maturing enterprise adoption of AI agents through secure, reliable virtual machine deployments with observability and monitoring [00:29:57]. The community is expected to address fragmentation across environments such as iPhone VMs, Android, and various Ubuntu flavors [00:30:53].
- Eval Process: Making the evaluation process for tasks and workflows significantly easier, as it is currently very challenging [00:35:50].
- Specific Model Capabilities: Improvements in models generating clean “diffs” for code changes are highly anticipated [00:33:35].
Underexplored Applications
- Scientific Research: Expecting a step change in the speed of scientific research [00:41:41]. Finding the right AI interface for academia will be key to driving adoption [00:42:07].
- Robotics: A major breakthrough is expected in robotics [00:42:19].
- Travel Industry: Despite being a popular demo, a functional AI travel agent product is still missing [00:43:03].
Over/Underhyped Areas
AI agents are considered both overhyped due to multiple hype cycles and underhyped because companies that successfully implement them achieve significant gains [00:38:50].
Further Resources
For more information on OpenAI’s APIs and tools, visit:
- OpenAI Docs: platform.openai.com/docs [00:43:47]
- OpenAI Developers Twitter/X account [00:43:56]
- OpenAI Community Forum: community.openai.com [00:44:00]