From: redpointai
The AI landscape is evolving rapidly, with a particular focus on the development and integration of AI assistant APIs, which are poised to profoundly change how users interact with technology and how developers build applications [00:53:53].
Evolution from Plugins to Assistants API and GPTs
OpenAI’s journey into AI assistance began with plugins, which had an ambitious goal but faced limitations [00:25:08]. Plugins were initially framed as a product release, but in hindsight they were more of a “research preview” [00:25:18]. The core idea was a publicly hosted AI plugin manifest file that described the actions available to the model, specified with an OpenAPI spec [00:25:30]. However, resource constraints within OpenAI meant that teams had to move on quickly to other products like browsing and code interpreter, leaving plugins underdeveloped [00:26:09].
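For readers who never saw one, a rough sketch of that manifest follows, rendered here as a Python dict; the plugin name, descriptions, and URLs are placeholders rather than a real plugin.

```python
# Illustrative sketch of the publicly hosted manifest (ai-plugin.json)
# that ChatGPT plugins used. The name, descriptions, and URLs below are
# placeholders, not a real plugin.
PLUGIN_MANIFEST = {
    "schema_version": "v1",
    "name_for_human": "Todo List",        # shown to end users
    "name_for_model": "todo",             # how the model refers to the plugin
    "description_for_model": (
        "Plugin for managing a user's todo list. "
        "Use it to add, list, and delete todos."
    ),
    "auth": {"type": "none"},             # plugins also supported OAuth and API keys
    "api": {
        "type": "openapi",
        # The OpenAPI spec at this URL is what described each action:
        "url": "https://example.com/openapi.yaml",
    },
}
```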
Key challenges with plugins included security, privacy, the risk of taking consequential actions, the need for explicit user consent, and authentication [00:26:32]. Many of these issues have since been addressed with the introduction of GPTs [00:26:39].
The Assistants API is considered a significant long-term development [00:02:10]. It removes the burden of “nitty-gritty” plumbing from developers, although some still prefer full customization via the embeddings API, LlamaIndex, or LangChain [00:02:31]. The ability to integrate tools like code interpreter directly into the API is a major advantage, enabling developers to build sophisticated products [00:03:02].
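As a rough illustration of how little plumbing is left to the developer, here is a minimal sketch of creating an assistant with the code interpreter tool and running one thread through it, based on the beta endpoints of the openai Python SDK; the model name and polling loop are illustrative.

```python
# Minimal sketch of the Assistants API with the code_interpreter tool.
import time
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

assistant = client.beta.assistants.create(
    name="Data Helper",
    instructions="You analyze data and explain your reasoning.",
    tools=[{"type": "code_interpreter"}],   # tool runs in a managed sandbox
    model="gpt-4-1106-preview",             # illustrative model name
)

thread = client.beta.threads.create()
client.beta.threads.messages.create(
    thread_id=thread.id,
    role="user",
    content="Compute the mean of 3, 5, and 10, showing your work.",
)

run = client.beta.threads.runs.create(thread_id=thread.id, assistant_id=assistant.id)
while run.status in ("queued", "in_progress"):   # simple polling loop
    time.sleep(1)
    run = client.beta.threads.runs.retrieve(thread_id=thread.id, run_id=run.id)

for message in client.beta.threads.messages.list(thread_id=thread.id):
    print(message.role, message.content[0].text.value)
```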
GPTs offer a much improved interface compared to plugins [00:26:55]. They allow combinations of features like browsing, code interpreter, and custom actions (which are essentially plugins) to work seamlessly out of the box [00:27:00]. The upcoming GPT store is expected to resolve many discoverability issues that plagued the plugin store [00:27:19].
Currently, most use cases for GPTs revolve around sharing prompts [00:27:50]. This highlights the continued value of prompt engineering: well-crafted instructions alone can deliver significant value [00:27:57]. However, integrating custom actions still involves friction, because authors need to understand OpenAPI specs [00:28:28].
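To make that friction concrete: even a trivial custom action needs an OpenAPI description of each endpoint. The fragment below, rendered as a Python dict for brevity, is roughly the minimum a single-endpoint action requires; the endpoint and schema are made up.

```python
# Illustrative fragment of the OpenAPI description a custom action needs.
# The endpoint and schema are invented for this sketch.
ACTION_SPEC = {
    "openapi": "3.0.0",
    "info": {"title": "Todo API", "version": "1.0.0"},
    "servers": [{"url": "https://example.com"}],
    "paths": {
        "/todos": {
            "get": {
                "operationId": "listTodos",  # the name the model invokes
                "summary": "List the user's todos",
                "responses": {
                    "200": {"description": "An array of todo items"}
                },
            }
        }
    },
}
```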
Current and Future Use Cases for AI Assistants
A highly anticipated development for AI assistants is a text-first experience, integrated with platforms like Twilio or email [00:30:24]. This would bring the assistant experience to the many surfaces where users already work, rather than requiring them to visit a dedicated website or app [00:30:42]. The goal is for AI to assist users without pulling them out of their existing workflows [00:30:51].
This vision includes multiplayer experiences, such as a group chat between humans and an AI assistant, where users can ask for help directly within their familiar communication channels [00:32:58]. The ability to pass thread IDs around between different objects and surfaces further enhances this potential [00:31:02].
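A hypothetical sketch of what such a text-first assistant could look like: a Flask app behind a Twilio SMS webhook, mapping each phone number to its own Assistants API thread to preserve context. The assistant ID and routing here are illustrative, not a shipped product.

```python
# Hypothetical sketch: an assistant reachable over SMS via a Twilio webhook.
import time
from flask import Flask, request
from openai import OpenAI
from twilio.twiml.messaging_response import MessagingResponse

app = Flask(__name__)
client = OpenAI()
ASSISTANT_ID = "asst_..."                 # an assistant created ahead of time
threads_by_number: dict[str, str] = {}    # phone number -> thread id

@app.route("/sms", methods=["POST"])
def sms_reply():
    sender = request.form["From"]         # standard Twilio webhook parameters
    body = request.form["Body"]

    # Reuse (or lazily create) the thread tied to this phone number.
    thread_id = threads_by_number.setdefault(
        sender, client.beta.threads.create().id
    )
    client.beta.threads.messages.create(
        thread_id=thread_id, role="user", content=body
    )
    run = client.beta.threads.runs.create(
        thread_id=thread_id, assistant_id=ASSISTANT_ID
    )
    while run.status in ("queued", "in_progress"):    # simple polling loop
        time.sleep(1)
        run = client.beta.threads.runs.retrieve(
            thread_id=thread_id, run_id=run.id
        )

    # Newest message first; a production bot would reply asynchronously.
    latest = client.beta.threads.messages.list(thread_id=thread_id).data[0]
    resp = MessagingResponse()
    resp.message(latest.content[0].text.value)
    return str(resp)
```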
There’s a tension between a centralized consumer front-end like ChatGPT (where users can do “everything”) and embedding AI experiences within other applications [00:33:32]. For OpenAI to achieve widespread success akin to Google’s, it needs to be present where customers and users already are, rather than relying solely on users initiating interactions on its own platform [00:34:11]. Microsoft’s Copilot strategy, which brings GPT-4 to existing Microsoft customer bases, is an example of this approach [00:34:45].
Impact on the Internet and AI Agents
The widespread deployment of autonomous AI agents on the internet raises significant infrastructure concerns [00:35:30]. There are currently few safeguards for distinguishing humans from AI agents on the web [00:35:58]. The transition to more capable agents needs to happen gradually so that people have time to adapt [00:36:20].
There’s a risk of models being used to bypass human-verification measures on websites [00:36:29]. Site developers may need to offer separate access pathways for AI agents, since forcing agents to jump through the same hoops as humans is inefficient [00:36:40]. This systemic shift will likely take years, and may require a consortium of major internet companies like Apple and Google to build open standards for how AI tools interact with the web [00:37:07].
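No such standard exists yet, but purely as an illustration, a site could expose a separate machine-friendly pathway keyed on a declared agent identity. Everything in the sketch below, from the header name to the key registry, is invented for illustration.

```python
# Purely hypothetical sketch of an agent-specific access pathway. No such
# standard exists today; the "X-AI-Agent-Key" header and registry are invented.
from flask import Flask, jsonify, request

app = Flask(__name__)
KNOWN_AGENT_KEYS = {"agent-key-123"}   # stand-in for a real registry or PKI

@app.route("/search")
def search():
    agent_key = request.headers.get("X-AI-Agent-Key")
    if agent_key is not None:
        if agent_key not in KNOWN_AGENT_KEYS:
            return jsonify(error="unknown agent"), 403
        # Agents get a structured, machine-readable, rate-limited pathway...
        return jsonify(results=["result 1", "result 2"])
    # ...while humans keep the normal HTML experience (CAPTCHAs and all).
    return "<html><body>Regular search page</body></html>"
```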
The challenges and opportunities in AI agent development also extend to ensuring responsible use [00:38:08]. Beyond safety, there are significant engineering and product-experience challenges to overcome, since internet interactions are often unpredictable even for humans [00:38:13]. Despite the recent hype cycle around agents (e.g., AutoGPT, Baby AGI), that hype has usefully forced people to confront these problems, potentially slowing premature pushes that could lead to misuse [00:39:01].
Future Development Areas for AI and Efficiency Optimizations
One area of active development is multimodal AI, especially vision use cases [00:03:21]. While current vision models are good, a “GPT-4 level” leap is needed for many truly impactful applications [00:04:06]. This requires models to have a very detailed understanding of positional relationships between objects in an image [00:03:40]. Examples include:
- Perfectly understanding spatial relationships in design tools like Canva [00:42:58].
- Accurate OCR (Optical Character Recognition) for spreadsheets or receipts, where current models sometimes lose structure and misplace positions [00:43:09] (see the sketch after this list).
- Making generative image outputs (like DALL-E) easily editable, especially text, which is currently hard to fix without specialized skills [00:43:47].

The tldraw application, which converts user sketches into functional apps using vision models, exemplifies the desired application of this technology [00:40:00].
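As a concrete example of the OCR case above, a receipt-extraction call against a vision model might look like the following sketch; the model name, prompt, and image URL are illustrative.

```python
# Sketch of receipt OCR with a vision model via the chat completions API.
# The model name and image URL are placeholders.
from openai import OpenAI

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4-vision-preview",   # illustrative model name
    messages=[{
        "role": "user",
        "content": [
            {"type": "text",
             "text": "Extract every line item from this receipt as JSON "
                     "with fields: description, quantity, price."},
            {"type": "image_url",
             "image_url": {"url": "https://example.com/receipt.jpg"}},
        ],
    }],
    max_tokens=500,
)
print(response.choices[0].message.content)
```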
The main objections preventing wider adoption of LLMs by enterprises and developers are robustness and reliability [00:45:08]. Today, developers often need to build intricate orchestration frameworks or use third-party tools like Guardrails AI to reach production-level confidence [00:45:13]. The hope is that platform providers will solve these problems upstream, although users may remain accustomed to adding extra safeguards [00:45:45].
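A minimal hand-rolled version of such a safeguard, validating that the model returned well-formed JSON with the expected fields and retrying otherwise, might look like this sketch. It is a generic pattern, not the API of Guardrails AI or any other specific library.

```python
# Generic validate-and-retry safeguard around a model call.
import json
from openai import OpenAI

client = OpenAI()

def ask_for_json(prompt: str, required_keys: set[str], retries: int = 3) -> dict:
    """Query the model until it returns a JSON object with the required keys."""
    for _ in range(retries):
        response = client.chat.completions.create(
            model="gpt-4",   # illustrative model choice
            messages=[
                {"role": "system",
                 "content": "Reply with a single JSON object and nothing else."},
                {"role": "user", "content": prompt},
            ],
        )
        text = response.choices[0].message.content
        try:
            data = json.loads(text)
        except json.JSONDecodeError:
            continue                      # malformed output: try again
        if isinstance(data, dict) and required_keys <= data.keys():
            return data                   # all required fields present
    raise RuntimeError(f"No valid response after {retries} attempts")
```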
Another critical challenge for AI technology is latency [00:46:11]. Many use cases cannot tolerate a user waiting seven seconds for a response [00:46:18]. Reducing latency requires continued work on both model development and inference [00:46:25]. AI models, while sometimes likened to “a clone of human thought,” do not yet operate at the speed of thought, which can be jarring for users [00:47:03]. However, thoughtful UX design patterns, such as streaming responses one word at a time, can mitigate this [00:47:56].
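The streaming pattern is straightforward to adopt; a minimal sketch with the Python SDK follows, printing tokens as they arrive rather than blocking on the full response.

```python
# Sketch of the streaming UX pattern with the openai Python SDK.
from openai import OpenAI

client = OpenAI()
stream = client.chat.completions.create(
    model="gpt-4",   # illustrative model choice
    messages=[{"role": "user", "content": "Explain latency in one paragraph."}],
    stream=True,                       # deliver the response incrementally
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:                          # final chunk carries no content
        print(delta, end="", flush=True)
```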
Advice for Developers and AI Curious Individuals
For developers, using AI tools like ChatGPT and GitHub Copilot is becoming “table stakes” [00:55:08]. These tools amplify capabilities, allowing average developers to potentially outperform even the best unassisted developers [00:55:13]. The technology offers the freedom to build almost anything imaginable [00:55:20].
For those who are “AI curious” but feel overwhelmed, a good starting point is to audit your daily tasks and job functions for chores you dislike, or passions you wish you were better at [00:54:24]. Identifying specific problems that AI could solve is key [00:57:07]. Making AI part of your daily workflow is crucial for understanding its long-term impact on your life and career [00:56:01]. Companies like Apple and Google, with their vast user bases and familiar user experiences (e.g., Siri becoming more useful), are expected to play a significant role in demonstrating the possibilities of this technology to a wider consumer audience [00:57:22]. The release of Google Gemini is likewise expected to show many consumers what is possible [00:57:52].
Key Takeaways:
- Overhyped: Prompt engineering [00:48:15]. While useful now, it is fundamentally just communication, and will hopefully give way to more seamless interactions where models handle the translation [00:29:01].
- Underhyped: Observability in AI development [00:48:18]. Understanding what’s happening within the models is crucial for effective use and development [00:48:20].
- Surprising Success: Function calling [00:49:40]. It has proven to enable most of the really interesting production use cases [00:49:51] (see the sketch below).
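A minimal function-calling sketch: the developer describes a function in JSON Schema, and the model returns structured arguments when it decides the function should be called. The weather function and model name here are illustrative.

```python
# Sketch of function calling via the chat completions API.
import json
from openai import OpenAI

client = OpenAI()
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="gpt-4-1106-preview",   # illustrative model name
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,
)

# Assuming the model chose to call the function rather than answer directly:
call = response.choices[0].message.tool_calls[0]
print(call.function.name, json.loads(call.function.arguments))
# -> get_weather {'city': 'Paris'}
```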
To learn more about OpenAI’s API offerings, visit platform.openai.com [00:58:13].