Future trends in AI and multimodal models

From: redpointai

Logan Kilpatrick, OpenAI’s first developer relations hire, discussed future trends in AI, including the evolving landscape of multimodal models and their applications.

Multimodal AI: The Next Frontier

Kilpatrick said that next year will be “the year of multimodal” AI [00:03:19]. Vision use cases are still at an early stage, but he is confident the technology will advance significantly. Today’s vision models are, in his words, in the “GPT-3.5 Vision era”: to unlock a wider range of robust applications, they need a more detailed understanding of the positional relationships between objects in an image [00:03:24], a jump comparable to the one from GPT-3.5 to GPT-4 [00:04:06].

Examples of future multimodal capabilities include:

Enhanced Design Tools: Reformatting designs in applications like Canva, with a model that fully understands spatial relationships and treats elements such as titles as editable, movable objects, is a desired future state [00:42:56].
Improved Document Processing: OCR (optical character recognition) in current models like GPT-4V is capable but can miss structural details in spreadsheets or receipts [00:43:09].
Generative Art and Design: Sketching a piece of art and having a generative image model complete it lets users explore creative possibilities beyond their own artistic skills, which Kilpatrick sees as empowering [00:41:17]. The success of tools like TLDraw, which converts user drawings into functional apps, shows the potential of these models [00:39:48].

Evolution of AI Models and Infrastructure

Kilpatrick foresees several key evolutions in AI model development and infrastructure:

Assistants API and GPTs

Kilpatrick predicts that the Assistants API will be a significant long-term development [00:02:09]. GPTs, the custom versions of ChatGPT, overcome the security and privacy challenges that plugins faced, allowing browsing, code interpreter, and custom actions to be integrated seamlessly [00:26:39]. The upcoming GPT store will also address the discoverability issues previously seen with plugins [00:27:19].

Custom Models

While base models will continue to improve and become more steerable, there will always be a need for custom AI models, particularly in specialized domains like law or medicine where companies possess unique, high-quality data [00:09:32]. Custom models can also be more compute-efficient, because data irrelevant to the domain can be removed from the training set [00:10:02]. The challenge is making custom model training more accessible and affordable; today it can cost millions of dollars and require billions of tokens of data [00:10:24].

Prompt Engineering and Communication

Kilpatrick believes the “death of prompt engineering” is coming: models will get better at understanding and translating user requests, much as DALL-E 3 in ChatGPT revises prompts to achieve the desired outcome [00:29:01]. At its core, prompt engineering is effective communication, and future models will reduce the friction users face in articulating their needs concisely [00:29:38].

The Future of Interfaces and Agents

There is a strong desire for AI assistants to be integrated into existing workflows, such as text messaging and email, rather than requiring users to visit new websites or applications [00:30:24]. This approach minimizes the need to re-educate users on new interfaces, as seen in Microsoft’s Copilot strategy of integrating AI into its existing products [00:34:45].

Widespread adoption of AI agents depends on significant “Internet infrastructure work” to tell humans and AI agents apart [00:35:44]. This is critical to ensure responsible use and prevent misuse, since models could potentially bypass human verification methods [00:36:26]. Companies like Apple and Google may need to form a consortium to develop open standards for AI agent interaction [00:37:20].
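Purely as an illustration of the kind of protocol such a standard might define, the Python sketch below lets a cooperating agent declare itself with a signed request header that a site can verify. The header names (`X-Agent-Id`, `X-Agent-Timestamp`, `X-Agent-Signature`), the shared-secret registry, and the five-minute replay window are all hypothetical assumptions. Note that a scheme like this only verifies agents that choose to identify themselves; on its own it cannot prove that a requester is human.

```python
import hashlib
import hmac
from typing import Dict

# Hypothetical registry: agent operators that have registered a shared secret with this site.
AGENT_SECRETS: Dict[str, bytes] = {"example-agent": b"shared-secret"}
MAX_SKEW_S = 300  # reject timestamps more than five minutes off, to limit replay


def sign_request(agent_id: str, secret: bytes, path: str, ts: int) -> str:
    """HMAC-SHA256 over the agent id, request path, and timestamp."""
    message = f"{agent_id}|{path}|{ts}".encode()
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def classify_request(headers: Dict[str, str], path: str, now: int) -> str:
    """Return 'verified-agent', 'unverified-agent', or 'undeclared'."""
    agent_id = headers.get("X-Agent-Id")
    if agent_id is None:
        # No agent claim: could be a human or an agent that did not identify itself.
        return "undeclared"
    secret = AGENT_SECRETS.get(agent_id)
    try:
        ts = int(headers.get("X-Agent-Timestamp", ""))
    except ValueError:
        return "unverified-agent"
    if secret is None or abs(now - ts) > MAX_SKEW_S:
        return "unverified-agent"
    expected = sign_request(agent_id, secret, path, ts)
    # Constant-time comparison avoids leaking how many characters matched.
    if hmac.compare_digest(expected, headers.get("X-Agent-Signature", "")):
        return "verified-agent"
    return "unverified-agent"
```

Binding the signature to the path and timestamp means a captured header cannot simply be replayed against a different page or much later.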

Challenges and Opportunities for AI Development

Reliability and Latency

Two major limitations of current AI models for enterprise adoption are robustness/reliability and latency [00:45:08]. Enterprises often need to use third-party tools (e.g., guardrails) to ensure confidence in production environments [00:45:15]. Latency, where users wait several seconds for a response, needs significant improvement to match the speed of human thought and creative flow [00:46:11].
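As a rough illustration of what a guardrail layer does, the Python sketch below validates a model’s raw output against a few hypothetical rules (valid JSON, required `answer` and `confidence` keys, confidence between 0 and 1) and retries until a response passes. The `model` callable and the output contract are assumptions for illustration, not the API of any particular guardrails product.

```python
import json
from typing import Callable, Optional

# Hypothetical output contract: the model is asked to reply with JSON containing these keys.
REQUIRED_KEYS = {"answer", "confidence"}


def validate_output(raw: str) -> Optional[dict]:
    """Return the parsed payload if it passes every check, else None."""
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(payload, dict) or not REQUIRED_KEYS.issubset(payload):
        return None
    confidence = payload["confidence"]
    # bool is a subclass of int, so exclude it explicitly.
    if isinstance(confidence, bool) or not isinstance(confidence, (int, float)):
        return None
    if not 0.0 <= confidence <= 1.0:
        return None
    return payload


def guarded_call(model: Callable[[str], str], prompt: str, max_attempts: int = 3) -> dict:
    """Call the model, retrying until its output passes validation."""
    for _ in range(max_attempts):
        result = validate_output(model(prompt))
        if result is not None:
            return result
    raise RuntimeError(f"no valid output after {max_attempts} attempts")
```

Each retry adds another full model call, so a guardrail like this trades latency for reliability, which is exactly the tension the two limitations above describe.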

Observability and Evaluation

Kilpatrick highlights observability as an underhyped aspect of AI [00:48:18]. Developers need better tools to monitor and understand how AI models are being used, similar to the detailed dashboards offered by platforms like Stripe [00:18:00].
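A minimal sketch of the kind of usage data such a dashboard would aggregate, assuming models are plain Python callables: the hypothetical `UsageLog` wrapper records latency, a rough token count, and success or failure for every call, then summarizes them per model. Counting whitespace-separated words is a crude stand-in for real tokenizer counts.

```python
import time
from collections import defaultdict
from dataclasses import dataclass, field
from statistics import mean
from typing import Callable, Dict, List


@dataclass
class CallRecord:
    model: str
    latency_s: float
    prompt_tokens: int  # crude proxy: whitespace-separated words
    output_tokens: int
    ok: bool


@dataclass
class UsageLog:
    records: List[CallRecord] = field(default_factory=list)

    def track(self, model_name: str, fn: Callable[[str], str], prompt: str) -> str:
        """Call fn(prompt) and record latency, rough token counts, and success."""
        start = time.perf_counter()
        output, ok = "", False
        try:
            output = fn(prompt)
            ok = True
            return output
        finally:
            # Runs on success and on exceptions, so failed calls are logged too.
            self.records.append(
                CallRecord(
                    model=model_name,
                    latency_s=time.perf_counter() - start,
                    prompt_tokens=len(prompt.split()),
                    output_tokens=len(output.split()),
                    ok=ok,
                )
            )

    def summary(self) -> Dict[str, Dict[str, float]]:
        """Aggregate call count, error rate, mean latency, and total tokens per model."""
        by_model: Dict[str, List[CallRecord]] = defaultdict(list)
        for record in self.records:
            by_model[record.model].append(record)
        return {
            name: {
                "calls": len(rs),
                "error_rate": sum(not r.ok for r in rs) / len(rs),
                "mean_latency_s": mean(r.latency_s for r in rs),
                "total_tokens": sum(r.prompt_tokens + r.output_tokens for r in rs),
            }
            for name, rs in by_model.items()
        }
```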

A critical unmet need in the AI space is a robust solution for model evaluation (“evals”) [00:21:52]. With rapid model iteration cycles, users struggle to determine if a new model truly improves their specific use cases without extensive human time for evaluation [00:22:43]. While difficult to automate, understanding failure points through eval processes is crucial for learning and improvement [00:23:18].
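The core loop of an eval can be sketched in a few lines of Python. This is a minimal, hypothetical harness: the `EvalCase` type and the normalized exact-match grader are assumptions, and real evals usually need task-specific or human grading. Comparing a baseline and a candidate model case by case, rather than only by aggregate accuracy, surfaces the regressions and failure points worth inspecting.

```python
from dataclasses import dataclass
from typing import Callable, List

Model = Callable[[str], str]


@dataclass
class EvalCase:
    prompt: str
    expected: str


def passes(output: str, case: EvalCase) -> bool:
    # Hypothetical grader: case-insensitive exact match after trimming whitespace.
    return output.strip().lower() == case.expected.strip().lower()


def compare(baseline: Model, candidate: Model, cases: List[EvalCase]) -> dict:
    """Score both models on the same cases and list where they disagree."""
    base_ok = [passes(baseline(c.prompt), c) for c in cases]
    cand_ok = [passes(candidate(c.prompt), c) for c in cases]
    return {
        "baseline_acc": sum(base_ok) / len(cases),
        "candidate_acc": sum(cand_ok) / len(cases),
        # Cases the old model got right but the new one gets wrong.
        "regressions": [c.prompt for c, b, n in zip(cases, base_ok, cand_ok) if b and not n],
        # Cases the new model fixes.
        "fixes": [c.prompt for c, b, n in zip(cases, base_ok, cand_ok) if n and not b],
    }
```

Two models can score the same overall accuracy while failing on different cases, which is why the per-case `regressions` and `fixes` lists say more about whether an upgrade helps a specific use case than the headline number does.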

Competition and Open Source

The emergence of models like Google Gemini is viewed positively because it pushes innovation across the industry [00:23:56]. OpenAI aims for its models to consistently outperform open-source models, given the resources and engineering that go into them, but open-source options offer greater control over weights and deeper customization through fine-tuning [00:15:01]. Even so, using an API removes much of the operational burden, such as allocating GPUs and handling setup, which provides significant value for developers [00:17:15].

Advice for AI-Curious Individuals

For those overwhelmed by the rapid pace of AI, Kilpatrick advises:

Identify Pain Points: Audit daily tasks or job responsibilities that are disliked or where personal improvement is desired [00:54:24].
Integrate AI into Workflow: Make it a habit to consult AI tools first for tasks [00:56:05].
Developers Must Use AI: Developers not using tools like ChatGPT or GitHub Copilot are at a disadvantage, as AI can significantly amplify their capabilities and creativity [00:54:52].
Seek Real-World Examples: Find and share stories of how top professionals in various fields are using AI tools to inspire others [00:56:18].

Ultimately, the future of AI involves greater integration into everyday applications and workflows, driven by improvements in model capabilities and the development of accessible, reliable platforms.
