From: aidotengineer

The development of AI agents in 2025 highlights a critical shift towards omnimodal AI systems, moving beyond traditional text-based interfaces to integrate diverse forms of data and interaction [00:15:24].

Importance of Multimodality

Building AI agents with multimodality is crucial for several reasons:

  • Enhanced User Experience: Multimodality allows for a 10x personalized user experience, reimagining how users interact with AI [00:15:26]. The focus shifts away from the limitations of the chatbot as an interface and toward more human-like interaction [00:15:34].
  • Human-like Interaction: Making AI more human means giving it “eyes and ears, nose, a voice” [00:15:45]. This includes:
    • Voice: Voice technology has improved significantly over the last year, becoming “scary good” [00:15:50].
    • Smell: Companies like Osmo are digitizing the sense of smell [00:15:54].
    • Touch: Integrating touch instills a sense of embodiment, particularly in robotics [00:16:01].
    • Memories: Incorporating memories lets AI know users on a deeper, truly personal level [00:16:07].
  • Visionary Product Experience: By incorporating these new modalities, the “visionary nature of the product exceeds all expectations,” even if the agent is inconsistent or unreliable, creating something new and magical [00:16:18].

Examples of Multimodal Implementation

  • Visual Canvas: Companies like tldraw are reimagining the visual canvas, implementing AI through “brush strokes” [00:16:28]. Their “tldraw computer” lets users combine various AI models in tandem, making the underlying large language model invisible to the user [00:16:37].
  • Diverse Data: Multimodal systems handle not just web and text data, but also design data, image data, video data, audio data, sensor data, warehouse data, and even the data generated by the agent itself in real time [00:10:16].

Ultimately, an innovative product experience built around the user and multimodality is what truly differentiates an AI system [00:17:17].