From: aidotengineer
The year 2025 marks a “perfect storm for AI agents” [00:03:41], driven by reasoning models outperforming human ability, increased test-time compute, advanced engineering, cheaper inference and hardware, and massive infrastructure investments globally [00:04:48]. An AI agent is defined as a fully autonomous system where Large Language Models (LLMs) direct their own actions [00:05:14].
Current State and Challenges
Despite rapid advances in AI, agents are not yet as capable as anticipated [00:05:00]. This is often due to “tiny cumulative errors” rather than outright hallucinations [00:06:54]. These errors can include:
- Decision Errors: Choosing the wrong fact, such as booking a flight to “San Francisco Peru” instead of “San Francisco California” [00:07:10].
- Implementation Errors: Incorrect access or integration, leading to issues like being locked out of a database [00:07:26].
- Heuristic Errors: Applying the wrong criteria, such as failing to account for rush hour traffic when booking a flight [00:07:44].
- Taste Errors: Misjudging personal preferences, like booking a flight on a specific aircraft type the user dislikes [00:08:03].
The “Perfection Paradox” arises when users grow frustrated with AI agents that work at human speed or behave inconsistently, despite their otherwise magical capabilities [00:08:22]. Even a highly accurate agent degrades over long tasks, because small per-step error rates compound across many steps, making complex workflows unreliable [00:09:00].
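To make that compounding concrete, here is a small back-of-the-envelope calculation; the 99% per-step accuracy and the step counts are illustrative assumptions, not figures from the talk:

```python
# Illustrative: how small per-step error rates compound over a multi-step agent task.
per_step_accuracy = 0.99  # assume the agent gets any single step right 99% of the time

for steps in (10, 50, 100):
    task_success = per_step_accuracy ** steps  # the whole task needs every step to succeed
    print(f"{steps:>3} steps -> {task_success:.0%} chance of completing the task correctly")

# Prints roughly: 10 steps -> 90%, 50 steps -> 61%, 100 steps -> 37%
```

An agent that is right 99% of the time on any single step finishes a 100-step task correctly only about a third of the time, which is why long workflows feel unreliable even when each step looks strong.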
Strategies for Improving AI Interaction
To optimize AI agents and improve user interaction, several best practices are emerging:
Data Curation
Ensuring AI agents have access to clean, structured data is crucial [00:10:09]. This includes proprietary data, data generated by the agent itself, and data used for quality control in the model workflow [00:10:32]. Building an “agent data flywheel” ensures that every user interaction improves the product in real time [00:10:49].
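As a rough sketch of what such a flywheel loop could look like in code, the class and method names below (`Interaction`, `DataFlywheel`, `export_for_retrieval`) are hypothetical and not drawn from any real framework:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Interaction:
    """One user-agent exchange plus whatever signal the user left behind."""
    prompt: str
    response: str
    user_feedback: Optional[str] = None  # e.g. "thumbs_up", "thumbs_down", "corrected"

@dataclass
class DataFlywheel:
    """Hypothetical sketch: log every interaction, quality-control it, and feed the
    curated examples back into the data the agent grounds its future answers in."""
    raw_log: list = field(default_factory=list)
    curated: list = field(default_factory=list)

    def record(self, interaction: Interaction) -> None:
        self.raw_log.append(interaction)

    def curate(self) -> None:
        # Quality-control step: keep only interactions with positive or corrected signal.
        self.curated.extend(
            i for i in self.raw_log if i.user_feedback in ("thumbs_up", "corrected")
        )
        self.raw_log.clear()

    def export_for_retrieval(self) -> list:
        # Clean, structured examples the agent can retrieve when answering later queries.
        return [f"Q: {i.prompt}\nA: {i.response}" for i in self.curated]
```

The design point is simply that curation sits between the raw interaction log and whatever data the agent grounds on, so only quality-controlled examples flow back into the product.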
Evaluations (Evals)
Measuring a model’s response and determining whether it is correct is critical [00:11:22]. This is straightforward in verifiable domains (like math), but challenging in non-verifiable ones, where human preferences and other subjective signals must be collected and understood [00:11:47].
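To illustrate the gap between the two settings, here is a minimal sketch contrasting an exact-match check for a verifiable domain with collecting a human preference pair for a non-verifiable one; the function names and the preference-pair format are assumptions for illustration:

```python
def eval_verifiable(model_answer: str, ground_truth: str) -> bool:
    """Verifiable domain (e.g. math): normalize and compare against a known answer."""
    return model_answer.strip().lower() == ground_truth.strip().lower()

def eval_non_verifiable(response_a: str, response_b: str, human_choice: str) -> dict:
    """Non-verifiable domain: no single correct answer exists, so record which
    response a human preferred and treat that preference as the signal."""
    assert human_choice in ("a", "b")
    return {
        "chosen": response_a if human_choice == "a" else response_b,
        "rejected": response_b if human_choice == "a" else response_a,
    }

print(eval_verifiable("42", " 42 "))                             # True
print(eval_non_verifiable("Draft one...", "Draft two...", "b"))  # preference pair
```

In the non-verifiable case the output is not a pass/fail score but a preference record that can later feed reward modeling or fine-tuning.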
Scaffolding Systems
Implementing infrastructure logic to prevent cascading errors when an applied AI feature fails is essential [00:12:45]. This can involve building complex compound systems or bringing a human back into the loop for reasoning [00:13:08]. The goal is to develop stronger agents that can self-heal, correct their own path, or break execution when unsure [00:13:20].
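One minimal way to express that kind of scaffolding is a wrapper that retries a step when the agent reports low confidence and, if that persists, breaks execution and hands off to a human rather than letting a shaky result cascade. The `run_agent_step` stub and the 0.7 confidence threshold below are illustrative assumptions, not details from the talk:

```python
import random

def run_agent_step(task: str) -> tuple:
    """Stand-in for a real agent call; returns (result, self-reported confidence)."""
    return f"result for {task!r}", random.uniform(0.4, 1.0)

def scaffolded_step(task: str, max_retries: int = 2, min_confidence: float = 0.7) -> str:
    """Retry on low confidence; break execution and escalate if it never clears the bar."""
    for _attempt in range(max_retries + 1):
        result, confidence = run_agent_step(task)
        if confidence >= min_confidence:
            return result  # confident enough to let the workflow continue
    # Instead of cascading an unreliable result into later steps, stop and escalate.
    raise RuntimeError(f"Agent unsure about {task!r} after {max_retries + 1} attempts; "
                       "escalating to a human reviewer.")

try:
    print(scaffolded_step("book flight to San Francisco, California"))
except RuntimeError as handoff:
    print(f"Human in the loop: {handoff}")
```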
User Experience (UX)
UX is paramount in making AI agents better co-pilots [00:13:47]. As foundation models become commodities, what differentiates companies [00:14:47] is the ability to reimagine product experiences, deeply understand user workflows, and promote “beautiful, elegant human-machine collaboration” [00:14:07]. This includes features like asking clarifying questions, predicting next steps, and integrating seamlessly with legacy systems [00:14:13]. The focus should be on leveraging proprietary data and deep user-workflow knowledge in fields like robotics, hardware, defense, manufacturing, and life sciences to create magical end-user experiences [00:14:55].
Building Multimodally
The future of AI interfaces and user interaction lies in “multimodal” experiences that go beyond text-based chatbots [00:15:26]. Incorporating new modalities can create a 10x more personalized user experience [00:15:28]. This means making AI more human by adding “eyes and ears, a nose, a voice” [00:15:45]:
- Voice: Significant improvements in voice technology are making it “pretty scary good” [00:15:50].
- Smell: Companies are digitizing the sense of smell [00:15:54].
- Touch: Instilling a more human feeling and sense of embodiment through robotics [00:16:01].
- Memories: Enabling AI to become truly personal and know the user on a much deeper level [00:16:07].
When visionary products exceed expectations through multimodal interaction, the perceived inconsistency of AI agents becomes less of a hindrance [00:16:18]. The goal is to create seamless experiences in which users might not even realize they are interacting with a large language model in the background [00:16:40].
Ultimately, the future of AI in improving user experience and integrations depends on thinking bigger, leveraging multimodality, and designing innovative product experiences that truly set the workflow and vision apart [00:17:17].