From: aidotengineer
The future of user interaction is shifting towards AI assistance, driven by advancements like the Model Context Protocol (MCP) [00:24:29]. This transition aims to bring users into AI assistance environments, fundamentally changing how product developers and others reach their audience [00:41:21].
The Jarvis Ideal: A Vision for AI User Experience
Tony Stark’s AI assistant, Jarvis, from the Iron Man movies, serves as a powerful example of an ideal AI-driven user experience [00:25:28].
Jarvis’s capabilities include:
- Compiling databases from disparate sources like SHIELD, FBI, and CIA intercepts [01:52:00], though this remains off-limits today for reasons other than technical limitations [04:02:00].
- Generating user interfaces (UIs) on demand [03:13:00].
- Accessing public records [03:14:00].
- Retrieving thermogenic signatures and performing complex data joins across different datasets [03:17:00].
- Showing related news articles [03:28:00].
- Creating flight plans [03:31:00].
- Interfacing with home systems, such as answering the doorbell [03:31:00].
- Utilizing various interaction methods beyond voice, including typing, gestures, and dynamic UI generation [04:53:00].
While many of these capabilities are technologically possible today, such as generating UIs [04:15:00], the challenge lies in creating a unified experience where an AI assistant can interface with virtually everything [05:36:00].
The Integration Challenge
The primary obstacle to achieving a ubiquitous AI assistant like Jarvis is the complexity of integrations [05:15:00]. It is extremely difficult to build integrations for every possible service and application [05:17:00]. For example, major AI developers like Google, OpenAI, or Anthropic are unlikely to build specific integrations for niche services, such as a local city government’s park pavilion reservation website [05:43:00]. Users desire one central AI assistant that can augment itself with any capability in the world [10:28:00].
Evolution of AI Assistance and Integrations
Phase 1: LLMs as Question Answering Systems
Around three years ago, the release of ChatGPT marked a pivotal moment [06:57:00]. What made ChatGPT revolutionary was not just the underlying Large Language Model (LLM), which had existed for a long time, but the host application layer that provided an excellent user experience for interacting with an LLM [07:33:00]. Initially, users had to manually provide context by copying and pasting text or images into the LLM and then manually extracting results [07:57:00].
Phase 2: Host Application-Enabled Actions
In this phase, the host application gained the ability to tell the LLM what services were available and to retrieve more context if needed [08:35:00]. This enabled the LLM to perform actions beyond answering questions, such as using search engines, scheduling meetings via calendar integrations, or summarizing messages with Slack integrations [08:48:00]. However, this approach is limited by the time developers at companies like OpenAI or Anthropic can dedicate to building specific integrations [09:24:00]. Proprietary plugin systems, like OpenAI’s GPT plugin system, further fragment the integration landscape, requiring developers to build specific solutions for each platform [09:51:00].
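In practice, this phase-2 pattern is what today's function/tool-calling APIs expose: the host application declares which actions exist, and the LLM decides when to invoke them. Below is a minimal sketch using OpenAI's chat-completions tool-calling API as one concrete example; the `schedule_meeting` tool, its parameters, and the model name are illustrative assumptions, not details from the talk.

```ts
import OpenAI from "openai";

const openai = new OpenAI();

// The host application, not the model, decides which tools exist.
// "schedule_meeting" is a hypothetical tool the host knows how to execute.
const response = await openai.chat.completions.create({
  model: "gpt-4o",
  messages: [
    { role: "user", content: "Book a 30 minute sync with Sarah tomorrow at 10am" },
  ],
  tools: [
    {
      type: "function",
      function: {
        name: "schedule_meeting",
        description: "Create a calendar event for the current user",
        parameters: {
          type: "object",
          properties: {
            title: { type: "string" },
            startTime: { type: "string", description: "ISO 8601 start time" },
            durationMinutes: { type: "number" },
          },
          required: ["title", "startTime", "durationMinutes"],
        },
      },
    },
  ],
});

// If the model chose the tool, the host executes it and feeds the result back.
const toolCall = response.choices[0].message.tool_calls?.[0];
if (toolCall?.type === "function") {
  const args = JSON.parse(toolCall.function.arguments);
  // ...call the calendar service here, then return the result to the model.
}
```

The limitation described above follows directly from this shape: every entry in the `tools` array has to be built and maintained by the host application's developers.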
Phase 3: Model Context Protocol (MCP)
MCP represents a paradigm shift by offering a standard protocol that all AI assistants can support [10:58:00]. This standardization allows developers to build to the MCP specification, ensuring their services are usable by any compliant AI assistant [11:12:00].
Key architectural aspects of MCP:
- The host application communicates with the LLM and dynamically manages the available services [11:42:00].
- The LLM selects the most appropriate tool based on the user’s query and available services [12:03:00].
- The host application creates a standard client for each service integration [12:09:00].
- Service providers create MCP servers that expose their unique tools, resources, and prompts [12:22:00] (a minimal server sketch appears after this list).
- The communication between the server and client is standardized, allowing service providers to control the unique aspects of their service while maintaining interoperability [12:49:00].
- MCP servers can incorporate robust authentication, such as OAuth 2.1, making them as secure as other applications using the standard [15:19:00].
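To make the server side of this architecture concrete, here is a minimal sketch of an MCP server written with the official TypeScript SDK (`@modelcontextprotocol/sdk`), loosely modeled on the demo's "get weather" server. The tool name, parameters, and placeholder behavior are assumptions for illustration, not the talk's actual implementation.

```ts
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { z } from "zod";

// The server declares its own tools; any MCP-compliant host can discover and call them.
const server = new McpServer({ name: "get-weather", version: "1.0.0" });

server.tool(
  "get_weather",
  "Get current weather conditions for a set of coordinates",
  { latitude: z.number(), longitude: z.number() },
  async ({ latitude, longitude }) => {
    // A real server would call a weather API here; this returns a placeholder.
    return {
      content: [
        {
          type: "text" as const,
          text: `Weather at ${latitude}, ${longitude}: 18°C, partly cloudy`,
        },
      ],
    };
  },
);

// stdio is one supported transport; remote servers typically run over HTTP.
await server.connect(new StdioServerTransport());
```

Because the protocol, not the host vendor, defines how tools are described and invoked, this same server works with any AI assistant that speaks MCP.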
This standardization gives AI assistants the "hands" they need to perform real-world actions [12:55:00]. While client support for dynamically displaying generated UI is still maturing, there is no technical barrier to achieving it [17:11:00].
A demonstration using MCP servers showed how an AI assistant could do all of the following (a client-side sketch of this flow appears after the list):
- Determine current location using a “locationator” server [13:55:00].
- Check weather conditions via a “get weather” server [14:32:00].
- Authenticate with a journaling server (EpicMe) via an OAuth flow [14:43:00].
- Create and retrieve journal entries, with the LLM formatting the output in a user-friendly way (e.g., Markdown) [15:35:00].
- Perform actions like deleting posts and logging out, all while handling authentication seamlessly [17:53:00].
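On the host side, the client half of that flow is similarly small. The sketch below, written against the TypeScript MCP SDK, assumes the "locationator" server is a local process reachable over stdio that exposes a `get_current_location` tool; the command, tool name, and arguments are assumptions for illustration.

```ts
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js";

// The host application creates one standard client per MCP server it connects to.
const client = new Client({ name: "host-app", version: "1.0.0" });

// Assumption: the "locationator" server is a local script spoken to over stdio.
await client.connect(
  new StdioClientTransport({ command: "node", args: ["locationator-server.js"] }),
);

// The host advertises these tools to the LLM, which picks one based on the user's query.
const { tools } = await client.listTools();
console.log(tools.map((tool) => tool.name));

// Once the LLM selects a tool, the host (after user approval) invokes it.
const result = await client.callTool({ name: "get_current_location", arguments: {} });
console.log(result.content);
```

The user-approval step in this loop is exactly the human oversight described next.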
A key distinction from Jarvis is that, for now, a human must stay in the loop to approve tool calls, because the necessary trust and capability have not yet been established [14:05:00].
The Future of User Interaction
The future of user interaction will move away from traditional browser-based experiences and manual search queries [18:17:00]. Users will be able to speak their questions naturally, and the AI will not only understand what they are trying to search for but also what they are trying to do, and then execute that action for them [18:42:00]. This shift signifies a move toward more intuitive and action-oriented AI applications.
Resources
- Model Context Protocol Specification: A comprehensive document for understanding MCP [19:04:00].
- EpicAI.pro: A platform by Kent C. Dodds offering courses, posts, workshops, and cohorts on AI in general and user interaction [19:09:00].