From: aidotengineer

Kent C. Dodds, a developer focused on building excellent user experiences, notes a significant shift from web-based interactions to AI-driven ones as users migrate to AI assistants [00:00:05]. This transition is fundamentally changing how users interact with technology, driven by innovations like Model Context Protocol (MCP) [00:00:27].

The Vision: Jarvis and Advanced AI Assistants

The ideal AI interaction is exemplified by Tony Stark’s AI assistant, Jarvis, from the Iron Man movies [00:01:25]. Jarvis performs a wide range of complex tasks that significantly augment Tony Stark’s capabilities, making research and analysis much faster [00:03:03].

Jarvis’s Capabilities

Jarvis demonstrates advanced interaction and utility, including:

  • Compiling databases from disparate sources (SHIELD, FBI, CIA intercepts) [00:01:52].
  • Generating UIs on demand for data visualization (virtual crime scene reconstruction, thermogenic signatures) [00:02:02], [00:02:26].
  • Accessing public records [00:02:21].
  • Joining across different datasets for complex queries (e.g., filtering thermogenic occurrences by Mandarin attack locations) [00:02:38], [00:02:47].
  • Showing related news articles [00:03:28].
  • Creating flight plans [00:02:50].
  • Interacting via voice, typing, and gestures [00:04:52].
  • Generating dynamic UIs and interacting with them [00:04:58].

Current Technological Parallels and Gaps

While some of Jarvis’s capabilities, like generating UI on demand, are achievable with current technology [00:04:15], obstacles remain. Compiling databases from restricted sources such as government intelligence agencies is difficult less for technical reasons than for legal and access reasons [00:04:03]. Holographic displays are still in development [00:04:11]. The core question is why a personal “Jarvis” isn’t ubiquitous yet [00:04:21].

The Challenge: Building Integrations

The primary barrier to widespread, omni-capable AI assistants like Jarvis is the immense difficulty of building integrations for every conceivable service and application [00:05:15]. Companies like Google, OpenAI, or Anthropic will not build integrations for every local or niche service (e.g., a city government website for reserving park pavilions) [00:05:43]. Without comprehensive integration, users are less incentivized to invest in setting up an AI assistant that only performs some tasks [00:05:56]. Users desire a single AI assistant that can interface with everything, even services they use infrequently [00:06:20].

The Solution: Model Context Protocol (MCP)

The history and evolution of AI interfaces, particularly with large language models (LLMs), can be understood in three phases, culminating in Model Context Protocol (MCP). MCP provides a standard mechanism for AI assistants to communicate with various tools and services [00:06:33], aiming to overcome the integration challenge.

Phase 1: ChatGPT and Manual Context

Around three years ago, the release of ChatGPT was pivotal not because of the underlying LLM (the technology had existed for some time), but because of the host application layer that provided a good user experience for interacting with one [00:06:58], [00:07:33]. This led to significant investment and rapid improvement in LLMs [00:07:48].

However, a key limitation was the need for manual context provision [00:07:57]. Users had to copy-paste code or text into the AI and then manually extract and apply results [00:08:00]. While LLMs could answer questions, they couldn’t “do” anything or manage context themselves, making it a “pain” [00:08:27], [00:08:31].

Phase 2: Host Application Integrations

In the second phase, host applications began to integrate LLMs with external tools like search engines, calendars, or Slack [00:08:35]. The host application could tell the LLM what services were available and retrieve necessary context or execute actions [00:08:41]. This allowed the AI to “do stuff” [00:09:04].

The problem persisted: these integrations were limited to whatever companies like OpenAI or Anthropic had time to build [00:09:24]. Proprietary plugin systems (e.g., OpenAI’s GPT plugin system) further fragmented efforts, since developers had to build a separate integration for each platform (OpenAI, Anthropic, Google) [00:09:51]. Users don’t want a multitude of “LLM wrappers”; they want a single, versatile AI assistant [00:10:28].

Phase 3: Model Context Protocol (MCP)

MCP represents the third phase, providing a standard protocol that AI assistants can support [00:10:58], [00:11:05]. This means developers can build to the MCP specification and their services become usable by any compliant AI assistant [00:11:12].
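
To make “a standard protocol” concrete, the sketch below shows roughly what the standardized exchange looks like on the wire. MCP is built on JSON-RPC 2.0, and the method names (tools/list, tools/call) come from the MCP specification; the get_weather tool and its arguments are hypothetical examples, not part of the spec.

```typescript
// Rough sketch of the JSON-RPC 2.0 messages that flow between an MCP client
// and server. Method names come from the MCP specification; the "get_weather"
// tool and its arguments are hypothetical.

// The client asks the server which tools it offers.
const listToolsRequest = {
  jsonrpc: "2.0",
  id: 1,
  method: "tools/list",
};

// The server describes its tools, including a JSON Schema for the inputs,
// so any compliant host can present them to its LLM.
const listToolsResponse = {
  jsonrpc: "2.0",
  id: 1,
  result: {
    tools: [
      {
        name: "get_weather",
        description: "Get current weather for a location",
        inputSchema: {
          type: "object",
          properties: {
            latitude: { type: "number" },
            longitude: { type: "number" },
          },
          required: ["latitude", "longitude"],
        },
      },
    ],
  },
};

// When the LLM decides to use the tool, the client sends a tools/call request.
const callToolRequest = {
  jsonrpc: "2.0",
  id: 2,
  method: "tools/call",
  params: {
    name: "get_weather",
    arguments: { latitude: 40.7, longitude: -74.0 },
  },
};
```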

MCP Architecture

The architecture involves:

  • Host Application: Communicates with the LLM and dynamically informs it about available services [00:11:42].
  • LLM: Processes user queries and selects the most appropriate tool based on available services [00:12:01].
  • MCP Client: Created by the host application for each service, adhering to a standard interface [00:12:09].
  • MCP Servers: Created by service providers, interfacing with their unique tools, resources, and prompts [00:12:25].

The standardization of communication between the client and server is what makes MCP powerful, allowing service providers to maintain control over their unique features while being universally accessible [00:12:49]. This standardization gives AI “hands” to perform actions across different systems [00:12:55].
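
As a rough illustration of what a service provider ships, here is a minimal MCP server sketch using the official TypeScript SDK (@modelcontextprotocol/sdk). The get_weather tool, its parameters, and the hard-coded response are hypothetical stand-ins; a real server would call the provider’s own APIs, and the exact SDK surface may differ between versions.

```typescript
// Minimal MCP server sketch (assumes @modelcontextprotocol/sdk and zod are
// installed, and that this file runs as an ES module).
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { z } from "zod";

// The server declares who it is; the host's MCP client discovers its tools.
const server = new McpServer({ name: "get-weather", version: "1.0.0" });

// Expose one tool. The schema tells the LLM what arguments it must supply.
server.tool(
  "get_weather",
  "Get current weather conditions for a latitude/longitude",
  { latitude: z.number(), longitude: z.number() },
  async ({ latitude, longitude }) => {
    // A real implementation would call a weather API here; this is a stub.
    return {
      content: [
        {
          type: "text",
          text: `Weather at ${latitude},${longitude}: 18°C, partly cloudy`,
        },
      ],
    };
  },
);

// Communicate with the host application over stdio, one of MCP's standard transports.
await server.connect(new StdioServerTransport());
```

Because the transport and message format are standardized, any MCP-compliant host can launch a server like this and expose its tools to its LLM without bespoke integration work.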

MCP in Action: A Journaling Demo

A demonstration showcases MCP’s potential, despite current limitations in client readiness [00:13:01].

Workflow Example

The demo involves asking an AI assistant in Claude Desktop (configured with three MCP servers) to write a journal entry [00:13:22]. The AI chains several MCP servers to fulfill the request (a host-side sketch follows the list below):

  1. Locationator: Determines current device location, requiring user approval for tool calls due to current trust limitations [00:13:55].
  2. Get Weather: Retrieves weather conditions for the derived location [00:14:32].
  3. EpicMe (Journaling Server):
    • Authenticates the user via OAuth 2.1, providing enterprise-level security [00:14:43], [00:15:19].
    • Allows the LLM to write a creative journal entry based on location and weather [00:15:37].
    • Manages journal entry tags, creating new ones (e.g., “travel”) as needed [00:16:12].
    • Enables retrieval of the journal entry, where the LLM can format the JSON response into a readable Markdown display [00:16:40], [00:16:51].
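
For a sense of what the host application does behind the scenes in a demo like this, here is a minimal host-side sketch using the TypeScript SDK’s client. The server command (node get-weather.js) and tool name are hypothetical; Claude Desktop performs the equivalent steps based on its configuration, and in practice it is the LLM, not hard-coded logic, that decides which tool to call and with what arguments.

```typescript
// Minimal host-side sketch: connect to one MCP server, discover its tools,
// and invoke one of them (assumes @modelcontextprotocol/sdk is installed).
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js";

const client = new Client(
  { name: "demo-host", version: "1.0.0" },
  { capabilities: {} },
);

// Launch the (hypothetical) weather server as a child process over stdio.
await client.connect(
  new StdioClientTransport({ command: "node", args: ["get-weather.js"] }),
);

// 1. Discover the tools this server exposes; the host forwards these to the LLM.
const { tools } = await client.listTools();
console.log(tools.map((t) => t.name));

// 2. When the LLM chooses a tool, the host calls it with the LLM-supplied arguments.
const result = await client.callTool({
  name: "get_weather",
  arguments: { latitude: 40.7, longitude: -74.0 },
});
console.log(result.content);
```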

Implications for User Experience

  • Dynamic UI: While current clients don’t fully support dynamic UI like cards, MCP enables the possibility for rich, context-aware displays [00:17:11].
  • Multilingual Accessibility: LLMs can translate responses between languages (e.g., an English server response for a Japanese user query), enhancing accessibility [00:17:34].
  • Natural Interaction: MCP facilitates a shift away from traditional browser-based interaction and keyword-specific searching [00:18:15]. Instead, users can simply speak their questions, and the AI can understand intent and execute actions [00:18:42]. This represents a return to more natural human-computer interaction [00:18:42].

The vision is for users to eventually stop needing to “Google” or phrase questions precisely, instead communicating naturally with an AI that can understand and perform tasks for them [00:18:20], [00:18:50].

Conclusion

MCP holds the promise of a future where a universal “Jarvis” becomes a reality for everyone, enabling AI assistants to integrate seamlessly with any service [00:11:29]. This points to an exciting future for AI-driven user experiences and integrations, one that prioritizes natural, intuitive interaction over complex manual processes [00:19:25].

Resources

  • Model Context Protocol Specification [00:19:04]
  • EpicAI.pro - for learning about MCP and AI in general [00:19:12].