From: aidotengineer

Kent C. Dodds teaches how to build excellent user experiences, initially for the web, and now with AI through his course platform, epicai.pro [00:00:00]. The focus is on how user interaction is changing, enabled by technologies like Model Context Protocol (MCP), to reach users where they want to be: inside AI assistants [00:24:00].

The Vision: Jarvis as the Ideal AI Assistant

The ideal user experience is exemplified by Tony Stark’s AI assistant, Jarvis [01:25:00]. Jarvis can perform a wide range of actions that would take a human far longer to do without AI assistance, if they would be possible at all [03:03:00].

Jarvis’s Capabilities:

  • Compiles databases from various sources (e.g., SHIELD, FBI, CIA intercepts) [01:52:00].
  • Generates UIs on demand [03:11:00].
  • Accesses public records [03:11:00].
  • Brings up thermogenic signatures [03:11:00].
  • Joins across different datasets to filter information [03:22:00].
  • Shows related news articles [03:22:00].
  • Creates flight plans [03:22:00].
  • Interacts with the physical environment (e.g., answering the doorbell and displaying visitor info) [03:22:00].
  • Offers multi-modal interaction: talking, typing, gestures, and dynamic UI generation [04:53:00].

While generating UIs is already possible [04:15:00], compiling databases from sensitive government sources or building holographic interfaces remain further off [04:03:00]. The core question is why we don’t each have a Jarvis already, given that the technology for many of these capabilities exists [04:21:00].

The Challenge: Integrations

The primary obstacle to a widespread Jarvis-like AI assistant is the difficulty of building comprehensive integrations [05:15:00]. Users desire a single robot that can interface with everything, from well-known services to niche local government websites [05:34:00]. Building individual integrations for every conceivable service is impractical for large AI companies like Google or OpenAI [05:43:00]. Without universal capability, the incentive to invest in wiring up partial solutions diminishes [05:56:00].

Users want an AI assistant that can interface with a service they’ve never used before and may never use again, without requiring manual website navigation [06:24:00]. They want their Large Language Model (LLM) to “figure that out” [06:31:00].

The Solution: Model Context Protocol (MCP)

The Model Context Protocol (MCP) aims to provide a standard mechanism for AI assistants to communicate with various tools and services [06:33:00].

History and Evolution of AI Interaction

The evolution of AI interaction can be categorized into three phases:

Phase 1: ChatGPT and LLM Host Applications

Around three years ago, ChatGPT’s release marked a pivotal moment, not primarily because of the LLM itself (which had existed for a while), but due to the host application layer that provided a good user experience for interfacing with an LLM [07:21:00]. This application layer drove significant investment and rapid improvement in LLMs [07:48:00].

  • Capabilities: Answered questions [07:06:00].
  • Limitations: Users had to manually provide context (copy-pasting code, text, or images) and manually apply results [07:57:00]. The LLM couldn’t perform actions or manage context automatically [08:27:00].

Phase 2: LLMs with Built-in Integrations

In this phase, the host application became more sophisticated, integrating with the LLM to provide context and enable actions [08:35:00]. Examples include search engines, calendar integrations, and Slack summarizers [08:48:00].

  • Capabilities: LLMs could “do stuff” beyond just answering questions [09:04:00].
  • Limitations: Capabilities were limited by how much time developers at companies like OpenAI or Anthropic could spend building integrations [09:24:00]. Proprietary plugin systems (like OpenAI’s GPT plugin system) required building a specific solution for each platform, lacking a universal standard [09:51:00]. Users don’t want multiple LLM “wrappers”, each with its own specific tools; they want one universal Jarvis [10:27:00].

Phase 3: Model Context Protocol (MCP)

MCP represents the next leap, enabling AI assistants to “do anything” [11:01:00]. It is designed as a standard protocol that various AI assistants will support, so developers can build to the MCP specification once and have their service usable by any compliant AI assistant [11:05:00].

  • Capabilities: Enables universal integration and capability for AI assistants. It promises to bring us “just one really good user experience application away from Jarvis for everybody” [11:31:00].

MCP Architecture

The MCP architecture facilitates seamless communication and capability expansion:

  1. Host Application to LLM: The host application communicates with the LLM, informing it about available services, which can be dynamically added or removed [11:42:00]. The LLM uses this context, along with the user’s query, to select the most appropriate tool [12:01:00].
  2. Host Application Client: The host application creates a standard client for each service it wants to interface with. Each client speaks the standard protocol, so no service-specific integration is needed [12:09:09].
  3. Service Provider MCP Servers: Service providers create MCP servers that expose their specific tools, resources, and prompts. This part is unique to and controlled by the service provider, while the server-to-client communication stays standard [12:25:00] (see the server sketch below).

This standardized communication is what gives “Jarvis hands” to actually perform tasks [12:52:00].
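
To make the service-provider side of this architecture concrete, here is a minimal sketch of an MCP server exposing a single tool. It assumes the official TypeScript SDK (@modelcontextprotocol/sdk) and zod for input schemas; the get_weather tool and its behavior are hypothetical stand-ins, not code from the talk.

```typescript
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { z } from "zod";

// The service provider controls this part: which tools, resources, and
// prompts the server exposes. The client-server wire protocol stays standard.
const server = new McpServer({ name: "weather", version: "1.0.0" });

// Register a tool that any MCP-compliant host's LLM can discover and call.
server.tool(
  "get_weather",
  "Get current weather conditions for the given coordinates",
  { latitude: z.number(), longitude: z.number() },
  async ({ latitude, longitude }) => {
    // Hypothetical lookup; a real server would call a weather API here.
    const summary = `Clear skies at ${latitude}, ${longitude}, 22°C`;
    return { content: [{ type: "text", text: summary }] };
  },
);

// Serve over stdio so any MCP-compliant host application can connect.
await server.connect(new StdioServerTransport());
```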

MCP in Action (Demo)

A demo showcased MCP servers in action, configured with a Claude Desktop host application [13:20:00]. While current clients may require user approval for tool calls (unlike Jarvis, who has built trust with Tony), the potential is clear [14:05:00].

Example Scenario:

  • Prompt: “Please write a journal entry for me… derive my location and weather conditions from my device location and make up a creative story.” [13:34:00]
  • MCP Server Interactions (a client-side sketch follows this list):
    • Locationator: Determines current location [13:55:00].
    • Get Weather: Retrieves weather conditions for given coordinates [14:32:00].
    • EpicMe (Journaling Server):
      • Authentication: Uses OAuth 2.1 for secure login [14:43:00].
      • Create Journal Entry: The LLM writes the entry, and the server configures inputs [15:35:00].
      • Tag Management: Checks and creates relevant tags (e.g., “travel”) for the entry [16:12:00].
      • Retrieval and Formatting: The server can retrieve entries and format them in a sensible way (e.g., Markdown), which the client can then display dynamically [16:47:00].
      • Language Translation: The LLM can translate server responses (e.g., English to Japanese) if the user is interacting in a different language [17:41:00].
      • Deletion and Logout: Demonstrates full functionality including authenticated actions [17:53:00].
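
To illustrate the host-application side of a demo like this, here is a hedged sketch of an MCP client discovering and calling a tool, again assuming the official TypeScript SDK. The server command, file name, and the get_weather tool are illustrative stand-ins for servers like Locationator or Get Weather, not the actual demo code.

```typescript
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js";

// The host application creates one standard client per MCP server it uses;
// no service-specific integration code is required.
const client = new Client({ name: "example-host", version: "1.0.0" });

// Spawn and connect to a hypothetical weather MCP server over stdio.
await client.connect(
  new StdioClientTransport({ command: "node", args: ["weather-server.js"] }),
);

// The host tells the LLM which tools are available...
const { tools } = await client.listTools();
console.log(tools.map((t) => t.name));

// ...and, once the LLM picks one for the user's request, calls it through
// the standard interface (possibly after asking the user for approval).
const result = await client.callTool({
  name: "get_weather",
  arguments: { latitude: 40.75, longitude: -111.89 },
});
console.log(result.content);
```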

This transition means users will soon no longer need to use browsers or meticulously phrase search queries [18:17:00]. Instead, they can speak naturally, and the AI will understand their intent (“what you’re actually trying to do”) and perform the action [18:42:00]. This is what MCP enables, and it is why the future of AI for user experience and integrations is so exciting [18:54:00].
