From: aidotengineer

Kent C. Dodds teaches developers how to build excellent user experiences; his focus, originally on the web, is now shifting to AI as users move toward AI platforms [00:00:07]. His course platform, epicai.pro, focuses on this transition [00:00:16]. The core focus is how user interaction is changing, how the Model Context Protocol (MCP) facilitates that change, and the role product developers play in reaching users within AI assistants [00:00:24].

The Ideal AI Assistant: Tony Stark’s Jarvis

The aspiration for future AI interfaces is exemplified by Tony Stark’s AI assistant, Jarvis [00:01:25]. Jarvis demonstrates an advanced level of interaction, performing complex tasks seamlessly [00:03:03].

Jarvis’s capabilities include:

  • Compiling databases from disparate sources (e.g., SHIELD, FBI, CIA intercepts) [00:01:52].
  • Generating dynamic user interfaces (UI) on demand for complex data visualization [00:01:59].
  • Accessing public and private records [00:02:21].
  • Analyzing and plotting data (e.g., thermogenic occurrences) [00:02:31].
  • Joining across different datasets and filtering information [00:02:38].
  • Showing related news articles [00:03:26].
  • Creating flight plans [00:02:50].
  • Interfacing with home systems (e.g., doorbell cameras) [00:03:31].
  • Utilizing various interaction methods beyond voice, such as typing and gestures [00:04:56].

While some capabilities like creating databases from classified sources or holographic displays are still challenging, the technology exists for generating UI dynamically [00:04:03]. The question then becomes, why don’t we have our own Jarvis already [00:04:21]?

The Challenge: Building Integrations

The primary obstacle preventing widespread adoption of universal AI assistants like Jarvis is the difficulty of building robust integrations [00:05:15]. Users desire one central AI that can interface with everything, not just a select few popular services [00:05:34]. Large companies like Google are unlikely to build integrations for niche services, such as a local city government website for reserving park pavilions [00:05:43]. If an AI cannot do “everything,” the incentive to spend time wiring up integrations for “some things” diminishes [00:05:56].

Evolution of AI Interaction and the Integration Problem

The evolution of AI interaction can be categorized into phases:

Phase 1: LLMs as Question Answering Systems (e.g., Early ChatGPT)

  • Description: Approximately three years ago, ChatGPT marked a pivotal moment by wrapping a good user experience around a Large Language Model (LLM) [00:06:57]. These LLMs could answer questions, generating tokens based on the input they were given [00:07:06].
  • Limitation: Users had to manually provide context (e.g., copy-pasting code or text) and then manually extract the results [00:07:57]. The LLM couldn’t do anything directly, only answer questions [00:08:27].

Phase 2: Host Application Integrations

  • Description: Host applications began to tell the LLM which services were available, allowing the LLM to request more context or perform actions through integrations like search engines, calendars, or Slack [00:08:35].
  • Limitation: The capabilities were limited by the integrations built by the LLM developers (e.g., OpenAI or Anthropic) [00:09:12]. These developers wouldn’t build integrations for highly specific or local services [00:09:39]. While proprietary plugin systems like OpenAI’s GPT plugins exist, building separate integrations for each LLM provider is impractical [00:09:51]. Users don’t want multiple specialized AI wrappers; they want one Jarvis that can augment itself with any capability [00:10:27].

Phase 3: Model Context Protocol (MCP)

Model Context Protocol (MCP) is introduced as the solution to the integration problem [00:10:58]. MCP provides a standard mechanism for AI assistants to communicate with various tools and services [00:06:33].
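
To make that standard mechanism concrete, the sketch below shows the kind of JSON-RPC 2.0 messages MCP defines for a host to discover and invoke a server’s tools. The method names follow the MCP specification; the tool name and arguments are hypothetical.

```typescript
// Sketch of the JSON-RPC 2.0 messages exchanged between an MCP host and an
// MCP server. Method names ("tools/list", "tools/call") follow the MCP spec;
// the tool name and arguments are hypothetical.

// 1. The host asks the server which tools it exposes.
const listToolsRequest = {
  jsonrpc: "2.0",
  id: 1,
  method: "tools/list",
};

// 2. After the LLM selects a tool, the host invokes it by name with arguments.
const callToolRequest = {
  jsonrpc: "2.0",
  id: 2,
  method: "tools/call",
  params: {
    name: "get_weather", // hypothetical tool on a weather MCP server
    arguments: { latitude: 40.7, longitude: -74.0 },
  },
};

console.log(JSON.stringify(listToolsRequest), JSON.stringify(callToolRequest));
```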

MCP Architecture and Benefits

  • Standard Protocol: MCP is a standard protocol that all AI assistants will support, allowing developers to build to the MCP specification and be usable by any AI assistant [00:11:05].
  • Dynamic Services: The host application communicates with the LLM, informing it of available services that can be dynamically added or removed [00:11:42]. The LLM selects the most appropriate tool based on the user’s query and available services [00:12:01].
  • Standardized Clients: The host application creates a standard client for each service, eliminating the need for special integrations [00:12:09].
  • Service Provider Control: Service providers create MCP servers that interface with their unique tools, resources, and prompts [00:12:22]. This allows service providers to control the unique aspects of their offering, while the server-client communication remains standard [00:12:36] (a server-side sketch follows this list).
  • “Jarvis’s Hands”: MCP gives AI assistants the ability to “do stuff” by providing a standardized way to interact with external services [00:12:55].
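
As an illustration of the service-provider side, here is a minimal sketch of an MCP server built with the official TypeScript SDK (@modelcontextprotocol/sdk). The “get_weather” tool and its behavior are hypothetical stand-ins, not the servers from the talk.

```typescript
// Minimal MCP server sketch using the official TypeScript SDK
// (@modelcontextprotocol/sdk). The "get_weather" tool is hypothetical;
// real servers can also expose resources and prompts the same way.
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { z } from "zod";

const server = new McpServer({ name: "weather", version: "1.0.0" });

// Declaring the tool publishes its name, description, and input schema over
// the standard protocol, so any MCP-capable host can discover and call it.
server.tool(
  "get_weather",
  "Get current conditions for a pair of coordinates",
  { latitude: z.number(), longitude: z.number() },
  async ({ latitude, longitude }) => ({
    content: [
      {
        type: "text",
        text: `Conditions at ${latitude}, ${longitude} would be looked up here.`,
      },
    ],
  }),
);

// Serve over stdio; hosts may also connect via HTTP-based transports.
await server.connect(new StdioServerTransport());
```

Because the server only speaks the standard protocol, the service provider keeps full control over what its tools do, while every MCP-capable assistant gets the same integration without bespoke wiring.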

Enhancing User Experience with MCP: A Demo

A demonstration of MCP showcases how an LLM can interact with different MCP servers for a journaling task:

  1. Location and Weather: An MCP server called “locationator” determines the current location, and another “get weather” tool retrieves weather conditions based on coordinates [00:13:55] (a client-side sketch of this flow follows the list).
  2. Authentication: The EpicMe MCP server handles authentication using OAuth 2.1, allowing the AI client to perform authenticated tasks, such as creating journal entries, on the user’s behalf [00:14:43]. This shows that secure, user-specific functionality can be exposed to AI clients.
  3. Dynamic Tagging: The LLM can check available tags and create new ones (e.g., “travel” for a trip entry), then apply them [00:16:12].
  4. Intelligent Formatting and Translation:
    • When retrieving a journal entry, the LLM can decide to format the raw JSON data into a more readable Markdown format with a title [00:16:47].
    • The LLM can translate responses from an MCP server (e.g., English) into the user’s preferred language (e.g., Japanese), further enhancing user interaction [00:17:41].
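
Behind that conversation, the host application performs standardized client calls. The sketch below, assuming the @modelcontextprotocol/sdk client API, approximates the first demo step; the server command and tool name are hypothetical, and the real EpicMe server additionally requires an OAuth 2.1 flow before authenticated calls.

```typescript
// Host-application side of the first demo step: connect to the "locationator"
// MCP server, show its tools to the LLM, and call the one it selects.
// The server command and tool name are hypothetical; the EpicMe server from
// the demo additionally requires an OAuth 2.1 flow before authenticated calls.
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js";

const client = new Client({ name: "journal-host", version: "1.0.0" });

// Launch the server as a child process and connect over stdio.
await client.connect(
  new StdioClientTransport({ command: "node", args: ["locationator-server.js"] }),
);

// The advertised tool list (names, descriptions, schemas) is what the host
// hands to the LLM so it can pick the right tool for the user's request.
const { tools } = await client.listTools();
console.log(tools.map((tool) => tool.name));

// Once the LLM picks a tool, the host issues a standardized call.
const location = await client.callTool({
  name: "get_current_location", // hypothetical tool name
  arguments: {},
});
console.log(location);
```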

This demonstrates a significant shift in the role of user experience in AI. Users will no longer need to rely on specific keywords or search formats like in traditional search engines [00:18:25]. Instead, they can speak naturally, and the AI will understand their intent to accomplish a task, not just find information [00:18:42]. This leads to AI applications that can actively perform actions for the user [00:18:50].

Conclusion and Future Outlook

The advent of MCP is poised to bring about a universal Jarvis-like experience for everyone [00:11:29]. While AI clients are still developing to fully leverage MCP, the potential for seamless, integrated AI assistance is immense [00:11:22]. This is a significant step forward for AI, particularly for integrating it into business operations and personal workflows.

For further learning, resources include the Model Context Protocol specification and epicai.pro, which offers insights into AI and its impact on AI engineering and user interaction [00:19:02].