From: aidotengineer
Kent C. Dodds teaches how to build excellent user experiences and has recently shifted his focus from web development to AI, as users increasingly engage with AI assistants [00:00:07]. The core topic of this discussion is how user interaction is evolving, how protocols like the Model Context Protocol (MCP) facilitate this change, and the role product developers play in reaching users on AI assistant platforms [00:00:24]. The talk focuses on "Letting AI interface with your app with MCPs (Model Context Protocol)" [00:00:48].
The Vision: Jarvis and the Challenge of Integration
The aspiration for AI interaction is exemplified by Tony Stark’s AI assistant, Jarvis, from the Iron Man movies [00:01:25]. Jarvis can perform a wide range of tasks, including:
- Compiling databases from various sources (e.g., SHIELD, FBI, CIA intercepts) [00:01:52], [00:03:11].
- Generating user interfaces (UIs) on demand [00:03:14], [00:04:15].
- Accessing public records [00:02:21], [00:03:14].
- Bringing up thermogenic signatures and analyzing data from cloud services and satellites [00:02:26], [00:03:17].
- Joining across different datasets [00:02:38], [00:03:22].
- Showing related news articles [00:03:26].
- Creating flight plans [00:02:50], [00:03:28].
- Interfacing with home systems, such as showing who is at the door [00:03:31].
Jarvis represents an ideal user experience that seamlessly integrates with various systems, allowing for voice commands, typing, gestures, and dynamic UI generation [00:04:53]. While the technology exists for many of these capabilities (e.g., UI generation [00:04:15]), the primary barrier to having a personal Jarvis is the sheer difficulty of building comprehensive integrations [00:05:15]. It’s impractical for major AI developers to build integrations for every possible tool or service, such as a local city government website for reserving park pavilions [00:05:43]. Users desire a single AI assistant that can augment itself with “any capability in the world” [00:10:34].
Evolution of AI Integration
The journey towards seamless AI integration can be understood through three phases:
Phase 1: Initial LLMs and Manual Context
Around three years ago, the release of ChatGPT marked a pivotal moment, not because of the LLM itself (which had existed for some time), but due to the host application layer that provided a good user experience for interacting with LLMs [00:06:58]. This led to significant investment and rapid improvement in LLMs [00:07:48]. The main limitation in this phase was the manual provision of context. Users had to copy and paste code or text into the LLM and then manually transfer its output back into their workflows, making context management cumbersome [00:07:57].
Phase 2: Host Application Integrations
In the second phase, host applications began to facilitate interaction between the LLM and external services. The host application could inform the LLM about available services (e.g., search engines, calendar integrations, Slack integrations) and retrieve additional context as needed [00:08:35]. However, this approach was limited by the time and resources developers at LLM providers (like OpenAI or Anthropic) could dedicate to building specific integrations [00:09:24]. While proprietary plugin systems like OpenAI’s GPT plugin system exist, building separate integrations for different LLM providers (e.g., Anthropic, Google) is impractical [00:09:51]. Users don’t want multiple wrappers; they want one unified Jarvis [00:10:27].
Phase 3: Model Context Protocol (MCP)
MCP ushers in the third phase by providing a standard protocol for AI assistants to communicate with diverse tools and services [00:10:05], [00:10:33]. This standardization means that a service built to the MCP specification can be used by any AI assistant that supports the protocol [00:11:05]. This breakthrough is expected to lead to a “Jarvis for everybody” [00:11:34].
MCP Architecture
The MCP architecture involves several key components:
- Host Application: Communicates with the LLM and informs it about available services. Services can be dynamically added or removed, with the host application managing this context [00:11:42].
- LLM: Receives the user’s query and the list of available services from the host application, then selects the most appropriate tool [00:12:01].
- Client: The host application creates a standard client for each service it wants to interface with, adhering to a defined interface [00:12:09].
- Service Provider: Creates MCP servers that interface with specific tools, resources, prompts, and sampling features [00:12:22]. The unique logic for each service resides within the service provider’s control [00:12:36].
The standardization of communication between the server and client components is what makes MCP effective, essentially giving AI assistants “hands to be able to actually do stuff” [00:12:49].
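To make the service-provider side concrete, below is a minimal sketch of what an MCP server might look like, assuming the official TypeScript SDK (`@modelcontextprotocol/sdk`) and `zod` for input schemas; the `get_weather` tool name and its canned response are illustrative and not taken from the talk.

```ts
// Minimal MCP server sketch using the official TypeScript SDK (illustrative, not from the talk).
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { z } from "zod";

const server = new McpServer({ name: "get-weather", version: "1.0.0" });

// Register a tool that the host's client can discover and the LLM can choose to call.
server.tool(
  "get_weather",
  "Get current weather conditions for a latitude/longitude pair",
  { latitude: z.number(), longitude: z.number() },
  async ({ latitude, longitude }) => {
    // A real server would call a weather API here; this response is hard-coded for illustration.
    const conditions = { temperatureC: 18, summary: "Partly cloudy" };
    return {
      content: [
        { type: "text", text: JSON.stringify({ latitude, longitude, ...conditions }) },
      ],
    };
  },
);

// Expose the server over stdio so any MCP-compatible host can connect a client to it.
await server.connect(new StdioServerTransport());
```

Because the tool's name, description, and input schema are declared in this standard way, any host that speaks MCP can list the tool, hand it to its LLM, and call it, without the service provider writing provider-specific integrations.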
MCP in Action: A Demo
A demonstration of MCP showcases its capabilities through a journaling scenario:
- Prompt: A user asks the AI assistant (configured with three MCP servers) to write a journal entry about a trip; the prompt supplies the user's email address and asks the assistant to derive the current location and weather and generate a creative story [00:13:28].
- Locationator Tool: The assistant utilizes an MCP server called “locationator” to determine the current location [00:13:55]. For now, users must approve tool calls, unlike Tony Stark’s seamless interaction with Jarvis [00:14:05].
- Weather Tool: Another MCP server, “get weather,” retrieves current weather conditions based on the derived coordinates [00:14:32].
- Authentication: The “EpicMe” MCP server initiates an authentication flow using OAuth 2.1, requiring the user to provide an email and then an OAuth token [00:14:43], [00:15:19]. This ensures secure, authenticated tasks like creating journal entries [00:15:32].
- Journal Entry Creation: The LLM writes the journal entry and submits it to the MCP server, which defines the inputs it accepts [00:15:38]. The system can also create and add relevant tags, such as a "travel" tag for the journal entry [00:16:12].
- Retrieval and Dynamic Display: The user can then ask the LLM to display the journal entry. The MCP server provides the entry in a structured format (e.g., JSON), and the client decides how best to format and display it to the user, for example as Markdown (see the client-side sketch below) [00:16:40]. This demonstrates the potential for dynamic UI generation [00:17:11].
- Multilingual Potential: The LLM can also translate responses from the MCP server into different languages, even if the server only sends English [00:17:34].
- Further Actions: The user can continue to interact with the assistant, for example, to delete the post and log out [00:17:53].
Many MCP servers, like the EpicMe example, are designed to be exclusively accessible via MCP and its clients, not as traditional web applications [00:18:05].
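To make the host/client side of this flow concrete, here is a minimal sketch of a host connecting to an MCP server, listing its tools, and calling one, again assuming the official TypeScript SDK; the server command, tool name, and arguments are hypothetical stand-ins for the demo's servers.

```ts
// Minimal MCP host/client sketch using the official TypeScript SDK.
// The server command, tool name, and arguments below are hypothetical.
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js";

const client = new Client({ name: "example-host", version: "1.0.0" });

// Launch and connect to an MCP server process over stdio.
await client.connect(
  new StdioClientTransport({ command: "node", args: ["./get-weather-server.js"] }),
);

// The host passes the discovered tools to the LLM so it can pick the right one.
const { tools } = await client.listTools();
console.log(tools.map((t) => t.name));

// Once the LLM (and the user, via an approval prompt) chooses a tool, the host
// calls it and decides how to render the structured result, e.g. as Markdown.
const result = await client.callTool({
  name: "get_weather",
  arguments: { latitude: 40.76, longitude: -111.89 },
});
console.log(result.content);
```

The key design point is that the host only has to implement this generic client once; every MCP server it connects to, journaling, weather, or anything else, is reached through the same interface.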
Future Implications
The shift enabled by MCP moves toward more natural user interaction [00:18:42]. Instead of navigating browsers or crafting specific search queries, users can simply speak, and the AI will understand not only the query but what they are "actually trying to do," and then perform that action [00:18:43]. This transition promises to fundamentally change how users interact with technology.
Resources
- Model Context Protocol Specification: An important document to review for understanding MCP [00:19:04].
- EpicAI.pro: A platform offering courses, posts, workshops, and cohorts on AI in general and the future of user interaction [00:19:09].