From: aidotengineer
Kent C. Dodds, a developer focused on building excellent user experiences, notes a shift in where users are found: increasingly, they are in AI assistants rather than in traditional applications [00:00:10]. He teaches people how to build excellent user experiences with AI through his course platform, epicai.pro [00:00:19]. This article discusses how user interaction is changing, how Model Context Protocol (MCP) and related technologies facilitate that change, and the role of product developers in this evolution [00:00:24]. The core idea is “letting AI interface with your app with MCPs or Model Context Protocol services” [00:00:48].
The Ideal AI Assistant: Jarvis
Tony Stark’s AI assistant, Jarvis, from the Iron Man movies, exemplifies an ideal AI user experience [00:01:25]. Jarvis can:
- Compile databases from various sources like SHIELD, FBI, and CIA intercepts [00:01:52].
- Initiate virtual crime scene reconstructions [00:02:02].
- Access public records [00:02:21].
- Bring up thermogenic signatures and factor in data like 3,000°C heat [00:02:26].
- Access satellites and plot historical occurrences [00:02:32].
- Perform joins across different datasets, like filtering out areas with “Mandarin attacks” from thermogenic occurrences [00:02:38].
- Show related news articles [00:03:27].
- Create a flight plan [00:02:51].
- Answer the doorbell and display visitor information [00:03:31].
- Generate dynamic UIs and interact with them [00:04:58].
- Interface through voice, typing, and gestures [00:04:53].
While some capabilities like creating databases from classified sources or holographic displays are still challenging [00:04:03], the technology for generating UIs dynamically already exists [00:04:15]. The core barrier to having a personal Jarvis is not the underlying technology, but the difficulty of building all the necessary integrations [00:05:15]. It’s impractical for major AI developers like Google, OpenAI, or Anthropic to build integrations for every niche service, such as a local city government website for reserving park pavilions [00:05:43]. Users desire one unified robot that can interface with everything [00:05:34].
History and Architecture of AI Interaction Protocols
The evolution of AI interaction can be divided into phases:
Phase 1: Early LLMs and Manual Context (Approx. 3 Years Ago)
When ChatGPT emerged, it was pivotal because of its host application layer, which provided a good user experience for interfacing with LLMs [00:07:33]. In this phase, users had to manually provide context by copy-pasting text or images [00:08:00]. LLMs could answer questions but couldn’t perform actions, and managing context was cumbersome [00:08:27].
Phase 2: Host Application Integrations
In this phase, the host application (e.g., ChatGPT itself) started to act as an intermediary, telling the LLM which services were available to it [00:08:36]. This allowed LLMs to do “stuff,” such as accessing search engines, calendar integrations, or Slack integrations [00:08:48]. However, this approach was limited by the time and resources of the host application developers to build specific integrations [00:09:24]. Proprietary systems like OpenAI’s GPT plug-in system also meant developers had to build separate integrations for different platforms (e.g., OpenAI, Anthropic, Google) [00:09:51]. Users don’t want multiple LLM wrappers; they want one “Jarvis” that can augment itself with any capability [00:10:28].
Phase 3: Model Context Protocol (MCP)
This phase introduces Model Context Protocol (MCP), which enables AI to do “anything” [00:11:01]. MCP is a standard protocol that AI assistants support or will soon support, allowing developers to build to a single specification and have their services usable by any AI assistant [00:11:05].
MCP Architecture
The architecture of MCP involves:
- Host Application: Communicates with the LLM and dynamically manages available services [00:11:42].
- LLM: Knows what services are available and selects the most appropriate tool based on the user’s query [00:12:01].
- Standard Client: The host application creates a standard client for each service, requiring no special integration [00:12:09].
- Service Provider: Creates MCP servers that expose the service’s unique capabilities (tools, resources, prompts, and sampling) [00:12:22].
The key is that the communication between the server and the standard client is consistent, allowing service providers to control the unique aspects of their service while maintaining a universal interface [00:12:49]. This mechanism provides “Jarvis hands” to actually perform tasks [00:12:55].
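To make the architecture concrete, here is a minimal sketch of the service-provider side using the official TypeScript SDK (@modelcontextprotocol/sdk). The weather tool, its parameters, and the canned response are illustrative assumptions, not the actual servers from the talk:

```typescript
// Minimal MCP server sketch. The tool name, parameters, and response
// below are illustrative assumptions, not a real service.
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { z } from "zod";

// The service provider controls its unique tools; the client-server
// communication itself stays standard.
const server = new McpServer({ name: "weather", version: "1.0.0" });

server.tool(
  "get_weather",
  "Get current weather conditions for a coordinate pair",
  { latitude: z.number(), longitude: z.number() },
  async ({ latitude, longitude }) => ({
    content: [
      {
        type: "text",
        text: `Conditions at ${latitude}, ${longitude}: 18°C, partly cloudy (placeholder data)`,
      },
    ],
  })
);

// Any MCP-capable host can launch this server and connect over stdio.
await server.connect(new StdioServerTransport());
```

Because the client half of the connection is standard, a host application can offer this tool to the LLM without writing any service-specific integration code.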
Demonstration of MCP
Kent C. Dodds demonstrates MCP using Claude Desktop, configured with three MCP servers: a journaling server (EpicMe), a location service (locationator), and a weather service [00:13:30].
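As a rough illustration of how a host like Claude Desktop registers such servers, the JSON configuration might look like the following; the commands and package names here are hypothetical stand-ins, not the actual demo servers:

```json
{
  "mcpServers": {
    "locationator": { "command": "npx", "args": ["-y", "locationator-mcp-server"] },
    "weather": { "command": "npx", "args": ["-y", "weather-mcp-server"] },
    "epicme": { "command": "npx", "args": ["-y", "epicme-mcp-server"] }
  }
}
```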
A user prompt like “Please write a journal entry for me… about my trip with my daughter. I would like you to derive my location and weather conditions from my device location and make up a creative story with relevant text” triggers several MCP calls [00:13:34] (a code sketch of this sequence follows the list):
- Location Determination: The locationator MCP server determines the current location [00:13:55].
- Weather Retrieval: The get weather MCP server retrieves current weather conditions for the given coordinates [00:14:32].
- Authentication: The EpicMe server’s authenticate tool is called, prompting for the user’s email and an OAuth 2.1 token [00:14:43]. This authentication ensures secure access to personal data [00:15:19].
- Journal Entry Creation: The LLM generates the journal entry, and the MCP server creates it [00:15:37].
- Tagging: The system checks for available tags, creates a new “travel” tag if needed, and adds it to the entry [00:16:12].
- Retrieval and Formatting: The user can request to see the journal entry [00:16:41]. The MCP server provides the data, and the LLM client can then format it in a user-friendly way (e.g., Markdown with a title) [00:16:51].
- Multilingual Capabilities: While the server sends responses in English, the LLM client can translate them into other languages such as Japanese, showcasing the client’s flexibility in presenting results [00:17:41].
- Action Execution: The user can command the AI to perform actions like deleting the post and logging out, demonstrating the AI’s ability to act on behalf of the user [00:17:53].
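The sketch below, again using the TypeScript SDK, spells out by hand the kind of call sequence the host performs in the demonstration above. In practice the LLM decides which tools to call; the tool names, arguments, and launch commands here are assumptions based on the demo, not its real interfaces:

```typescript
// Host-application side sketch: one standard MCP client per server, no
// service-specific integration code. Commands, tool names, and arguments
// are illustrative assumptions.
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js";

async function connect(command: string, args: string[]) {
  const client = new Client({ name: "host-app", version: "1.0.0" });
  await client.connect(new StdioClientTransport({ command, args }));
  return client;
}

const locationator = await connect("npx", ["-y", "locationator-mcp-server"]); // hypothetical package
const weather = await connect("npx", ["-y", "weather-mcp-server"]);           // hypothetical package

// In a real host the LLM inspects the combined tool list and picks the next
// call itself; here the demo's sequence is written out manually.
const location = await locationator.callTool({ name: "get_current_location", arguments: {} });
const conditions = await weather.callTool({
  name: "get_weather",
  arguments: { latitude: 40.76, longitude: -111.89 }, // would come from the location result
});

console.log(location, conditions);
```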
This demonstrates the shift from traditional browser-based interaction, where users Google and phrase questions with keywords, to a more natural interaction where users speak their intent, and the AI figures out how to accomplish the task through MCP-enabled services [00:18:15].
Conclusion
Model Context Protocol (MCP) is a standard mechanism that enables AI assistants to communicate with various tools and services [00:06:33]. It addresses the challenge of building extensive integrations by providing a unified interface [00:11:11]. This advancement is expected to lead to a future where everyone can have a “Jarvis-like” AI assistant, capable of augmenting itself with any capability in the world [00:10:31], thereby enhancing AI agent performance and tool usage.
Resources
- The specification for Model Context Protocol (modelcontextprotocol.io) [00:19:04].
- epicai.pro: A platform to learn about MCP and AI in general, offering posts, workshops, and cohorts on the future of user interaction and how AI is changing the game [00:19:09].