From: aidotengineer
User interaction is undergoing a significant transformation: users are increasingly gravitating toward AI assistants [00:00:24]. This shift is being facilitated by developments like the Model Context Protocol (MCP) [00:00:27]. The goal is to let product developers reach users where they want to be: inside AI assistants [00:00:38].
The Vision: Tony Stark’s Jarvis
An ideal AI assistant, exemplified by Tony Stark’s Jarvis, demonstrates the vast potential of these technologies [00:02:25]. Jarvis can:
- Compile databases from various sources (e.g., SHIELD, FBI, CIA intercepts) [00:01:52].
- Generate user interfaces (UI) on demand [00:03:13].
- Access public records [00:02:21].
- Bring up and analyze data like thermogenic signatures [00:02:26].
- Join and cross-reference different datasets [00:02:28].
- Show related news articles [00:03:28].
- Create flight plans [00:02:51].
- Provide real-time information, such as who is at the door [00:03:31].
Jarvis offers an amazing user experience, incorporating voice interaction, typing, gestures, and dynamic UI generation [00:04:43]. Without an AI assistant like Jarvis, such complex research and tasks would take significantly longer [00:03:03].
Current Challenges and Limitations of AI Assistants
Despite advancements, we don’t yet have personal AI assistants like Jarvis widely available [00:04:21]. While the technology for generating UI already exists [00:04:15], major challenges persist.
The primary obstacle is the difficulty of building comprehensive integrations for every possible service and application [00:05:15], a core challenge in building effective AI agents. Large companies like Google are unlikely to build integrations for niche services, such as a local city government website for reserving park pavilions [00:05:40]. If an AI assistant cannot integrate with “everything,” users may not see the value in wiring it up for only “some things” [00:05:56].
Evolution of AI Assistant Interaction and Its Limitations
The development of AI assistants can be seen in three phases:
Phase 1: ChatGPT Era (circa 3 years ago)
- Potential: ChatGPT was pivotal less for the Large Language Model (LLM) itself than for its host application layer, which provided a good user experience for interfacing with LLMs and spurred rapid investment and improvement [00:07:33]. It could answer questions effectively [00:07:06].
- Challenges: Users had to manually provide context by copying and pasting information (e.g., code) into the LLM and then manually copying results back [00:07:57]. LLMs could answer questions but couldn’t do anything directly [00:08:27]. This manual context management was cumbersome [00:08:31].
Phase 2: Host Application Integrations
- Potential: The host application began telling the LLM what tools were available, allowing the LLM to request more context or perform actions like scheduling meetings, accessing search engines, or summarizing Slack messages [00:08:35]. The LLM could now act on the user’s behalf, not just answer.
- Challenges: Capabilities were limited to the integrations built by the LLM developers (e.g., OpenAI, Anthropic), who lacked the time or incentive to integrate with every niche service [00:09:12]. Proprietary plugin systems (like OpenAI’s GPT plugin system) force developers to build a separate integration for each platform, which is unsustainable [00:09:51]. Users don’t want to juggle multiple “LLM wrappers” or host applications; they want a single Jarvis that can augment itself with any capability [00:10:27].
Model Context Protocol (MCP): A Solution to Integration Challenges
The Model Context Protocol (MCP) represents Phase 3 in the evolution of AI assistants, aiming to overcome these integration challenges by enabling pervasive, standardized integration [00:10:55].
What is MCP?
MCP is a standard protocol that all AI assistants support or will soon support [00:11:01]. It provides a standard mechanism for AI assistants to communicate with various tools and services [00:06:33].
How MCP Works (Architecture)
- The host application communicates with the LLM [00:11:42].
- The host application dynamically informs the LLM about available services, which can be added or removed [00:11:46].
- The LLM, knowing the available services and the user’s query, selects the most appropriate tool [00:12:03].
- The host application creates a standard client for each service, meaning no special integration is needed [00:12:09].
- Service providers create MCP servers that interface with their specific tools, resources, prompts, and sampling mechanisms [00:12:22].
- The key is that the communication interface between the server and the client is standardized, while the unique aspects of each service are controlled by the service provider [00:12:46].
This standardization gives Jarvis (the AI assistant) “hands” to perform actions [00:12:55].
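To make the architecture concrete, here is a minimal sketch of a service provider’s MCP server using the official TypeScript SDK (@modelcontextprotocol/sdk). The get_current_weather tool, its input schema, and the canned report are hypothetical stand-ins for a real service’s logic; only the protocol plumbing is standard:

```typescript
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { z } from "zod";

// A hypothetical service provider's MCP server. The provider owns the
// service logic; the protocol standardizes how any host talks to it.
const server = new McpServer({ name: "get-weather", version: "1.0.0" });

// Register a tool the LLM can discover and call. The input schema tells
// the host (and the LLM) what arguments the tool expects.
server.tool(
  "get_current_weather",
  { latitude: z.number(), longitude: z.number() },
  async ({ latitude, longitude }) => {
    // Placeholder: a real server would query its weather backend here.
    const report = { tempC: 18, conditions: "partly cloudy" };
    // Return JSON text; the client-side LLM can reformat it for display.
    return { content: [{ type: "text", text: JSON.stringify(report) }] };
  },
);

// Expose the server over stdio so any MCP host can connect a standard client.
await server.connect(new StdioServerTransport());
```

The host application never sees this code; it simply connects a standard MCP client and discovers whatever tools the server advertises.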
Potential of MCP in Action
A demonstration illustrates MCP’s capabilities, even though client-side dynamic UI support is still developing [00:13:01]:
- An LLM, configured with MCP servers, can process requests like “Please write a journal entry for me” [00:13:30].
- MCP servers, such as “locationator” and “get weather,” can determine current location and weather conditions, respectively [00:13:55].
- Authentication is built into MCP servers, using OAuth 2.1, ensuring security for authenticated tasks like creating journal entries [00:15:19].
- The LLM can generate and format content (e.g., a journal entry with a title and content) and even apply relevant tags [00:15:38].
- The server can communicate in a format sensible to it (e.g., JSON), and the client LLM can then reformat it for better user display (e.g., Markdown) [00:16:51].
- LLMs can translate responses from the MCP server into the user’s preferred language, even if the server’s native response is different [00:17:41].
- Tasks like deleting posts and logging out are handled securely through the authenticated MCP server [00:17:53].
For now, users may need to manually approve tool calls while trust and capability mature, but this is expected to evolve [00:14:05].
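The host side of this demo flow is equally generic. Here is a minimal sketch of a standard client connecting to the hypothetical server above (the command, tool name, and arguments are illustrative assumptions):

```typescript
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js";

// One generic client per server; no service-specific integration code.
const client = new Client({ name: "example-host", version: "1.0.0" });
await client.connect(
  new StdioClientTransport({ command: "node", args: ["get-weather-server.js"] }),
);

// The host advertises the discovered tools to the LLM...
const { tools } = await client.listTools();
console.log(tools.map((t) => t.name));

// ...and when the LLM selects one, the host invokes it and hands the
// (often JSON) result back to the LLM to reformat for the user.
const result = await client.callTool({
  name: "get_current_weather",
  arguments: { latitude: 40.0, longitude: -105.3 },
});
console.log(result.content);
```

This is the demo in miniature: the host lists the tools, the LLM picks one based on the user’s request, and the host executes the call and returns the raw result for the LLM to present in whatever format and language suits the user.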
The Future: Beyond Browsers with AI Assistants
The transition ahead means users will increasingly move away from browsers and traditional search methods [00:18:15]. Instead of typing specific keywords or phrases for search engines, users will naturally speak their questions and intentions to AI [00:18:42]. The AI will understand not only what the user is trying to search for but also what they are trying to do, and then execute that action [00:18:50]. This captures both the challenge and the promise of AI assistants transforming the user experience.
MCP is instrumental in enabling this future, bringing us closer to a world where a “Jarvis for everybody” is a reality [00:11:31].
Resources
- Model Context Protocol Specification [00:19:04]
- EpicAI.pro: Learn more about MCP and AI in general, including posts, workshops, and cohorts [00:19:09].