From: aidotengineer

The Model Context Protocol (MCP) is an open protocol designed to enable seamless integration between AI applications/agents and external tools and data sources [01:55:00]. It standardizes how AI applications interact with external systems [03:10:00].

The motivation behind MCP is the core idea that models are only as good as the context provided to them [01:18:00]. This contrasts with earlier AI applications, where context was often manually copied and pasted [01:33:00]. Over time, models have gained hooks into user data and context, making them more powerful and personalized [01:46:00].

MCP can be understood by comparing it to preceding protocols:

  • APIs standardized how web front-ends interact with back-ends, giving front-ends access to servers and databases [02:16:00].
  • LSP (Language Server Protocol) standardized how IDEs interact with language-specific tooling, allowing any LSP-compatible IDE to work with different programming languages [02:37:00].
    • For example, building a Go LSP server once lets any LSP-compatible IDE hook into Go-specific functionality while coding [02:59:00].

The aim of MCP is to flatten the N×M problem of fragmentation in AI system development, where N client applications each needed bespoke integrations with M different servers [06:06:00]. MCP serves as a layer between application developers and tool/API developers that standardizes how LLMs access external systems [06:17:17].
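
To make this layer concrete, here is a minimal sketch of an MCP server, assuming the official MCP Python SDK and its FastMCP helper (package `mcp`; exact API details may vary across SDK versions). The point is that the server is written once, and any MCP-compatible client can connect to it:

```python
# Minimal MCP server sketch using the official Python SDK's FastMCP helper.
# Assumes `pip install mcp`; API details may vary across SDK versions.
from mcp.server.fastmcp import FastMCP

# One server definition; any MCP-compatible client (Claude for Desktop,
# an IDE, a custom agent) can connect without bespoke glue code.
mcp = FastMCP("demo-server")

@mcp.tool()
def add(a: int, b: int) -> int:
    """Add two numbers."""
    return a + b

if __name__ == "__main__":
    mcp.run()  # defaults to the stdio transport for local clients
```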

Core Components and Interfaces

MCP standardizes interactions through three primary interfaces [03:14:00]:

Tools

Tools are model-controlled functionalities [10:27:00]. An MCP server exposes tools to the client application, and the LLM within the client application chooses when to invoke them [10:29:00].

  • Tool descriptions are provided as part of the server definition, so the model can decide the best time to invoke each tool [10:53:00].
  • Tools can be used for various actions, including:
    • Retrieving (read) data [11:05:00].
    • Sending (write) data to applications [11:08:00].
    • Updating databases or writing files to a local file system [11:15:00].
  • Tools are most useful when it is ambiguous whether and when the LLM should invoke them, for example, deciding whether to query a vector database based on the current context [16:02:00].
  • Thousands of tools can theoretically be exposed to an LLM if search mechanisms such as a “tool to search tools” are implemented, using RAG (Retrieval-Augmented Generation) or fuzzy/keyword search over the tool library (sketched below) [46:51:00]. Hierarchical systems of tools can also progressively expose groups of tools based on the current task [47:18:00].
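
A minimal sketch of the “tool to search tools” pattern, using the same assumed Python SDK as above; the tool registry and the naive keyword matching below are invented for illustration (a RAG index over tool descriptions would slot into the same place):

```python
# Sketch of the "tool to search tools" pattern: instead of exposing thousands
# of tools at once, expose one search tool that returns relevant tool
# descriptions. The registry and keyword matching below are invented.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("tool-library")

# Invented registry of tool descriptions the model can search over.
TOOL_LIBRARY = {
    "query_vector_db": "Search the vector database for similar documents.",
    "write_file": "Write content to a file on the local file system.",
    "update_record": "Update a row in the application database.",
}

@mcp.tool()
def search_tools(query: str) -> dict[str, str]:
    """Return tools whose descriptions match the query keywords."""
    words = query.lower().split()
    return {
        name: desc
        for name, desc in TOOL_LIBRARY.items()
        if any(w in desc.lower() for w in words)
    }
```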

Resources

Resources are data exposed to the application, and they are application-controlled [11:23:00].

  • A server can define or create resources like images, text files, or JSON data [11:31:00].
  • Resources offer a rich interface for applications and servers to interact beyond simple text-based chat [11:50:00].
  • They can be static (e.g., a static file) or dynamic, where the client application sends user or file system information, and the server interpolates it into a complex data structure sent back to the client [12:03:00].
  • In applications like Claude for Desktop, resources manifest as attachments, allowing users to select and attach them to a chat for the model to use [12:24:00].
  • Resources can also be automatically attached by the model if deemed relevant to a task [12:43:00].
  • MCP supports resource notifications, where the client can subscribe to a resource and be notified by the server whenever it is updated [21:17:00].
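
A sketch of static versus dynamic resources, again assuming the Python SDK's `@mcp.resource` decorator; the URIs and returned data are illustrative:

```python
# Sketch: static vs. dynamic resources on a FastMCP server.
# The URIs and returned data are illustrative.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("resource-server")

@mcp.resource("config://app")
def app_config() -> str:
    """A static resource: fixed content every time it is read."""
    return "theme=dark\nlanguage=en"

@mcp.resource("users://{user_id}/profile")
def user_profile(user_id: str) -> str:
    """A dynamic resource: the URI parameter is interpolated into the
    data structure the server returns to the client."""
    return f'{{"user": "{user_id}", "plan": "pro"}}'
```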

Prompts

Prompts are user-controlled, predefined templates for common interactions with a specific server [12:59:00].

  • They are invoked by the user, unlike tools, which are invoked by the model [13:01:01].
  • An example is slash commands in an IDE like Zed, where /GPR can interpolate a longer, predefined prompt from an MCP server to summarize a pull request [13:14:00].
  • Prompts can be dynamic, interpolated with context from the user or application, allowing the server to return a customized prompt based on the task [20:58:00].
  • They allow different teams to standardize interactions, such as document Q&A with specific formatting rules [13:51:00].
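
A sketch of a user-invoked prompt of this kind, assuming the Python SDK's `@mcp.prompt` decorator; the template text and the `pr_url` parameter are invented:

```python
# Sketch: a user-invokable prompt template, e.g. surfaced as a slash command.
# The template text and parameter are invented.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("prompt-server")

@mcp.prompt()
def summarize_pr(pr_url: str) -> str:
    """Expand into a longer, predefined pull-request summary prompt."""
    return (
        f"Summarize the pull request at {pr_url}. "
        "List the key changes, call out anything breaking, "
        "and end with suggested review questions."
    )
```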

The distinction between tools, resources, and prompts reflects MCP’s goal of defining richer interactions between applications and servers, giving more control to the model, the application, and the user, respectively [22:26:00].

Key Technical Features

Sampling

Sampling allows an MCP server to request LLM inference calls (completions) from the client, rather than the server having to implement or host its own LLM integration [53:52:00].

  • This gives servers access to model intelligence even when they have never interacted with a particular client before [56:06:00].
  • The client retains full control over privacy, cost parameters, and model preferences (e.g., specific Claude versions, model size), potentially ignoring malicious or undesired requests [55:34:00].
  • This design federates LLM requests, letting the client own LLM hosting and model choice, while the server can request inference with various parameters like system/task prompts, temperature, and max tokens [54:55:00].
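
A sketch of a server-side tool that issues a sampling request back to the client, assuming the Python SDK's `Context`/session API (method names may vary across SDK versions):

```python
# Sketch: a server-side tool that asks the *client* to run an LLM completion
# via MCP sampling, rather than calling a model provider itself.
# Method names follow the official Python SDK and may vary by version.
import mcp.types as types
from mcp.server.fastmcp import FastMCP, Context

mcp = FastMCP("summarizer")

@mcp.tool()
async def summarize(text: str, ctx: Context) -> str:
    """Summarize text using the client's model via a sampling request."""
    result = await ctx.session.create_message(
        messages=[
            types.SamplingMessage(
                role="user",
                content=types.TextContent(type="text", text=f"Summarize:\n{text}"),
            )
        ],
        max_tokens=300,  # the client may clamp or override this
    )
    # The client controls which model ran and may reject the request outright.
    return result.content.text if isinstance(result.content, types.TextContent) else ""
```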

Composability

Composability in MCP refers to the logical, not physical, separation between a client and a server [56:21:00]. Any application, API, or agent can simultaneously function as both an MCP client and an MCP server [56:34:00].

  • This enables chaining interactions, where calls can hop from a user to a client-server combination, and then to the next client-server combination, and so on [57:17:00].
  • This allows for the creation of complex, multi-layered architectures of LLM systems, where each layer can specialize in a particular task [57:28:00].
  • The combination of sampling and composability allows for hierarchical systems of agents, where each agent/server can federate sampling requests through layers back to the application controlling the primary LLM interaction [01:11:41].
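
A sketch of a single process acting as both MCP server and MCP client, assuming the Python SDK's client helpers; the downstream command `downstream-server` and the tool name `handle_task` are placeholders:

```python
# Sketch: one process acting as both MCP server and MCP client. Its tool
# fans out to a downstream MCP server, illustrating chained hops.
# "downstream-server" and "handle_task" are placeholders.
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("orchestrator")

@mcp.tool()
async def delegate(task: str) -> str:
    """Forward a task to a downstream MCP server and return its result."""
    params = StdioServerParameters(command="downstream-server")
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            result = await session.call_tool("handle_task", {"task": task})
            # Collect any text content returned by the downstream tool.
            return "\n".join(c.text for c in result.content if hasattr(c, "text"))
```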

Remote Servers and Authentication

MCP supports remotely hosted servers and OAuth 2.0 for authentication [01:13:28].

  • Remote servers can operate over SSE (Server-Sent Events), the preferred transport for remote communication, as opposed to stdio (standard IO) for local or in-memory communication (see the sketch after this list) [01:14:00], [01:32:19].
  • The server orchestrates the authentication handoff (e.g., with Slack), handling the OAuth 2.0 handshake, getting a callback URL, and letting the client open it for user authorization [01:14:22].
  • The server then holds the OAuth token and federates future interactions using a session token [01:14:49].
  • This removes the friction of local hosting, allowing agents and LLMs to live on different systems from where the server is running [01:15:20].
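
A sketch of the transport switch, assuming FastMCP accepts an `sse` transport option (host/port configuration varies by SDK version):

```python
# Sketch: serving the same FastMCP server remotely over SSE instead of stdio.
# Assumes FastMCP supports an "sse" transport; settings vary by SDK version.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("remote-server")

if __name__ == "__main__":
    mcp.run(transport="sse")  # remote clients connect over HTTP + Server-Sent Events
```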

Integration with Agents

MCP is seen as a foundational protocol for agents [02:38:00].

  • It supports the concept of an “augmented LLM,” which is an LLM enhanced with retrieval systems, tools, and memory [02:51:00].
  • MCP serves as the layer that federates and simplifies how LLMs interact with these systems, allowing agents to expand their capabilities even after they are initially built [02:51:00].
  • Agent builders can focus on the core agent loop (context management, memory usage, model choice) while relying on MCP to standardize how agents hook into external servers, tools, and data [02:51:00].
  • Frameworks like “MCP Agent” (built by LastMile AI) demonstrate how to define sub-agents (e.g., a research agent, a fact-checker agent, a report-writer agent) and give them access to various MCP servers (e.g., Brave for web search, file system access) [03:14:00].
  • This declarative approach means the agent builder specifies the task and available servers/tools, and the framework handles the underlying interactions [03:32:00].
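
A hypothetical sketch of this declarative pattern; note that this is not the actual MCP Agent API, and every name below is invented:

```python
# Hypothetical sketch of the declarative pattern described above; this is
# NOT the actual MCP Agent API, and every name below is invented.
from dataclasses import dataclass, field

@dataclass
class SubAgent:
    name: str
    instruction: str
    mcp_servers: list[str] = field(default_factory=list)

# The builder declares sub-agents and the MCP servers each may use;
# a framework would wire each agent's LLM loop to its declared servers.
pipeline = [
    SubAgent("researcher", "Gather sources on the topic.", ["brave-search"]),
    SubAgent("fact-checker", "Verify claims against sources.", ["brave-search"]),
    SubAgent("report-writer", "Write the final report.", ["filesystem"]),
]
```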

Future Developments and Roadmap for MCP

Key areas on the roadmap include:

  • Registry API: A unified, hosted metadata service to centralize the discovery and publishing of MCP servers [01:22:00]. This will address fragmentation and enable dynamic discovery of new capabilities for agents, making them self-evolving [01:36:01]. It will also help with trust and verification (e.g., official servers) and versioning [01:23:25], [01:24:01].
  • .well-known: A complement to the registry, allowing services (like shopify.com) to host a .well-known/mcp.json file declaring their MCP endpoint, resources, and tools (an illustrative sketch follows this list) [01:39:27]. This provides a top-down, verified way for agents to find and use tools on a specific domain [01:40:07].
  • Stateful vs. Stateless Connections: Exploring short-lived HTTP-like connections for basic requests and long-lived SSE connections for advanced features like sampling or server-to-client notifications [01:41:41].
  • Streaming: Supporting first-class streaming of multiple chunks of data from server to client within the protocol [01:42:41].
  • Namespacing: Addressing tool name conflicts when multiple servers expose tools with the same name, potentially through logical grouping of tools or first-class protocol support [01:42:51].
  • Proactive Server Behavior: Developing patterns for servers to initiate actions (e.g., event-driven, deterministic systems) by asking for user input or notifying the user about something [01:43:31].
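
For the .well-known item above, an illustrative sketch of what such a declaration might contain; the schema is not finalized, and every field name here is invented:

```python
# Illustrative sketch of a .well-known/mcp.json declaration; the schema is
# not finalized and every field name here is invented.
well_known_mcp = {
    "endpoint": "https://shopify.com/mcp",   # where the MCP server is hosted
    "capabilities": ["tools", "resources"],  # what it exposes
    "auth": "oauth2",                        # how clients should authenticate
}
```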

Considerations and Best Practices

  • Error Handling and Debugging: While MCP itself doesn’t enforce observability, server builders are incentivized to expose useful data and debugging information to the client via metadata for a good user experience (see the logging sketch after this list) [01:05:25]. Tools like “Inspector” (from Anthropic) let users examine logs and server connections [01:09:49].
  • Version Control: MCP servers, often released as packages (npm, pip), follow standard package versioning, implying clear upgrade paths [01:16:09]. The upcoming registry will also support versioning for metadata [01:24:01].
  • Trust and Security: Users should be judicious about which servers they connect to [01:18:06]. The server builder is responsible for controlling the flow of information and authentication, and future registry features will aid in verifying trusted sources [01:11:31], [01:38:19].
  • LLM Generation of Servers: For simpler servers that primarily wrap existing APIs, LLMs can automatically generate MCP servers [01:49:24]. However, more complex logic, logging, and data transformations may require manual implementation [01:50:00].
  • Transport Agnostic: MCP is transport agnostic, meaning the underlying communication method (e.g., stdio, SSE) does not alter the protocol’s fundamental behavior, allowing for custom transports [01:32:05].
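
For the Error Handling and Debugging item above, a sketch of a server surfacing log messages to the client, assuming the Python SDK's `Context` logging helpers (names may vary by version):

```python
# Sketch: a server surfacing debug information to the client through MCP
# logging, so client-side tools like Inspector can display it.
# Context helper names follow the Python SDK and may vary by version.
from mcp.server.fastmcp import FastMCP, Context

mcp = FastMCP("observable-server")

@mcp.tool()
async def fetch_report(report_id: str, ctx: Context) -> str:
    """Fetch a report, emitting log messages the client can inspect."""
    await ctx.info(f"Looking up report {report_id}")
    try:
        return f"report {report_id} contents"  # a real lookup would go here
    except Exception as exc:
        await ctx.error(f"Lookup failed: {exc}")  # debugging metadata for the client
        raise
```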