From: aidotengineer
The field of AI engineering is undergoing a rapid evolution, with new frameworks and architectural patterns emerging to support the development of intelligent applications [00:12:09]. This sector is experiencing a revolution comparable in scale to only a few others in tech history [00:17:31].
The Landscape and Growth
The AI Engineer World’s Fair in 2025 highlights the explosive growth in AI development, with over 3,000 attendees and more than 250 speakers discussing various aspects of AI architecture and infrastructure [00:21:16], [00:21:39]. This growth is driven by real user adoption, exemplified by ChatGPT reaching 100 million users faster than any other consumer product [00:18:06], and GitHub Copilot having millions of subscribers [00:18:37]. Azure AI also demonstrates significant enterprise adoption, generating $13 billion in annual revenue [00:18:46], [00:18:48].
Key Frameworks and Tools
Several frameworks and tools are at the forefront of AI product development:
- LlamaIndex: Described as a leading framework for building AI applications [00:16:10].
- Microsoft Azure AI Platform (Foundry): Positioned as a comprehensive platform for building agentic applications and systems, supporting 70,000 customers and all internal Copilot instances [00:46:05], [00:46:46]. It enables building single- and multi-agent systems, and integrates with evaluation SDKs for continuous improvement [00:50:09], [00:56:21].
- GitHub Copilot: Evolving from a sidekick into a teammate, it can operate within a codebase, run tests, and is extensible to interact with other agents [00:39:35], [00:40:28].
- Smithery: A new company focused on orchestrating and organizing Model Context Protocol (MCP) instances [02:30:27].
- Braintrust: An end-to-end evaluation platform for building world-class AI applications [00:35:56].
- Neo4j: A graph database with a dedicated “graph RAG” track at the conference, emphasizing its role in building robust AI applications [00:19:37], [01:43:50].
- Cursor: An AI-powered developer productivity platform that integrates with GitHub to help teams ship higher-quality software faster [00:20:20], [01:12:19]. It achieved rapid growth, reaching $100 million ARR in 12 months with half a million developers [01:12:22].
- Windsurf: The first agentic IDE, which aims to 10x engineers’ capabilities [00:20:29].
- MongoDB Atlas: Simplifies storing data, including vector embeddings [00:20:33].
- Daily (Pipecat): Provides the most widely used framework for voice agents and multimodal AI [00:20:41].
- Augment Code: An AI agent that understands user and codebase context [00:20:47].
- WorkOS: Helps ship software to enterprise customers with features like single sign-on [00:20:51].
Architectural Patterns
The conference emphasizes the search for “standard models” in AI engineering, akin to ETL, MVC, CRUD, or MapReduce in traditional software [02:54:50].
Agent-Oriented Architectures
A significant trend is the move towards agentic applications and systems [00:37:11]. Agents are defined as software that plans steps, includes AI, takes ownership of a task, and can hold a goal in memory, trying different hypotheses and backtracking [01:07:15].
- Human Input vs. AI Output: A more useful mental model than arguing about agent definitions is tracking the ratio of human input to valuable AI output [00:32:13]. This ranges from Copilot’s debounced input, to chat applications’ few queries, to new agents performing deep research with minimal input [00:32:28]. At the “zero to one” extreme, ambient agents require no human input yet still yield valuable AI output [00:33:01].
- Foundry Agent Factory: Microsoft’s approach to building and deploying agents, emphasizing continuous loops for retraining and redeployment based on feedback (a “signals loop”) [00:46:38]. This moves away from linear software factories toward continuous improvement [00:46:42].
- Local Agent Deployment: Models are increasingly moving from the cloud to local devices, enabling real-time, privacy-compliant applications in sectors like manufacturing and healthcare [00:57:10], [00:57:35]. The goal is for agents created in the cloud to run and reason locally [00:58:01].
Retrieval-Augmented Generation (RAG)
RAG remains integral, used in 50% of current AI applications [00:47:28].
- Agentic RAG: An evolution of RAG that involves iterative evaluation and planning, showing a 40% improvement in accuracy for complex queries compared to “naive” single-shot RAG [00:47:38], [00:47:45].
- Graph RAG: A specialized application of RAG that leverages graph databases for knowledge retrieval and reasoning [00:19:46], [01:43:50].
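The difference between “naive” single-shot RAG and agentic RAG can be sketched in a few lines. Everything below is a hypothetical stand-in (the toy `retrieve`, `draft_answer`, and `evaluate` functions are not from any framework mentioned above): the point is the loop that evaluates each draft and re-plans the query instead of answering in one pass.

```python
def retrieve(query, corpus):
    """Toy retriever: rank documents by word overlap with the query."""
    words = set(query.lower().split())
    scored = sorted(corpus, key=lambda d: -len(words & set(d.lower().split())))
    return scored[:2]

def draft_answer(query, docs):
    """Stand-in for an LLM call that answers from the retrieved context."""
    return f"Answer to {query!r} based on: {docs}"

def evaluate(answer, docs):
    """Stand-in for an LLM-as-judge check: is the answer supported?"""
    return bool(docs)  # toy criterion; a real judge would inspect the answer

def agentic_rag(query, corpus, max_iters=3):
    """Retrieve, draft, evaluate; re-plan the query and retry if the check fails."""
    answer = ""
    for _ in range(max_iters):
        docs = retrieve(query, corpus)
        answer = draft_answer(query, docs)
        if evaluate(answer, docs):
            return answer
        query = query + " (rephrased)"  # re-plan: rewrite the query and retry
    return answer
```

Naive RAG is the body of the loop run exactly once; the agentic version adds the evaluation and backtracking steps the talk credits with the accuracy gain on complex queries.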
The Model Context Protocol (MCP)
MCP is emerging as a foundational shift in the internet’s economy, where tool calls are becoming the new clicks [02:31:02], [02:31:05].
- Origin and Purpose: Co-created by David and Justin at Anthropic, MCP originated from the need to prevent constant copying and pasting of context into LLM context windows [02:33:09]. It addresses the “model agency” problem by giving models the ability to interact with the outside world [02:34:06], [02:34:09]. It is an open-source, standardized protocol to facilitate this interaction at scale [02:34:20], [02:34:21].
- Adoption: Initially met with skepticism, MCP gained momentum when adopted by coding tools like Cursor and VS Code [02:37:18]. Now, major players like Google, Microsoft, and OpenAI have also adopted it, moving it closer to becoming an industry standard [02:37:50], [02:37:52].
- Key Features and Principles:
- Agent-Centric: Designed with the future of agents in mind, optimizing for scenarios where models choose actions and decide what to do [02:39:14], [02:39:20].
- Server Simplicity: Optimizes for server simplicity, as it anticipates many more servers than clients in the ecosystem, pushing complexity to the client [02:40:23], [02:40:50].
- Tools, Resources, and Prompts: MCP supports dynamic discovery, allowing servers to provide context-aware tools. Tools should be high-quality rather than just numerous, to avoid confusing the AI [03:07:34], [03:08:16]. Resources allow for returning references to files or data rather than raw giant payloads [03:10:26].
- Sampling: A feature allowing servers to request LLM completions from the client, enabling capabilities like summarizing resources or formatting data for the LLM [03:11:50], [03:12:26].
- Streamable HTTP: The primary transport mechanism, enabling bidirectional communication between agents [02:40:02].
- OAuth 2.1 Support: Initially a point of difficulty due to its niche adoption, it is now supported and crucial for enterprise-grade authorization with remote MCP services [02:41:23], [02:52:50], [03:14:44].
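Concretely, MCP is JSON-RPC 2.0 under the hood: a client lists a server’s tools, the model picks one, and the client sends a `tools/call` request. The sketch below is a toy, transport-free dispatcher; the `get_weather` tool and its fields are invented for illustration, and the exact field names should be checked against the official MCP specification rather than taken from here.

```python
import json

# A hypothetical tool catalog a server might advertise via tools/list.
TOOLS = {
    "get_weather": {
        "description": "Look up current weather for a city.",
        "inputSchema": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    }
}

def handle(request: str) -> str:
    """Toy server-side dispatcher for two MCP methods (no transport layer)."""
    req = json.loads(request)
    if req["method"] == "tools/list":
        result = {"tools": [{"name": n, **spec} for n, spec in TOOLS.items()]}
    elif req["method"] == "tools/call":
        city = req["params"]["arguments"]["city"]
        # A real server would call an actual weather API here.
        result = {"content": [{"type": "text", "text": f"Sunny in {city}"}],
                  "isError": False}
    else:
        result = {}
    return json.dumps({"jsonrpc": "2.0", "id": req["id"], "result": result})

# Client side: the model chose get_weather, so the client sends the call.
call = json.dumps({"jsonrpc": "2.0", "id": 1, "method": "tools/call",
                   "params": {"name": "get_weather",
                              "arguments": {"city": "Paris"}}})
reply = json.loads(handle(call))
```

Note how the complexity lives in the dispatcher and the client, while each tool is a plain function plus a schema, which is the server-simplicity principle described above.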
- Implementation Challenges and Best Practices:
- Not Just an API Wrapper: MCP servers should not merely be direct proxies for existing APIs; they must be designed to provide context to an agent, potentially using structured formats like Markdown for better model reasoning [03:25:57], [03:26:07], [03:29:44].
- Error Design: Errors from MCP tools must also be designed as context that models can understand and reason about [03:31:03].
- Cost and Token Management: Developers must be mindful of the cost implications as models consume tokens, especially when passing large amounts of data [03:31:52].
- Security Concerns: Directly allowing random MCP tools in an organization poses significant security risks, including prompt injection [03:28:38], [03:28:40]. Centralized MCP gateways can manage authentication and provide a single point for observability and rate limiting [02:58:07].
- Continuous Iteration: Building MCP applications is not a “set and forget” task; continuous updates and tweaks are necessary to adapt to evolving models and client behaviors [03:32:30].
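Two of these practices, error design and token management, are easy to show in code. The sketch below is hypothetical (the `lookup_order` tool, the ids, and the 4-characters-per-token heuristic are all assumptions, not from the talk): the error explains itself and suggests a next step instead of surfacing a stack trace, and large results are clamped to a token budget with a notice the model can act on.

```python
def clamp_tokens(text: str, max_tokens: int = 200) -> str:
    """Crude token budget: assume ~4 characters per token; truncate with a notice."""
    limit = max_tokens * 4
    if len(text) <= limit:
        return text
    return text[:limit] + "\n[truncated: ask for a narrower range to see more]"

def lookup_order(order_id: str, orders: dict) -> str:
    """A tool whose failure mode is context the model can reason about."""
    if order_id not in orders:
        # Not an exception: a sentence telling the model what went wrong
        # and what to try next.
        return (f"Error: no order with id {order_id!r}. "
                "Valid ids look like 'ord_123'; use list_orders to see recent ones.")
    return clamp_tokens(str(orders[order_id]))
```

A raw `KeyError` or a 100 KB JSON dump would both burn tokens without helping the model recover; a short corrective sentence and a truncation notice keep the agent moving.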
Other Standard Models
- SPADE (Sync, Plan, Analyze, Deliver, Evaluate): A mental model for building AI-intensive applications that make thousands of AI calls to serve a purpose [00:33:47], [00:34:13]. This includes processing data into a knowledge graph, generating structured outputs, and even generating code [00:34:22], [00:34:28].
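The Sync, Plan, Analyze, Deliver, Evaluate stages can be rendered as a toy pipeline. Every function below is a hypothetical stand-in of my own (the talk names only the stages): in a real system `analyze` alone might fan out into thousands of model calls, and `evaluate` feeds the score back into the next planning round.

```python
def sync(sources):
    """Pull raw data from every source into one working set."""
    return [doc for src in sources for doc in src]

def plan(docs):
    """Decide which AI calls to make for each document."""
    return [("summarize", d) for d in docs]

def analyze(tasks):
    """Run the (potentially thousands of) model calls; stubbed here."""
    return [f"summary of {doc}" for _, doc in tasks]

def deliver(results):
    """Assemble structured output for the consumer."""
    return {"summaries": results}

def evaluate(output):
    """Score the output; in practice this drives the next iteration."""
    return len(output["summaries"]) > 0

output = deliver(analyze(plan(sync([["doc1", "doc2"]]))))
# evaluate(output) would then inform what the next sync/plan round changes
```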
Evolving Development Paradigms
- From Software Factory to Agent Factory: The industry is shifting from shipping binaries to shipping agents that can retrain, redeploy, and change post-deployment [00:38:33], [00:45:40].
- From Pair Programming to Peer Programming: Tools like Copilot are becoming true teammates that can live in a codebase, operate in branches, and run tests [00:38:24], [00:39:35].
- AI as Augmentation: While full automation is exciting, co-pilots and augmentation remain highly valuable and underrated [01:18:31], [01:19:01]. The “Iron Man analogy” suggests building a suit that augments humans first, then gradually extending it toward autonomous capabilities as models improve [01:19:08], [01:19:34].
Challenges and Future Outlook
- Model Market Competitiveness: The market for model capabilities is becoming increasingly competitive, with prices crashing dramatically (e.g., GPT-4-level pricing falling to $2 per million tokens within 18 months) [01:10:45], [01:11:00], [01:11:02]. This fosters a multimodel environment where no single model is right for every product [00:47:02], [01:12:04].
- LLM Limitations: LLMs are currently poor at writing jokes, indicating that true AGI is still far off [01:16:52]. The “memory feature” in some models, like ChatGPT, can take control away from power users by automatically consulting previous notes, which can be undesirable [01:34:36], [01:34:58].
- “Emperor Has No Clothes”: The field is still very early, and there is significant alpha to mine for AI engineers [02:59:59], [02:07:00]. Simpler solutions often outperform overly complex ones [02:34:35].
- Execution as a Moat: In the competitive AI landscape, strong execution in building and shipping a great user experience is the primary differentiator, rather than inventing core models or product surfaces [01:13:52], [01:21:58].
- Building Beyond Engineers: The playbook for success in AI applications is to extend beyond developer tools into other industries, focusing on problem-centric solutions and redesigning workflows from first principles around models [01:14:01], [01:15:01], [01:16:16].
- Hard Problems: Significant opportunities lie in tackling hard problems whose solutions are not found in common datasets, such as robotics, biology, and materials science, which require clever data collection and interaction with the physical world [01:20:44], [01:21:01].
The future of AI is seen as moving from the “dialup era” to “broadband,” with continuous breakthroughs in model capability and cost reduction [01:23:19], [01:23:51]. The role of AI engineers is to be “translators for the rest of the world,” building revolutionary applications that leverage these advancements [01:24:25], [01:24:28].