From: aidotengineer

Immani, Co-founding Head of AI at Cat.io, discusses the development of an AI copilot using multi-agent orchestration for grounded reasoning systems in cloud architecture [00:00:01]. Cloud architecture requires reasoning, not just automation, because its complexity keeps growing, driven by users, developers, tools, constraints, and rising expectations [00:00:16]. Traditional tools struggle to scale with the diversity of decisions cloud architecture demands [00:00:37]. Solving these problems requires systems that can understand, debate, justify, and plan, which goes beyond simple automation [00:00:44].

The architecture stack is both technical and cognitive [00:00:56]. Architects constantly negotiate trade-offs based on requirement definition, available time, and resources [00:01:03]. They rely on scattered and implicit context for decisions, and capturing this for AI requires understanding how architects think [00:01:21].

Challenges in AI-Driven Architecture Design

Cat.io identifies three high-level challenges when AI meets architecture design:

  • Requirement Understanding: Determining the origin, format, importance, and scope (global or specific) of requirements from textual documents [00:01:46].
  • Architecture Identification: Understanding the functionality of various architectural components and their roles based on their context within the architecture [00:02:05].
  • Architecture Recommendation: Generating recommendations that match requirements or improve the architecture to align with best practices, based on current architecture state and understood requirements [00:02:27].

These problems translate into specific AI challenges:

  • Mixing Semantic and Graph Context: Combining textual requirements with inherently graph-based architecture data to enable higher-level reasoning [00:02:54].
  • Complex Reasoning Scenarios: Breaking down vague, broad, or complex questions into manageable parts and planning their resolution [00:03:20].
  • Evaluation and Feedback: Assessing and providing feedback to large AI systems with many moving parts [00:03:42].

Grounding AI Agents in Specific Context

For AI agents to reason effectively, they need proper context about architecture [00:04:12]. Translating natural language into meaningful architecture retrieval tasks quickly is difficult [00:04:22].

Architecture Retrieval

Approaches for architecture retrieval include:

  • Semantic Enrichment: Collecting relevant semantic information for each architecture component so it is easier to find via vector search [00:04:40].
  • Graph-Enhanced Component Search: Utilizing graph algorithms to retrieve the correct information pieces when searching for specific components or types of components within an architecture [00:04:57].

An early design for architecture retrieval involved breaking down JSON architecture data into natural language, enriching it with connection data, embedding it, and storing it in a vector database for search [00:07:16]. While this showed good results, semantic search proved limited for graph data, leading to a shift towards graph-based searches and knowledge graphs [00:08:00].
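
To make that pipeline concrete, here is a minimal sketch of the early design, assuming hypothetical component records and using sentence-transformers with an in-memory NumPy index standing in for the vector database (all names and data are illustrative, not Cat.io's actual code):

```python
from sentence_transformers import SentenceTransformer
import numpy as np

model = SentenceTransformer("all-MiniLM-L6-v2")

def component_to_text(component: dict) -> str:
    """Flatten one JSON architecture component into natural language,
    enriched with its connection data, so it can be embedded."""
    connections = ", ".join(component.get("connects_to", [])) or "nothing"
    return (
        f"{component['name']} is a {component['type']} that "
        f"{component.get('description', 'has no description')}. "
        f"It connects to: {connections}."
    )

# Invented example components.
components = [
    {"name": "orders-api", "type": "REST service",
     "description": "handles order placement",
     "connects_to": ["orders-db", "billing-queue"]},
    {"name": "orders-db", "type": "PostgreSQL database",
     "description": "stores order records", "connects_to": []},
]

# Embed the enriched descriptions; in production a vector database
# would store these, a NumPy matrix stands in for it here.
texts = [component_to_text(c) for c in components]
embeddings = model.encode(texts, normalize_embeddings=True)

def search(query: str, k: int = 1) -> list[str]:
    """Cosine-similarity search over the embedded descriptions."""
    q = model.encode([query], normalize_embeddings=True)
    scores = (embeddings @ q.T).ravel()
    return [texts[i] for i in np.argsort(-scores)[:k]]

print(search("Which component persists orders?"))
```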

Requirement Retrieval

For requirements, an early approach used score enrichment of documents for faster retrieval [00:05:22]. This involved identifying important concepts within a large organization’s requirements and scoring documents against those concepts, thereby speeding up retrieval tasks [00:05:50].
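
A hedged illustration of what such score enrichment might look like, with invented concepts and a trivial keyword-count scorer (a real system would likely score with an LLM or classifier):

```python
# Hypothetical concepts an organization might care about.
CONCEPTS = ["availability", "encryption", "latency", "compliance"]

def score_document(text: str) -> dict[str, int]:
    """Attach per-concept scores to a document at ingestion time, so
    retrieval can rank on precomputed scores instead of re-reading text."""
    lowered = text.lower()
    return {concept: lowered.count(concept) for concept in CONCEPTS}

docs = {
    "sla.md": "Availability target is 99.99%. Latency budget is 200 ms.",
    "security.md": "All data at rest requires encryption. Compliance: SOC 2.",
}
scores = {name: score_document(text) for name, text in docs.items()}

# Retrieval by concept becomes a cheap lookup over the stored scores.
print(max(scores, key=lambda name: scores[name]["encryption"]))  # security.md
```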

An initial design for requirement understanding involved taking documents, then pre-processing, splitting, and embedding them [00:08:30]. An extra step used requirement templates with specific structures to extract relevant information into units called “pods” [00:08:47]. This enabled fast retrieval and gave the agents structured business requirements to work with [00:09:17].
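
A sketch of such template-driven extraction, assuming a hypothetical pod schema with named fields (the talk does not specify the actual “pod” structure):

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class RequirementPod:
    """Hypothetical structured unit ('pod') extracted from a document."""
    topic: str
    constraint: str
    scope: str         # "global" or the name of a specific component
    source_doc: str

def extract_pods(doc_name: str, text: str) -> list[RequirementPod]:
    """In production an LLM prompted with the requirement template would
    fill these fields; here the extraction is stubbed with one example."""
    return [RequirementPod(topic="availability",
                           constraint="99.99% uptime",
                           scope="global",
                           source_doc=doc_name)]

pods = extract_pods("sla.md", "Availability target is 99.99%.")
print(json.dumps([asdict(p) for p in pods], indent=2))
```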

Key Learnings

  • Semantic grounding improves reasoning but has limitations and does not always scale well or provide sufficiently detailed responses [00:06:09].
  • Prompt design is critical for soft grounding, specifically in guiding the agent on what to focus on and retrieve [00:06:29].
  • Graph memory supports continuity, not just accuracy, by allowing agents to find and connect different nodes in a graph, providing more context for reasoning (see the sketch after this list) [00:06:47].
  • However, limitations appeared when increasing the number of requirement documents, leading to loss of context in larger searches, suggesting a potential role for graph analysis here too [00:09:37].
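
To illustrate the graph-memory point, a tiny networkx sketch of pulling in the connected context around a component (the graph contents are invented; networkx is used here purely for illustration):

```python
import networkx as nx

# Invented architecture graph: nodes are components, edges are connections.
g = nx.Graph()
g.add_edges_from([
    ("orders-api", "orders-db"),
    ("orders-api", "billing-queue"),
    ("billing-queue", "billing-worker"),
])

def context_for(component: str, hops: int = 2) -> set[str]:
    """Collect a component's neighborhood so an agent reasons over
    connected context, not just an isolated vector-search hit."""
    return set(nx.ego_graph(g, component, radius=hops).nodes)

print(context_for("orders-api"))
# e.g. {'orders-api', 'orders-db', 'billing-queue', 'billing-worker'}
```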

Complex Reasoning Scenarios with Multi-Agent Orchestration

Good architecture design involves conflicting goals, trade-offs, and debates [00:10:07]. AI agents need to collaborate, argue, and converge on justified recommendations [00:10:19].

Cat.io’s approach involves building a multi-agent orchestration system with role-specific agents [00:10:28].

  • Structured Message Format: Message exchange initially used XML; the system now uses structured messages, which greatly improve the workflow and enable multiple agents to work together in longer chains [00:10:53].
  • Conversation Management: Agent conversations are isolated to save tokens and prevent increased hallucination observed with larger memory [00:11:25].
  • Cloning of Agents: For parallel processing of certain tasks, agents are cloned, which requires memory management that duplicates relevant history while keeping each clone’s new history separate (a minimal sketch follows this list) [00:12:05].
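
A minimal sketch of the cloning idea, assuming an agent holds its message history as a plain list (all of this is illustrative, not Cat.io's implementation):

```python
from copy import deepcopy
from dataclasses import dataclass, field

@dataclass
class Agent:
    role: str
    history: list[dict] = field(default_factory=list)

    def clone(self) -> "Agent":
        """Duplicate the relevant past history so a clone keeps shared
        knowledge, while its new messages stay separate from the parent."""
        return Agent(role=self.role, history=deepcopy(self.history))

staff = Agent(role="staff-architect")
staff.history.append({"role": "user", "content": "Review the messaging layer."})

clone_a = staff.clone()
clone_b = staff.clone()
clone_a.history.append({"role": "assistant", "content": "Draft proposal A..."})

# The parent and clone_b are unaffected by clone_a's new history.
print(len(staff.history), len(clone_a.history), len(clone_b.history))  # 1 2 1
```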

Key Learnings

  • Structured outputs are crucial for improving clarity, control, and programmatic handling, despite concerns that they may reduce reasoning ability [00:12:34].
  • Allowing agents to dynamically resolve trade-offs rather than executing static plans leads to higher creativity in planning and reaching results [00:13:08].
  • Successful multi-agent orchestration absolutely requires control flow; simply letting agents interact freely is insufficient [00:13:50].

AI Copilot System Example

Cat.io’s production system generates recommendations based on a multi-agent orchestration system [00:14:14]. Recommendations are categorized (e.g., architecture, messaging, API integration) and broken down into description, target state, gap analysis, and recommended actions [00:14:29].
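
One way such a structured recommendation might be modeled (field names are inferred from the description above, not taken from Cat.io's schema):

```python
from dataclasses import dataclass

@dataclass
class Recommendation:
    category: str                   # e.g. "architecture", "messaging"
    description: str                # what is being recommended and why
    target_state: str               # where the architecture should end up
    gap_analysis: str               # current state vs. target state
    recommended_actions: list[str]  # concrete steps to close the gap

rec = Recommendation(
    category="messaging",
    description="Decouple order placement from billing.",
    target_state="Billing consumes order events from a queue.",
    gap_analysis="Billing is currently called synchronously by orders-api.",
    recommended_actions=["Introduce a billing queue", "Make billing idempotent"],
)
```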

The system consists of:

  • Chief Architect: Oversees and coordinates higher-level tasks [00:15:17].
  • Staff Architects: Ten specialized agents, each covering a specific domain (e.g., infrastructure, API, IAM) [00:15:23].
  • Retrievers:
    • Requirement Retriever: Accesses requirements data [00:15:36].
    • Architecture Retriever: Understands the current architecture state and can answer questions about components [00:15:45].

Multi-Agent Workflow

The workflow for generating recommendations involves three main sequential tasks [00:15:57]:

  1. List Generation: The Chief Architect requests recommendations from the Staff Architects, who reach out, in parallel, to the Architecture State Agent and Requirements Agent multiple times within set call budgets [00:16:11]. Tens of calls can happen concurrently across the Staff Architects [00:17:54].
  2. Conflict Resolution: The Chief Architect reviews the generated list of recommendations to identify and prune conflicts or redundancies [00:16:20].
  3. Design Proposal: After conflict resolution, Staff Architects are tasked with writing a full design proposal for each recommendation topic, including gap analysis and proposed architecture improvements [00:16:49]. During this step, each Staff Architect is cloned for each recommendation it needs to generate, as sketched below. Each clone has access to past history but maintains its own separate current history, allowing it to leverage existing knowledge while drafting its proposal [00:18:15].
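
A compressed sketch of the three-phase flow, with stubbed agent calls and a thread pool standing in for the parallel staff-architect work (everything here is illustrative):

```python
from concurrent.futures import ThreadPoolExecutor

def staff_architect(domain: str) -> list[str]:
    # Stub: would query the requirement and architecture retrievers
    # within a call budget and draft candidate recommendations.
    return [f"{domain}: candidate recommendation"]

def chief_resolve(candidates: list[str]) -> list[str]:
    # Stub: the Chief Architect prunes conflicts and redundancies.
    return sorted(set(candidates))

def design_proposal(recommendation: str) -> str:
    # Stub: a cloned Staff Architect writes the full proposal,
    # including gap analysis and proposed improvements.
    return f"Proposal for: {recommendation}"

domains = ["infrastructure", "API", "messaging"]

# Phase 1: list generation, Staff Architects working in parallel.
with ThreadPoolExecutor() as pool:
    candidates = [r for sub in pool.map(staff_architect, domains) for r in sub]

# Phase 2: conflict resolution by the Chief Architect.
approved = chief_resolve(candidates)

# Phase 3: one clone per approved recommendation, again in parallel.
with ThreadPoolExecutor() as pool:
    proposals = list(pool.map(design_proposal, approved))

print(proposals)
```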

Evaluation and Feedback

A significant challenge is determining the quality of a recommendation in a complex multi-agent system with numerous conversation rounds [00:19:03].

  • Human Evaluation: The best form of evaluation, especially in early stages [00:19:32]. LLM evaluations are useful but often don’t provide the depth needed for system improvements [00:19:42].
  • Internal Human Eval Tool (Eagle Eye): Built to allow examination of specific cases, including architecture, extracted requirements, agent conversations, and generated recommendations [00:19:55]. This tool facilitates relevance, visibility, and clarity studies, helping make decisions on future focus areas [00:20:22].

Key Learnings

  • Confidence is not correctness: An agent’s confidence doesn’t guarantee accuracy [00:20:38].
  • Human feedback is essential early on: Especially when building systems from scratch [00:20:51].
  • Evaluation must be baked into system design: Rather than being an afterthought, evaluation mechanisms (human eval tools, monitoring dashboards, LLM-based feedback loops) should be considered from the initial design phase of any new AI system [00:20:59].
  • Hallucinations can be detected and managed through careful monitoring of agent conversations and structured outputs [00:22:05].

Conclusion

Building an AI copilot is about designing a system that can reason, not just generate answers or provide assistance [00:23:00]. This involves handling large amounts of data, such as thousands or millions of architectural components and numerous documents, to answer varied questions from diverse stakeholders [00:23:20].

Developing such a reasoning system requires extensive experimentation to find the patterns that work best with the data at hand, with graphs becoming increasingly important in the design [00:24:16]. Experimentation also guides the nature of inter-agent interactions and the level of autonomy granted to each agent [00:24:46].

Cat.io has experimented with various frameworks for building multi-agent systems, settling on LangGraph for agent workflows, with a manager layer on top, and Slang for higher-level management [00:24:56]. Graphs are used to capture as much memory as possible, ensuring the AI always has the correct context for each task [00:25:34]. The belief is that this approach represents the future of AI in software design [00:25:50].
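
For readers unfamiliar with LangGraph, a minimal example of the kind of explicit control flow described above, using the library's StateGraph API (node logic is stubbed; this is not Cat.io's actual graph):

```python
from typing import TypedDict
from langgraph.graph import StateGraph, START, END

class State(TypedDict):
    candidates: list[str]
    approved: list[str]

def generate(state: State) -> dict:
    # Stub for the list-generation phase.
    return {"candidates": ["rec-1", "rec-1", "rec-2"]}

def resolve(state: State) -> dict:
    # Stub for the conflict-resolution phase.
    return {"approved": sorted(set(state["candidates"]))}

builder = StateGraph(State)
builder.add_node("generate", generate)
builder.add_node("resolve", resolve)
builder.add_edge(START, "generate")
builder.add_edge("generate", "resolve")
builder.add_edge("resolve", END)

graph = builder.compile()
print(graph.invoke({"candidates": [], "approved": []}))
```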