From: aidotengineer

Immani, co-founding head of AI at Cat.io, discusses how to build an AI copilot using multi-agent orchestration for grounded reasoning systems in cloud architecture [00:00:01].

The Need for Reasoning in Cloud Architecture

Cloud architecture requires reasoning, not just automation, because its complexity keeps growing, driven by more users, developers, tools, and rising expectations [00:00:20]. Existing tools do not scale across the diverse decisions cloud architecture demands [00:00:37]. What is needed are systems that can understand, debate, justify, and plan, which goes beyond simple automation into the realm of reasoning [00:00:44].

The architecture stack is not only technical but also highly cognitive [00:00:56]. Architects constantly negotiate tradeoffs based on defined requirements, available time, and resources [00:01:04]. They rely on scattered and implicit context for decisions, and for AI to capture this, it must understand how architects think [00:01:21].

Challenges at the Intersection of AI and Architecture

Cat.io addresses three high-level challenges in solving architecture design problems with AI:

  1. Requirement Understanding: Identifying the format, important pieces, and scope (global or specific) of requirements [00:01:46].
  2. Architecture Identification: Understanding the functions of various components within an architecture [00:02:05].
  3. Architecture Recommendation: Providing recommendations that match requirements or improve the existing architecture based on best practices [00:02:27].

These challenges translate into specific AI-related hurdles:

  • Mixing Semantic and Graph Context: Combining textual requirement documents with inherently graph-structured architecture data [00:02:56].
  • Complex Reasoning Scenarios: Handling vague, broad, and complex questions that require breakdown and proper planning for accurate answers [00:03:20].
  • Evaluation and Feedback: Developing methods to evaluate and provide feedback to large AI systems with many moving parts [00:03:42].

Grounding Agents in Specific Contexts

Effective AI systems require proper context for architecture reasoning [00:04:12]. Translating natural language into meaningful architecture retrieval tasks is difficult, especially when speed is critical [00:04:22].

Approaches explored:

  • Semantic Enrichment of Architecture Data: Collecting relevant semantic information for each component to make it more searchable in vector search [00:04:40].
  • Graph-Enhanced Component Search: Utilizing graph algorithms to retrieve the right information pieces from an architecture when searching for components [00:04:57].
  • Early Score Enrichment of Requirement Documents: Scoring documents based on important concepts for faster retrieval from large text corpora [00:05:22].
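The first of these approaches can be sketched in a few lines. This is a minimal illustration of semantic enrichment, assuming invented component fields (`purpose`, `keywords`) and using lexical term overlap as a toy stand-in for the vector search the talk describes:

```python
from dataclasses import dataclass, field

@dataclass
class Component:
    """An architecture component enriched with semantic text for search.

    The enrichment fields (purpose, keywords) are illustrative
    assumptions, not Cat.io's actual schema.
    """
    name: str
    kind: str                      # e.g. "queue", "database", "api-gateway"
    purpose: str = ""              # short natural-language description
    keywords: list[str] = field(default_factory=list)

def enrich(component: Component) -> str:
    """Concatenate semantic fields into one searchable document.

    In a real system this text would be embedded and stored in a
    vector index; here we just return the raw string."""
    return " ".join([component.name, component.kind, component.purpose] + component.keywords)

def search(components: list[Component], query: str) -> list[Component]:
    """Toy lexical stand-in for vector search: rank by term overlap."""
    terms = set(query.lower().split())
    scored = [(len(terms & set(enrich(c).lower().split())), c) for c in components]
    return [c for score, c in sorted(scored, key=lambda s: -s[0]) if score > 0]

comps = [
    Component("orders-db", "database", "stores customer orders", ["postgres", "persistence"]),
    Component("events", "queue", "buffers order events", ["kafka", "streaming"]),
]
print([c.name for c in search(comps, "where are customer orders persisted")])
```

The point of the enrichment step is that a bare component name like "orders-db" matches almost nothing; the attached semantic text is what makes natural-language queries land on the right component.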

Learnings on Grounding Agents

  • Semantic grounding improves reasoning but doesn’t always work: it can fall short on scalability or return overly detailed responses [00:06:09].
  • The right design is critical for “soft grounding”: telling the agent what to focus on and what to retrieve [00:06:29].
  • Graph memory supports continuity, not just accuracy, by connecting different nodes in searches and adding context for reasoning [00:06:47]. Semantic search has limitations for graph data, leading to a shift towards graph-based searches and knowledge graphs [00:08:00].
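The graph-based search mentioned above can be illustrated with a bounded traversal. This is a sketch under assumptions (the component graph and hop limit are invented): when an agent asks about a component, nearby nodes are gathered as extra context instead of relying on semantic similarity alone.

```python
from collections import deque

# Toy architecture graph: component -> directly connected components.
# Names are illustrative, not from the talk.
GRAPH = {
    "api-gateway": ["auth-service", "orders-service"],
    "auth-service": ["users-db"],
    "orders-service": ["orders-db", "events"],
    "users-db": [], "orders-db": [], "events": [],
}

def neighborhood(start: str, depth: int = 2) -> list[str]:
    """Breadth-first walk collecting components within `depth` hops,
    to serve as graph context for a reasoning agent."""
    seen, out = {start}, []
    frontier = deque([(start, 0)])
    while frontier:
        node, d = frontier.popleft()
        if d == depth:
            continue  # stop expanding past the hop limit
        for nxt in GRAPH.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                out.append(nxt)
                frontier.append((nxt, d + 1))
    return out

print(neighborhood("api-gateway"))
```

Connecting nodes this way is what gives the “continuity” the talk mentions: a question about the gateway automatically pulls in the services and stores it touches, which flat vector search over isolated components would miss.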

For requirement understanding, Cat.io initially used requirement templates to structure extracted information for downstream tasks, which helped with fast retrieval and structuring business requirements [00:08:47]. However, this approach had limitations, as context can be lost when increasing the number of documents, indicating a potential role for graph analysis here too [00:09:37].
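A requirement template of the kind described might look like the following. The field names and values are an illustrative guess, not Cat.io's actual template; they simply show how an extracted requirement could be structured for downstream retrieval, including the global-vs-specific scoping and the key concepts used for early score enrichment:

```python
from dataclasses import dataclass, asdict

@dataclass
class Requirement:
    """Template for one extracted requirement (field names are assumed)."""
    text: str            # the requirement, verbatim or paraphrased
    scope: str           # "global" or "specific", per the talk's scoping
    category: str        # e.g. "security", "availability", "cost"
    source_doc: str      # originating document, for traceability
    key_concepts: list   # concepts used for early score enrichment

req = Requirement(
    text="All customer data must be encrypted at rest.",
    scope="global",
    category="security",
    source_doc="security-policy.pdf",
    key_concepts=["encryption", "data-at-rest"],
)
print(asdict(req)["scope"])
```

Keeping `source_doc` on every record is one way to mitigate the context-loss problem noted above: when the document count grows, each extracted requirement still points back to where it came from.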

Complex Reasoning Scenarios with Multi-Agent Systems

Good design involves conflicting goals, tradeoffs, and debates [00:10:07]. AI agents are needed that can collaborate, argue, and converge on justified recommendations [00:10:19].

Multi-Agent System Design

Cat.io built a multi-agent orchestration system with role-specific agents [00:10:28]. Key elements include:

  • Structured Message Format: Using structured messages (e.g., JSON, moving away from XML) helps build better workflows and enables multiple agents to work together in longer chains [00:10:53].
  • Conversation Management: Isolating conversations between agents prevents token waste and avoids the increased hallucination observed when agent memory grows too large [00:11:25].
  • Agent Cloning for Parallel Processing: Cloning agents for specific tasks speeds up processes and requires careful memory management [00:12:05].
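The three elements above can be sketched together. This is a minimal illustration, not Cat.io's implementation: the message schema, agent names, and cloning convention are all assumptions. It shows a JSON-serializable structured message, per-agent conversation isolation, and cloning that copies past history while keeping new turns separate.

```python
import json
from dataclasses import dataclass, field

@dataclass
class AgentMessage:
    """Structured, JSON-serializable message between agents (fields assumed)."""
    sender: str
    recipient: str
    task: str
    payload: dict

    def to_json(self) -> str:
        return json.dumps(self.__dict__)

@dataclass
class Agent:
    name: str
    history: list = field(default_factory=list)  # isolated per agent

    def receive(self, msg: AgentMessage) -> None:
        # Only this agent's own conversation is stored, so unrelated
        # exchanges never inflate its context window.
        self.history.append(msg.to_json())

    def clone(self, suffix: str) -> "Agent":
        """Clone for a parallel subtask: past history is copied,
        but new turns accumulate separately in the clone."""
        return Agent(f"{self.name}-{suffix}", history=list(self.history))

chief = Agent("chief-architect")
staff = Agent("staff-iam")
staff.receive(AgentMessage("chief-architect", "staff-iam", "review", {"topic": "IAM"}))
clone = staff.clone("rec-1")
clone.receive(AgentMessage("chief-architect", clone.name, "proposal", {"topic": "IAM"}))
print(len(staff.history), len(clone.history))
```

Note that after cloning, the original agent's history is untouched; the clone carries the shared past plus its own new turn, which is the memory-management property the talk calls out as requiring care.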

Learnings on Complex Reasoning

  • Structured outputs improve clarity and control, which is essential for programmatic downstream use [00:13:34]. There is a tradeoff against raw reasoning ability, but the results are good [00:12:45].
  • Agents should resolve tradeoffs dynamically, not just execute static plans [00:13:10]. More dynamic orchestration leads to higher creativity in planning and reaching results [00:13:35].
  • Successful multi-agent orchestration requires control flow; simply letting agents work together without guidance is not sufficient [00:13:50].

Multi-Agent Workflow Example

Cat.io’s production system for architecture recommendations uses a multi-agent system [00:14:11]. The system includes:

  • Chief Architect: Oversees and coordinates higher-level tasks [00:15:17].
  • 10 Staff Architects: Each specialized in a domain (e.g., infrastructure, API, IAM) [00:15:23].
  • Requirement Retriever: Accesses requirements data [00:15:36].
  • Architecture Retriever: Understands the current architecture state and can answer questions about components [00:15:45].

The workflow consists of three main sequential tasks to generate recommendations [00:15:57]:

  1. List Generation: At the Chief Architect’s request, the staff architects send parallel requests to the Architecture Retriever and the Requirement Retriever. The gathered results are returned to the Chief Architect as a list of candidate recommendations [00:16:11].
  2. Conflict Resolution: The Chief Architect reviews the generated list for conflicts or redundancies, pruning it [00:16:20].
  3. Design Proposal: For each recommendation topic, staff architects generate a full design proposal, including a gap analysis and proposed improvements [00:16:49]. During this step, a staff architect is cloned for each recommendation it must generate; each clone has access to the shared past history but builds its own separate current history [00:18:15].
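The three-step workflow above can be sketched with a thread pool standing in for the agent runtime. Everything here is stubbed and hypothetical: the domains, the recommendation strings, and the pruning rule (reduced to simple de-duplication) are invented to show the fan-out/gather/fan-out shape, not the actual agent logic.

```python
from concurrent.futures import ThreadPoolExecutor

DOMAINS = ["infrastructure", "api", "iam"]  # subset of the 10 domains

def staff_list(domain: str) -> list:
    """Step 1: a staff architect queries the retrievers (stubbed) and
    returns candidate recommendations for its domain."""
    return [f"{domain}: harden defaults", f"{domain}: add monitoring"]

def prune(candidates: list) -> list:
    """Step 2: the chief architect removes conflicts and redundancies,
    reduced here to order-preserving de-duplication."""
    seen, kept = set(), []
    for c in candidates:
        if c not in seen:
            seen.add(c)
            kept.append(c)
    return kept

def design_proposal(rec: str) -> str:
    """Step 3: a cloned staff architect expands one recommendation
    into a full proposal with gap analysis (stubbed)."""
    return f"proposal[{rec}]"

with ThreadPoolExecutor() as pool:
    # Parallel fan-out across staff architects, gathered by the chief.
    candidates = [r for recs in pool.map(staff_list, DOMAINS) for r in recs]
    topics = prune(candidates)
    # One clone per surviving topic, again in parallel.
    proposals = list(pool.map(design_proposal, topics))

print(len(proposals))
```

The structural point is that parallelism appears twice: once when staff architects query the retrievers, and again when clones expand each pruned topic, with the sequential chief-architect step in between acting as the control flow the talk says is required.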

Evaluation and Feedback

Evaluating the quality of recommendations from a complex multi-agent system with many rounds of conversations is a challenge [00:19:03]. The loop needs to be closed with human scoring, structured feedback, and revision cycles [00:19:26].

Learnings on Evaluation

  • Human evaluation is currently the best method; LLM-based evaluations, while useful, do not provide the insight needed to drive improvement [00:19:34].
  • Cat.io developed an internal human evaluation tool called “Eagle Eye” to review specific cases, extracted architecture/requirements, agent conversations, and generated recommendations [00:19:55]. This tool allows for relevance, visibility, and clarity studies, helping focus future development [00:20:19].
  • Confidence is not correctness; while it can help, it cannot always be trusted [00:20:38].
  • Human feedback is essential early on when building systems from scratch [00:20:51].
  • Evaluation must be baked into system design from the start, not added later [00:20:59]. This means designing for evaluability, whether through human tools, monitoring dashboards, or LLM-based feedback loops [00:21:10].
  • The evaluation tool helps identify issues like hallucination, where an agent might generate irrelevant requests [00:22:02].
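A structured human-review record for a tool like Eagle Eye might look as follows. The three scoring axes mirror the relevance, visibility, and clarity studies mentioned above, but the 1-5 scale, field names, and aggregation are assumptions for illustration:

```python
from dataclasses import dataclass
from statistics import mean

@dataclass
class HumanReview:
    """One structured human-evaluation record (schema assumed)."""
    case_id: str
    relevance: int   # does the recommendation address the requirement? (1-5)
    visibility: int  # can the reviewer trace how agents reached it? (1-5)
    clarity: int     # is the proposal understandable? (1-5)
    notes: str = ""  # free-form feedback fed into revision cycles

def summarize(reviews: list) -> dict:
    """Aggregate scores per axis to spot where the system needs work."""
    return {
        axis: round(mean(getattr(r, axis) for r in reviews), 2)
        for axis in ("relevance", "visibility", "clarity")
    }

reviews = [
    HumanReview("case-1", relevance=4, visibility=2, clarity=5),
    HumanReview("case-2", relevance=5, visibility=3, clarity=4),
]
print(summarize(reviews))
```

Scoring per axis rather than with a single grade is what makes the feedback actionable: in this toy data, low visibility scores would point future work at exposing agent conversations rather than at the recommendations themselves.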

Conclusion and Future Outlook

Building a copilot is about designing a system that can reason, not just generate answers [00:23:06]. The goal is to build a system with a comprehensive view of large amounts of data (thousands or millions of architecture components, numerous documents) to answer diverse questions from various stakeholders, from developers to CTOs [00:23:16].

Achieving this requires defining roles, workflows, memories, and structure [00:24:04]. Extensive experimentation is crucial to discover patterns that work best with existing data [00:24:16]. Graphs are becoming increasingly important in Cat.io’s designs [00:24:31].

Cat.io is currently using LangChain’s LangGraph for building some agent workflows, managed by a higher-level manager, and using graphs to capture as much memory as possible to ensure the AI always has the right context per task [00:24:53]. The team believes this approach will fundamentally change how AI designs software [00:25:50].