From: aidotengineer
Immani, co-founding head of AI at Cat.io, discusses how to build an AI copilot using multi-agent orchestration for grounded reasoning systems in cloud architecture [00:00:01].
The Need for Reasoning in Cloud Architecture
Cloud architecture requires reasoning, not just automation, because its complexity keeps growing with more users, developers, tools, and rising expectations [00:00:20]. Existing tools do not scale across the diverse decisions cloud architecture demands [00:00:37]. What is needed are systems that can understand, debate, justify, and plan, which moves beyond simple automation into the realm of reasoning [00:00:44].
The architecture stack is not only technical but also highly cognitive [00:00:56]. Architects constantly negotiate tradeoffs based on defined requirements, available time, and resources [00:01:04]. They rely on scattered and implicit context for decisions, and for AI to capture this, it must understand how architects think [00:01:21].
Challenges at the Intersection of AI and Architecture
Cat.io addresses three high-level challenges in solving architecture design problems with AI:
- Requirement Understanding: Identifying the format, important pieces, and scope (global or specific) of requirements [00:01:46].
- Architecture Identification: Understanding the functions of various components within an architecture [00:02:05].
- Architecture Recommendation: Providing recommendations that match requirements or improve the existing architecture based on best practices [00:02:27].
These challenges translate into specific AI-related hurdles:
- Mixing Semantic and Graph Context: Combining textual requirement documents with inherently graph-structured architecture data [00:02:56].
- Complex Reasoning Scenarios: Handling vague, broad, and complex questions that require breakdown and proper planning for accurate answers [00:03:20].
- Evaluation and Feedback: Developing methods to evaluate and provide feedback to large AI systems with many moving parts [00:03:42].
Grounding Agents in Specific Contexts
Effective AI systems require proper context for architecture reasoning [00:04:12]. Translating natural language into meaningful architecture retrieval tasks is difficult, especially when speed is critical [00:04:22].
Approaches explored:
- Semantic Enrichment of Architecture Data: Collecting relevant semantic information for each component to make it more searchable in vector search [00:04:40].
- Graph-Enhanced Component Search: Utilizing graph algorithms to retrieve the right information pieces from an architecture when searching for components [00:04:57].
- Early Score Enrichment of Requirement Documents: Scoring documents based on important concepts for faster retrieval from large text corpora [00:05:22].
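The first two approaches can be pictured together: each component is enriched with a short semantic description, and search then runs over those descriptions. A minimal sketch, using a toy bag-of-words similarity as a stand-in for real embeddings (all component names and descriptions are invented for illustration):

```python
from collections import Counter
import math

# Hypothetical component records, enriched with semantic descriptions
# so that a natural-language query can match them.
COMPONENTS = {
    "api-gw": "public API gateway routing HTTPS requests to backend services",
    "db-main": "primary PostgreSQL database storing user and order data",
    "queue": "message queue buffering asynchronous jobs between services",
}

def bow(text: str) -> Counter:
    """Bag-of-words vector; a stand-in for a real embedding model."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def search(query: str) -> list[str]:
    """Rank components by similarity between the query and each
    component's enriched semantic description."""
    qv = bow(query)
    return sorted(COMPONENTS,
                  key=lambda c: cosine(qv, bow(COMPONENTS[c])),
                  reverse=True)

print(search("which database stores user data"))  # db-main ranks first
```

The point of the enrichment step is visible here: the raw component ID `db-main` would never match the query, but its semantic description does.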
Learnings on Grounding Agents
- Semantic grounding improves reasoning but doesn’t always work, sometimes falling short in scalability or providing overly detailed responses [00:06:09].
- The right design is critical for “soft grounding”: telling the agent what to focus on and what to retrieve [00:06:29].
- Graph memory supports continuity, not just accuracy, by connecting different nodes in searches and adding context for reasoning [00:06:47]. Semantic search has limitations for graph data, leading to a shift towards graph-based searches and knowledge graphs [00:08:00].
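The graph-memory idea above can be approximated with a plain adjacency structure: a semantic hit on one component pulls in its graph neighbors as extra reasoning context, so the agent sees connected context rather than an isolated node. A hypothetical sketch (the graph and component names are invented):

```python
# Hypothetical architecture graph: component -> directly connected components.
EDGES = {
    "api-gw": ["auth-svc", "order-svc"],
    "order-svc": ["db-main", "queue"],
    "auth-svc": ["db-main"],
}

def expand_context(seed: str, depth: int = 1) -> set[str]:
    """Collect the seed component plus its graph neighbors up to
    `depth` hops, giving the agent connected context for reasoning."""
    frontier, seen = {seed}, {seed}
    for _ in range(depth):
        frontier = {n for c in frontier for n in EDGES.get(c, [])} - seen
        seen |= frontier
    return seen

# A hit on the API gateway also surfaces the services and data stores
# it depends on, two hops out.
print(sorted(expand_context("api-gw", depth=2)))
```

This is the continuity the talk describes: the retrieval step returns a connected neighborhood, not just the single best-matching node.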
For requirement understanding, Cat.io initially used requirement templates to structure extracted information for downstream tasks, which helped with fast retrieval and structuring business requirements [00:08:47]. However, this approach had limitations, as context can be lost when increasing the number of documents, indicating a potential role for graph analysis here too [00:09:37].
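The template approach can be pictured as a fixed schema that extracted requirements are slotted into, so downstream tasks always see the same shape. A minimal sketch; the field names here are invented, not Cat.io's actual schema:

```python
from dataclasses import dataclass

@dataclass
class Requirement:
    # Hypothetical template fields for a structured requirement.
    topic: str        # e.g. "availability", "security"
    scope: str        # "global" or a specific component
    statement: str    # the extracted requirement text
    source_doc: str   # document it was extracted from

def to_template(raw: dict) -> Requirement:
    """Normalize a raw extraction into the fixed template, defaulting
    missing fields so downstream agents always get a complete record."""
    return Requirement(
        topic=raw.get("topic", "unclassified"),
        scope=raw.get("scope", "global"),
        statement=raw["statement"],
        source_doc=raw.get("source_doc", "unknown"),
    )

req = to_template({"statement": "All services must run in two regions."})
print(req.scope, "|", req.topic)  # defaults fill the unspecified fields
```

The limitation noted above also shows up in this shape: each record is independent, so relationships across many documents are invisible unless the records are linked, which is where graph analysis could help.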
Complex Reasoning Scenarios with Multi-Agent Systems
Good design involves conflicting goals, tradeoffs, and debates [00:10:07]. AI agents are needed that can collaborate, argue, and converge on justified recommendations [00:10:19].
Multi-Agent System Design
Cat.io built a multi-agent orchestration system with role-specific agents [00:10:28]. Key elements include:
- Structured Message Format: Using structured messages (e.g., JSON, moving away from XML) helps build better workflows and enables multiple agents to work together in longer chains [00:10:53].
- Conversation Management: Isolating conversations between agents prevents token waste and curbs hallucination, which was observed to increase as accumulated memory grew [00:11:25].
- Agent Cloning for Parallel Processing: Cloning agents for specific tasks speeds up processes and requires careful memory management [00:12:05].
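The three elements above can be sketched as plain data plumbing: structured messages with explicit fields, a per-agent isolated history, and clones that copy past history but write to their own log. All class and role names are illustrative, not Cat.io's implementation:

```python
import copy

class Agent:
    def __init__(self, role: str, history=None):
        self.role = role
        # Isolated per-agent conversation log: no shared global memory.
        self.history = list(history or [])

    def send(self, to: "Agent", payload: dict) -> None:
        # Structured message: explicit fields instead of free-form text,
        # so long agent chains stay parseable.
        msg = {"from": self.role, "to": to.role, "payload": payload}
        to.history.append(msg)

    def clone(self, task: str) -> "Agent":
        # Clone for a parallel task: it sees the past history,
        # but appends to its own separate copy from here on.
        return Agent(f"{self.role}:{task}", history=copy.deepcopy(self.history))

chief = Agent("chief-architect")
staff = Agent("staff-iam")
chief.send(staff, {"task": "review IAM policies"})

clone = staff.clone("recommendation-1")
clone.history.append({"from": clone.role, "note": "working copy"})
print(len(staff.history), len(clone.history))  # the clone's writes stay local
```

The deep copy is the memory-management detail the talk flags: cloning without separating the histories would let parallel clones pollute each other's context.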
Learnings on Complex Reasoning
- Structured outputs improve clarity and control, which is important for building reliable programmatic workflows [00:13:34]. There is a tradeoff against free-form reasoning ability, but the results are good [00:12:45].
- Agents should resolve tradeoffs dynamically, not just execute static plans [00:13:10]. More dynamic orchestration leads to higher creativity in planning and reaching results [00:13:35].
- Successful multi-agent orchestration requires control flow; simply letting agents work together without guidance is not sufficient [00:13:50].
Multi-Agent Workflow Example
Cat.io’s production system for architecture recommendations uses a multi-agent system [00:14:11]. The system includes:
- Chief Architect: Oversees and coordinates higher-level tasks [00:15:17].
- 10 Staff Architects: Each specialized in a domain (e.g., infrastructure, API, IAM) [00:15:23].
- Requirement Retriever: Accesses requirements data [00:15:36].
- Architecture Retriever: Understands the current architecture state and can answer questions about components [00:15:45].
The workflow consists of three main sequential tasks to generate recommendations [00:15:57]:
- List Generation: Staff architects, requested by the Chief Architect, send parallel requests to the Architecture State Agent and Requirements Agent. The results are gathered and sent back to the Chief Architect as a list of possible recommendations [00:16:11].
- Conflict Resolution: The Chief Architect reviews the generated list for conflicts or redundancies, pruning it [00:16:20].
- Design Proposal: For each recommendation topic, staff architects generate a full design proposal, including gap analysis and proposed improvements [00:16:49]. During this step, a staff architect is cloned for each recommendation it needs to generate; each clone has access to past history but writes its own separate current history [00:18:15].
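The three phases above can be outlined as a single driver pipeline. The pruning and proposal logic below is deliberately trivial and hypothetical (the real system delegates each step to LLM agents); it only shows how the phases compose:

```python
# Phase 1: staff architects each contribute candidate recommendations.
def list_generation(staff_domains: list[str]) -> list[str]:
    return [f"improve {d}" for d in staff_domains]

# Phase 2: the chief architect prunes conflicts and redundancies.
# Toy rule: drop exact duplicates while preserving order.
def conflict_resolution(candidates: list[str]) -> list[str]:
    seen, kept = set(), []
    for c in candidates:
        if c not in seen:
            seen.add(c)
            kept.append(c)
    return kept

# Phase 3: one cloned staff architect per surviving topic writes a proposal.
def design_proposal(topic: str) -> dict:
    return {"topic": topic, "gap_analysis": "...", "proposal": "..."}

domains = ["infrastructure", "api", "iam", "api"]  # "api" appears twice
final = [design_proposal(t)
         for t in conflict_resolution(list_generation(domains))]
print([p["topic"] for p in final])  # the duplicate "api" topic is pruned
```

Phase boundaries are the control flow the earlier learning calls for: each phase completes before the next begins, rather than agents free-running.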
Evaluation and Feedback
Evaluating the quality of recommendations from a complex multi-agent system with many rounds of conversations is a challenge [00:19:03]. The loop needs to be closed with human scoring, structured feedback, and revision cycles [00:19:26].
Learnings on Evaluation
- Human evaluation is currently the best method, as LLM evaluations, while good, don’t provide the necessary insights for improvement [00:19:34].
- Cat.io developed an internal human evaluation tool called “Eagle Eye” to review specific cases, extracted architecture/requirements, agent conversations, and generated recommendations [00:19:55]. This tool allows for relevance, visibility, and clarity studies, helping focus future development [00:20:19].
- Confidence is not correctness; while it can help, it cannot always be trusted [00:20:38].
- Human feedback is essential early on when building systems from scratch [00:20:51].
- Evaluation must be baked into system design from the start, not added later [00:20:59]. This means designing for evaluability, whether through human tools, monitoring dashboards, or LLM-based feedback loops [00:21:10].
- The evaluation tool helps identify issues like hallucination, where an agent might generate irrelevant requests [00:22:02].
Conclusion and Future Outlook
Building a copilot is about designing a system that can reason, not just generate answers [00:23:06]. The goal is to build a system with a comprehensive view of large amounts of data (thousands or millions of architecture components, numerous documents) to answer diverse questions from various stakeholders, from developers to CTOs [00:23:16].
Achieving this requires defining roles, workflows, memories, and structure [00:24:04]. Extensive experimentation is crucial to discover patterns that work best with existing data [00:24:16]. Graphs are becoming increasingly important in Cat.io’s designs [00:24:31].
Cat.io is currently using LangChain’s LangGraph for building some agent workflows, managed by a higher-level manager, and using graphs to capture as much memory as possible to ensure the AI always has the right context per task [00:24:53]. The team believes this approach will fundamentally change how AI designs software [00:25:50].