From: aidotengineer
Immani, Co-founding Head of AI at Cat.io, discusses grounded reasoning systems for cloud architecture and the use of multi-agent orchestration to build an AI copilot [00:00:01].
The Need for Reasoning in Cloud Architecture
Cloud systems are growing in complexity: more users and developers, more tools and constraints, and rising expectations [00:00:24]. Existing tools do not scale across the diversity of decisions cloud architecture demands [00:00:37]. Solving these problems requires systems that can understand, debate, justify, and plan, moving beyond mere automation to true reasoning [00:00:44].
The architecture stack is not only technical but also cognitive [00:00:56]. Architects constantly negotiate tradeoffs based on requirement definition, time availability, and resources [00:01:04]. They rely on scattered and implicit context to make decisions, and capturing this for AI requires understanding their thought processes [00:01:21].
Challenges Where AI Meets Architecture
At a high level, Cat.io identifies three key challenges in solving architecture design problems with AI:
- Requirement Understanding: Determining the origin, format, important pieces, and scope (global vs. specific) of requirements [00:01:46].
- Architecture Identification: Understanding the functions of various components within an architecture, given that their roles can differ based on their placement and type [00:02:05].
- Architecture Recommendation: Providing recommendations to match requirements or improve the architecture based on best practices, given the current architecture state and understood requirements [00:02:27].
Technical Challenges: Semantic vs. Graph Context
To address these higher-level problems, specific AI-related challenges arise:
- Mixed Semantic and Graph Context: Requirements are often textual, while architecture is inherently graph data [00:02:56]. Integrating these diverse data sources for higher-level reasoning is crucial [00:03:10].
- Complex Reasoning Scenarios: Queries can be vague, broad, and require breakdown and proper planning to yield accurate answers [00:03:20].
- Evaluation and Feedback: Developing robust methods to evaluate and provide feedback to a large AI system with multiple moving parts [00:03:42].
Grounding Agents in a Specific Context
Large Language Models (LLMs) need proper context to reason effectively [00:04:14]. Translating natural language into meaningful architecture retrieval tasks, and doing so quickly, is challenging [00:04:22].
Cat.io experimented with approaches for both architecture and requirement retrieval:
- Semantic Enrichment of Architecture Data: Collecting relevant semantic information for each component to make it more searchable and findable in vector search [00:04:40].
- Graph-enhanced Component Search: Utilizing graph algorithms to retrieve the correct information pieces when searching for specific components within an architecture (see the sketch after this list) [00:05:00].
- Early Score Enrichment of Requirement Documents: For faster retrieval, important concepts within requirements were identified and documents were scored, simplifying the initial retrieval task for a large corpus of text [00:05:22].
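As an illustration of the graph-enhanced search above, here is a minimal sketch; the `vector_index` API and the hit attributes are assumptions for illustration, not Cat.io's implementation. A vector search proposes candidate components, then a graph walk pulls in each hit's neighborhood so the agent sees the component in its structural context.

```python
import networkx as nx

def graph_enhanced_search(query_vector, vector_index, arch_graph: nx.DiGraph,
                          k: int = 5, hops: int = 1):
    """Return vector-search hits enriched with their graph neighborhood."""
    hits = vector_index.search(query_vector, top_k=k)  # assumed index API
    results = []
    for hit in hits:
        # Expand each hit along architecture edges so the component is
        # returned together with what it connects to.
        neighborhood = nx.ego_graph(arch_graph, hit.component_id, radius=hops)
        results.append({
            "component": hit.component_id,
            "score": hit.score,
            "context": [(u, arch_graph.edges[u, v].get("relation"), v)
                        for u, v in neighborhood.edges],
        })
    return results
```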
Learnings from Early Approaches
- Semantic grounding improves reasoning but has limitations, especially when scaling or requiring highly detailed responses [00:06:11].
- The design of the grounding mechanism is critical for soft grounding, specifically instructing the agent on what to focus on and retrieve [00:06:29].
- Graph memory supports continuity and not just accuracy, allowing connections between different nodes to provide additional context for reasoning [00:06:47].
An early design for architecture retrieval involved breaking down JSON architecture data into natural language, enriching it with connection data, embedding it, and storing it in a vector database [00:07:29]. While this showed good results, it was found that semantic search has limitations for graph data, leading to a shift towards graph-based searches and a knowledge graph approach [00:08:00].
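A minimal sketch of that early pipeline, assuming illustrative field names for the JSON architecture data and a generic `embed` / `vector_db` interface: each component is rendered as a natural-language description, enriched with its connection data, embedded, and stored for semantic search.

```python
def component_to_text(component: dict, connections: list[dict]) -> str:
    """Flatten one architecture component into a searchable sentence."""
    desc = (f"{component['name']} is a {component['type']} "
            f"in the {component.get('region', 'default')} region.")
    for edge in connections:
        desc += f" It {edge['relation']} {edge['target']}."
    return desc

def index_architecture(arch: dict, embed, vector_db):
    """Embed every component description and store it in a vector database."""
    for comp in arch["components"]:
        conns = [e for e in arch["edges"] if e["source"] == comp["name"]]
        text = component_to_text(comp, conns)
        vector_db.upsert(                 # assumed vector-store API
            id=comp["name"],
            vector=embed(text),           # embedding model, assumed callable
            metadata={"text": text, "type": comp["type"]},
        )
```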
An initial design for understanding requirements involved pre-processing, splitting, and embedding documents [00:08:30]. An extra step involved using requirement templates to structure extracted information into “pods,” ensuring relevance for downstream tasks [00:08:47]. This helped with fast retrieval and structuring business requirements [00:09:17]. However, limitations were observed with an increasing number of documents, leading to context loss in larger searches, suggesting that graph analysis could help [00:09:37].
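A sketch of the pod idea under assumed names (the template fields and the `llm.extract` helper are hypothetical, not the talk's exact template): document chunks are mapped onto a fixed schema and scored early, so downstream retrieval works over structured, ranked pieces instead of raw text.

```python
from dataclasses import dataclass

@dataclass
class RequirementPod:
    topic: str          # e.g. "availability", "compliance"
    requirement: str    # the extracted requirement statement
    scope: str          # "global" or "component-specific"
    score: float        # early relevance score used for fast retrieval

def extract_pods(chunks: list[str], llm, score_fn) -> list[RequirementPod]:
    pods = []
    for chunk in chunks:
        # The LLM fills the template fields from one chunk (structured output).
        fields = llm.extract(chunk, schema=RequirementPod)  # assumed helper
        pods.append(RequirementPod(
            topic=fields["topic"],
            requirement=fields["requirement"],
            scope=fields["scope"],
            score=score_fn(chunk),  # concept-based scoring for fast retrieval
        ))
    return sorted(pods, key=lambda p: p.score, reverse=True)
```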
Addressing Complex Reasoning Scenarios
Good architecture design inherently involves conflicting goals, trade-offs, and debates [00:10:07]. Agents need to be able to collaborate, argue, and converge on justified recommendations [00:10:19].
Cat.io’s approach included:
- Multi-agent orchestration with Role-Specific Agents: Building a system where multiple agents work together with distinct properties [00:10:28].
- Structured Message Format: Moving from XML to structured messages improved workflow building and enabled multiple agents to work together in longer chains [00:10:53].
- Context Management: Isolating conversations between agents to save tokens and to curb the increased hallucination observed with larger memories (see the sketch after this list) [00:11:25].
- Cloning Agents for Parallel Processing: Duplicating agents for specific tasks to speed up processes, requiring careful memory management at the cloning point [00:12:05].
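A minimal sketch of the message and context scheme referenced above, with illustrative field names: typed messages replace free-form XML, and each agent pair keeps its own history so one long-running conversation cannot bloat every other agent's context.

```python
from pydantic import BaseModel

class AgentMessage(BaseModel):
    sender: str      # e.g. "chief_architect"
    recipient: str   # e.g. "staff_architect_api"
    task: str        # what the recipient should do
    payload: dict    # structured inputs for the task

class IsolatedContext:
    """Per-agent-pair histories: only the relevant conversation is ever
    placed into a prompt, bounding token usage."""
    def __init__(self):
        self._histories: dict[tuple[str, str], list[AgentMessage]] = {}

    def send(self, msg: AgentMessage) -> list[AgentMessage]:
        key = (msg.sender, msg.recipient)
        history = self._histories.setdefault(key, [])
        history.append(msg)
        return history  # only this pair's history goes into the prompt
```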
Learnings
- Structured outputs significantly improve clarity and control, which is crucial for reliable programmatic workflows, despite potential trade-offs in reasoning ability [00:13:00].
- Multi-agent systems allow agents to dynamically resolve trade-offs rather than executing static plans [00:13:08]. Making the orchestration more dynamic fosters higher creativity in planning and reaching results [00:13:35].
- Successful multi-agent orchestration requires control flow; simply letting agents work without guidance is not feasible [00:13:50].
Cat.io’s Multi-Agent Recommendation System
Cat.io’s production stack delivers recommendations based on a multi-agent system [00:14:14]. Recommendations are categorized (e.g., architecture messaging, API integration) and include a description, target state, gap analysis, and recommended actions [00:14:31].
The system comprises:
- Chief Architect: Oversees and coordinates higher-level tasks [00:15:17].
- 10 Staff Architects: Each specialized in a domain (e.g., infrastructure, API, IAM) [00:15:23].
- Requirement Retriever: Accesses requirements data [00:15:36].
- Architecture Retriever: Understands the current architecture state and can answer questions about components [00:15:45].
The multi-agent workflow operates in three main sequential tasks:
- List Generation: The Chief Architect requests a list of possible recommendations from the Staff Architects, who send requests in parallel to the Architecture Retriever and Requirement Retriever. The Staff Architects then return possible recommendations to the Chief Architect [00:16:11].
- Conflict Resolution: The Chief Architect reviews the generated list for conflicts or redundancies and prunes it [00:16:20].
- Design Proposal: For each recommendation topic, a full design proposal is written, detailing gap analysis and improvement proposals. During this step, each Staff Architect is cloned based on the number of recommendations they need to generate. Each clone has access to the past history but generates its own isolated current history, ensuring comprehensive proposals (see the sketch after this list) [00:16:51].
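Condensed into code, the three phases might look like the following sketch; the agent interfaces (`propose_recommendations`, `resolve_conflicts`, `write_design_proposal`, the `domain` attribute) are assumptions, not Cat.io's production code.

```python
import copy
from concurrent.futures import ThreadPoolExecutor

def run_recommendation_workflow(chief, staff_architects, retrievers):
    # Phase 1: list generation, staff architects query retrievers in parallel.
    with ThreadPoolExecutor() as pool:
        candidate_lists = list(pool.map(
            lambda s: s.propose_recommendations(retrievers), staff_architects))
    candidates = [rec for lst in candidate_lists for rec in lst]

    # Phase 2: conflict resolution, the chief prunes overlaps and contradictions.
    pruned = chief.resolve_conflicts(candidates)

    # Phase 3: design proposals, one clone per recommendation topic.
    proposals = []
    for rec in pruned:
        owner = next(s for s in staff_architects if s.domain == rec.domain)
        clone = copy.deepcopy(owner)  # clone keeps the shared past history
        clone.current_history = []    # ...but writes to its own isolated one
        proposals.append(clone.write_design_proposal(rec, retrievers))
    return proposals
```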
Evaluation and Feedback
A key challenge is determining whether a recommendation is good, especially with multiple agents and rounds of conversation [00:19:03]. The solution is to close the loop with human scoring, structured feedback, and revision cycles [00:19:24].
Learnings
- Human evaluation is the most effective method, particularly in early stages of system development [00:19:32].
- LLM evaluations are helpful but do not provide the necessary insights for making critical improvements [00:19:42].
- Confidence displayed by the system does not equate to correctness [00:20:38].
- Evaluation mechanisms must be integrated into the system design from the outset, not added as an afterthought [00:20:56].
Cat.io developed an internal human evaluation tool called “Eagle Eye” [00:19:55]. This tool allows review of specific cases, extracted architecture and requirements, agent conversations, and generated recommendations to assess relevance, visibility, and clarity, and assign scores [00:20:06]. The tool has helped identify issues like hallucinations, where agents might make requests that are not relevant to their actual task [00:22:02].
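A sketch of the structured feedback record such a tool could produce; the schema is an assumption based on the dimensions mentioned in the talk. Each recommendation is scored, and low scores feed the revision cycle.

```python
from dataclasses import dataclass

@dataclass
class HumanEvaluation:
    case_id: str
    recommendation_id: str
    relevance: int    # 1-5: does it match the extracted requirements?
    visibility: int   # 1-5: is the reasoning traceable in the agent logs?
    clarity: int      # 1-5: is the proposal understandable?
    notes: str = ""   # free-form feedback, e.g. flagged hallucinations

def needs_revision(ev: HumanEvaluation, threshold: int = 3) -> bool:
    """Close the loop: anything scoring below threshold re-enters the workflow."""
    return min(ev.relevance, ev.visibility, ev.clarity) < threshold
```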
Conclusion: Designing Reasoning Systems
Building an AI copilot is about designing a system that can reason, not just generate answers [00:23:06]. Such a system needs a comprehensive view of large datasets, like thousands or millions of architecture components and numerous documents, to answer varied questions from diverse stakeholders (developers to CTOs) [00:23:20].
Achieving this requires:
- Clearly defined roles [00:24:04].
- Structured workflows [00:24:04].
- Effective memory management [00:24:06].
- Structured interactions [00:24:06].
Experimentation is vital to discover which patterns work best with the existing data [00:24:16]. Graphs are increasingly important in Cat.io’s designs [00:24:31], and the team also weighs how much autonomy to grant each agent [00:24:46]. LangGraph is used for building agent workflows, with SLAM managing higher-level aspects [00:25:04]. Graphs are also used to capture as much memory as possible, ensuring the AI maintains the right context for each task [00:25:34].
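As a concrete reference point, a minimal LangGraph workflow with explicit nodes and edges looks like this; the node logic below is a placeholder standing in for agent calls, not Cat.io's production graph.

```python
from typing import TypedDict
from langgraph.graph import StateGraph, START, END

class ReviewState(TypedDict):
    query: str
    candidates: list[str]
    recommendations: list[str]

def generate_list(state: ReviewState) -> dict:
    # Placeholder: in practice, staff architect agents produce candidates here.
    return {"candidates": [f"candidate for {state['query']}"]}

def resolve_conflicts(state: ReviewState) -> dict:
    # Placeholder: the chief architect prunes conflicting candidates here.
    return {"recommendations": state["candidates"]}

builder = StateGraph(ReviewState)
builder.add_node("generate_list", generate_list)
builder.add_node("resolve_conflicts", resolve_conflicts)
builder.add_edge(START, "generate_list")
builder.add_edge("generate_list", "resolve_conflicts")
builder.add_edge("resolve_conflicts", END)
graph = builder.compile()

result = graph.invoke({"query": "improve messaging architecture",
                       "candidates": [], "recommendations": []})
```

Explicit edges like these provide the control flow that, per the learnings above, multi-agent systems need: agents are free to reason within a node, but the overall workflow remains guided.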
The belief is that AI will fundamentally change how software is designed [00:25:50].