From: aidotengineer

AI agents are a significant area of discussion, with figures like Bill Gates, Andrew Ng, and Sam Altman highlighting their transformative potential for computing and AI progress [00:00:26]. While some critics dismiss them as mere wrappers around large language models (LLMs) or question whether they can solve practical problems, understanding their definition and components is key [00:00:43].

What Are AI Agents?

AI agents are not a new concept, but the advent of large language models has significantly enhanced their capabilities [00:01:08]. The core components of an AI agent are:

  • Perception: Like humans, agents must understand their environment by sensing information from text, images, audio, video, and touch [00:01:20].
  • Reasoning: After perceiving information, agents process it to understand how to complete tasks, break them into individual steps, and decide which tools or actions to take [00:01:37]. This inner planning often involves chain-of-thought reasoning, typically powered by LLMs [00:01:59].
  • Reflection (Meta-Reasoning): Agents can perform meta-reasoning steps, reflecting on the actions they have executed to evaluate whether the right choice was made and adjust if necessary [00:02:10].
  • Actions: Anything an agent does to affect its environment, such as talking to a human or moving from point A to point B, constitutes an action [00:02:25].

In essence, AI agents interact with their environment through the actions they take [00:02:41].
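To make these components concrete, here is a minimal sketch of a perceive-reason-act-reflect loop in Python. The function signature and environment interface are illustrative assumptions, not code from the talk.

```python
# A minimal, illustrative perceive-reason-act-reflect loop.
# The interfaces below are hypothetical, not from the talk.

def run_agent(llm, environment, goal, max_steps=5):
    """llm: callable prompt -> str. environment: object with observe() and execute(action) -> str."""
    result = "no action taken"
    for _ in range(max_steps):
        observation = environment.observe()                           # perception
        plan = llm(f"Goal: {goal}\nObservation: {observation}\n"      # reasoning
                   "Think step by step, then state the single next action.")
        result = environment.execute(plan)                            # action
        review = llm(f"Action: {plan}\nResult: {result}\n"            # reflection (meta-reasoning)
                     "Did this move us toward the goal? Reply DONE or RETRY with advice.")
        if "DONE" in review:
            break
    return result
```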

Levels of Autonomy for AI Agents

The deployment of AI agents can be understood through an analogy to the levels of autonomy in self-driving cars [00:03:02]:

  • Level 1: Chatbot (e.g., pre-2017 chatbots). Simply retrieves and presents information [00:03:12].
  • Level 2: Agent Assist. A human customer service agent uses an LLM to generate suggested responses but must approve messages before they are sent [00:03:20].
  • Level 3: Agent as a Service. LLMs automate AI workflows and are offered as services, such as booking meetings or writing job descriptions [00:03:35].
  • Level 4: Autonomous Agents. A single agent can delegate and perform multiple, interconnected tasks that share components, knowledge, and resources [00:03:51].
  • Level 5: JARVIS (the Iron Man analogy). Users trust agents fully, delegating all security measures, such as keys, so agents can act entirely on their behalf [00:04:16].

While self-driving cars represent a high-risk agent application, AI agents can be applied to both low-risk (e.g., filing reimbursements with human supervision) and high-risk tasks (e.g., customer-facing interactions) [00:05:06]. Over time, the goal is to transition from back-office to front-office deployments [00:05:33].

Improving AI Agents

Strategies to improve LLMs for better reasoning and reflection in AI agent tasks include:

Self-Improvement Through Reflection

Self-improvement processes involve LLMs generating feedback on their own answers and then using that feedback to refine their responses [00:08:12]. This “self-refine” or “self-improvement” loop can be iterated multiple times until a correct answer is reached [00:08:33].
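A rough sketch of this generate-critique-revise loop, assuming a generic `llm(prompt) -> str` callable; the stopping heuristic is an illustrative assumption, not part of the original method:

```python
def self_refine(llm, question, max_iters=3):
    """Ask the model to critique and revise its own answer a few times."""
    answer = llm(f"Question: {question}\nAnswer step by step.")
    for _ in range(max_iters):
        feedback = llm(f"Question: {question}\nProposed answer: {answer}\n"
                       "Critique this answer. Reply LOOKS GOOD if it is already correct.")
        if "LOOKS GOOD" in feedback:       # simple stopping heuristic (an assumption)
            break
        answer = llm(f"Question: {question}\nPrevious answer: {answer}\n"
                     f"Feedback: {feedback}\nWrite an improved answer.")
    return answer
```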

However, smaller LLMs (e.g., Llama 7B) can generate “noise” in their feedback, leading to degraded results (“the blind leading the blind”) [00:08:58]. Additionally, the internal logic and demonstrations of larger models may be incompatible with smaller models, making their feedback unhelpful to the smaller model [00:09:52].

To address this, a method called WiGlass proposes:

  1. Using a smaller model to generate an initial answer and self-feedback [00:11:40].
  2. Employing a larger language model or external tool (like Python scripts for math tasks) to edit the smaller model’s feedback, making it more tailored to the smaller model’s internal logic [00:11:48].
  3. Using this corrected feedback to update the answer, iterating until the problem is solved correctly [00:11:55].

This process generates “traces” of trial and error that can be filtered and used to train smaller models for self-improvement, guided by larger models or tools [00:12:21]. This “on-policy self-supervision” has shown significant performance improvements in mathematical reasoning tasks (e.g., 48% accuracy after three iterations) [00:13:37].
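The loop might look roughly like the following sketch, where `small_llm`, `large_llm`, and `check_answer` are placeholder callables; this illustrates the idea described above, not the authors’ implementation:

```python
def improve_with_edited_feedback(small_llm, large_llm, check_answer, question, max_iters=3):
    """check_answer(question, answer) -> bool, e.g., an external Python checker for math tasks."""
    traces = []                                        # trial-and-error data for later fine-tuning
    answer = small_llm(f"Question: {question}\nAnswer step by step.")
    for _ in range(max_iters):
        if check_answer(question, answer):             # solved: record the successful trace and stop
            traces.append({"answer": answer, "solved": True})
            break
        draft_feedback = small_llm(f"Question: {question}\nAnswer: {answer}\nCritique this answer.")
        # The larger model (or a tool) edits the noisy critique so it is correct and
        # phrased in a way the smaller model can actually act on.
        edited_feedback = large_llm(f"Question: {question}\nAnswer: {answer}\n"
                                    f"Draft critique: {draft_feedback}\n"
                                    "Rewrite this critique so it is accurate and easy to follow.")
        answer = small_llm(f"Question: {question}\nPrevious answer: {answer}\n"
                           f"Feedback: {edited_feedback}\nWrite a corrected answer.")
        traces.append({"answer": answer, "feedback": edited_feedback, "solved": False})
    return answer, traces    # filtered traces become training data for the smaller model
```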

Test-Time Scaling for Stronger Model Behavior

While pre-training LLMs is compute-intensive and often beyond smaller organizations’ budgets, “test-time scaling” offers an alternative [00:17:27]. This approach takes an existing model and gives it more compute at inference time, such as more reasoning steps or a larger search budget, to elicit better results [00:17:31].
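The simplest way to spend extra inference budget is to sample several candidate answers and keep the one a scorer likes best. This best-of-N sketch only illustrates the general idea of test-time scaling; it is not the tree-search method discussed next.

```python
def best_of_n(llm, scorer, question, n=8):
    """Sample n candidate answers and keep the one a scorer (e.g., a verifier model) rates highest."""
    candidates = [llm(f"Question: {question}\nAnswer step by step.") for _ in range(n)]
    return max(candidates, key=lambda answer: scorer(question, answer))
```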

One method for this is Tree Search, specifically Monte Carlo Tree Search (MCTS), applied to LLMs.

  • Conversational Agents: In tasks like donation persuasion, MCTS lets an agent simulate potential moves and opponent responses (e.g., “what is my opponent going to do?”) multiple steps ahead, similar to a chess game [00:19:44]. This “self-play” involves prompting one LLM to act as the policy (propose actions) and another to simulate user responses and evaluate action quality [00:22:03] (a simplified sketch of this loop follows the list below).

    • This approach, termed GDP-Zero (named by analogy to AlphaGo Zero, which plans without training data), requires no explicit training data and uses simulation to achieve competitive results [00:21:05]. Human studies showed that models using this planning algorithm achieved higher donation rates and were perceived as more convincing and natural [00:24:49]. The agents also self-discovered strategies such as delaying the “big ask” and diversifying their persuasion tactics [00:25:08].
  • Visual Web Agents: Expanding beyond conversational tasks, AI agents need to perceive visual information and perform actions such as clicking buttons based on screenshots [00:27:21]. Traditional vision-language models (VLMs) trained on Visual Question Answering (VQA) are insufficient for such action-based tasks [00:27:51]. For instance, humans achieve 88% success on tasks like “clear my shopping cart” on Visual Web Arena, while a basic GPT-4V agent gets only 16% [00:28:02].

    • R-MCTS: An algorithm called R-MCTS (Reflective Monte Carlo Tree Search) was introduced to improve decision-making in these environments [00:28:48]. It extends simple MCTS by incorporating:
      • Contrastive Reflection: Allows agents to learn from past interactions and dynamically improve search efficiency [00:29:11]. A memory module caches knowledge learned from previous tasks so that similar reflections can be retrieved for future tasks [00:29:39] (see the memory sketch after this list).
      • Multi-Agent Debate: Uses a debate format to obtain more robust state evaluations, asking models to argue for why an action is good or bad in order to counteract the biases of single-model prompting [00:29:21].
    • R-MCTS outperformed existing search algorithms and non-search methods on benchmarks like Visual Web Arena (browser tasks) and OSWorld (Linux computer tasks) [00:32:18]. This demonstrated that augmenting VLMs with search algorithms can improve performance without additional human supervision [00:32:46].
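A heavily simplified sketch of the conversational self-play idea referenced above: one LLM proposes candidate utterances, a second simulates the user, and a third scores the resulting state. A real planner such as GDP-Zero wraps this in full MCTS with selection, expansion, and backpropagation; all prompts and names here are illustrative assumptions.

```python
import re

def plan_next_utterance(policy_llm, user_sim_llm, judge_llm, dialogue, n_candidates=3, depth=2):
    """Greedy lookahead over simulated conversation turns; a real system would use full MCTS."""

    def judge(history):
        # Ask an LLM to score the conversation state; assumes its reply contains a number.
        reply = judge_llm(f"Conversation so far:\n{history}\n"
                          "On a scale of 0-10, how likely is the user to donate? Reply with a number.")
        match = re.search(r"\d+(\.\d+)?", reply)
        return float(match.group()) if match else 0.0

    def rollout(history, remaining):
        if remaining == 0:
            return judge(history)
        user_reply = user_sim_llm(f"Conversation so far:\n{history}\nRespond as the user.")
        next_move = policy_llm(f"Conversation so far:\n{history}\nUser: {user_reply}\n"
                               "Propose the persuader's next message.")
        return rollout(f"{history}\nUser: {user_reply}\nPersuader: {next_move}", remaining - 1)

    candidates = [policy_llm(f"Conversation so far:\n{dialogue}\nPropose the persuader's next message.")
                  for _ in range(n_candidates)]
    return max(candidates, key=lambda c: rollout(f"{dialogue}\nPersuader: {c}", depth))
```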
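And a sketch of the kind of memory module described under contrastive reflection: reflections learned on past tasks are cached and retrieved by similarity for new, related tasks. The embedding interface and cosine-similarity retrieval are assumptions for illustration, not details from the talk.

```python
import numpy as np

class ReflectionMemory:
    """Cache reflections from past tasks and retrieve them for similar new tasks."""

    def __init__(self, embed):                 # embed: callable, text -> 1-D numpy array
        self.embed = embed
        self.entries = []                      # list of (embedding, reflection_text)

    def add(self, task_description, reflection):
        self.entries.append((self.embed(task_description), reflection))

    def retrieve(self, task_description, k=3):
        query = self.embed(task_description)

        def cosine(vec):
            return float(np.dot(query, vec) / (np.linalg.norm(query) * np.linalg.norm(vec) + 1e-8))

        scored = sorted(self.entries, key=lambda entry: cosine(entry[0]), reverse=True)
        return [reflection for _, reflection in scored[:k]]
```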

Exploratory Learning

Beyond test-time compute, the knowledge obtained through search processes can be transferred into the training process of the base LLM [00:33:30].

  • Unlike imitation learning, which directly transfers the best action at each step, exploratory learning treats the whole tree search process as a single trajectory [00:33:45].
  • The model is trained on a linearized traversal of the search tree, motivating it to learn how to explore, backtrack, and evaluate its own decisions [00:34:03]. This teaches the model the decision-making process itself, rather than just providing the final correct answer [00:34:16].
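One way to picture the linearization step: flatten a depth-first traversal of the search tree, including the backtracking moves, into a single text sequence that the model is then fine-tuned on. The node interface and formatting below are illustrative assumptions, not the exact data format used in the work.

```python
def linearize_search_tree(root):
    """Flatten a depth-first traversal of a search tree, including backtracking steps,
    into one text trajectory. Nodes are assumed to have .action, .value, and .children."""
    steps = []

    def visit(node):
        steps.append(f"TRY: {node.action} (estimated value {node.value:.2f})")
        for child in sorted(node.children, key=lambda n: n.value, reverse=True):
            visit(child)
            steps.append(f"BACKTRACK from: {child.action}")

    visit(root)
    return "\n".join(steps)        # this single sequence becomes one fine-tuning example
```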

Future Directions for AI Agents

Current benchmarks often focus on a single agent performing a single task [00:37:25]. Future work needs to address more complex scenarios:

  • Multi-Tasking: How a single human can have a model perform multiple tasks on the same computer [00:37:40].
  • System-Level Problems: Scheduling, database interactions (to avoid side effects), and improved security, including human handover and supervision requests [00:37:48].
  • Multi-User, Multi-Agent Planning: When multiple humans interact with multiple agents and assign them different tasks, planning becomes considerably more complicated [00:38:04].

Establishing more realistic benchmarks that integrate system considerations alongside algorithms will be crucial for developing applications that prioritize task completion, efficiency, and security [00:38:25].

For those interested in exploring these advancements, the Arklex AI open-source agent framework offers features like continuous learning and task decomposition for developers [00:36:40].