From: aidotengineer

Even after advances such as InstructGPT (2022), which could follow instructions effectively [00:00:00], general language models still struggle with complex instructions that carry multiple requirements [00:00:24]. The difficulty shows up when developers pack all the information, context, constraints, and requirements into a single prompt [00:01:04]. Even for seemingly simple tasks like instruction following, LLMs alone are often insufficient [00:01:14]. This is where AI agents become crucial [00:01:21].

What is an AI Agent?

The precise definition of an AI agent can be debated [00:01:41]. However, from an engineering perspective, what matters is functionality [00:01:45]. Various approaches are sometimes referred to as agents, including:

  • LLM as a Router: A routing model directs queries to specialized LLMs [00:01:55].
  • Function Calling: LLMs are given external tools and APIs that let them interact with the world, for example through Google search [00:02:08].
  • ReAct (Reason and Act): This framework, usable with any language model, follows a loop of "think, act on that thought, then observe." It works through a task step by step without an overall look-ahead plan [00:02:39].
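The ReAct loop described above can be sketched in a few lines. The sketch below is a minimal illustration, not any framework's actual API. It assumes a hypothetical `model` callable that returns the next thought plus either a tool call or a final answer. It also uses a scripted stand-in model and a toy `lookup` tool, both hypothetical, so the loop runs without an LLM. Note that the model only ever chooses the next action and never builds a plan for the whole task.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Step:
    thought: str
    action: Optional[str]  # tool name, or None when the model gives its final answer
    argument: str          # tool input, or the final answer when action is None

def react_loop(model: Callable[[list], Step], tools: dict, question: str, max_steps: int = 5):
    """Thought -> action -> observation, one step at a time, with no look-ahead plan."""
    transcript = [f"Question: {question}"]
    for _ in range(max_steps):
        step = model(transcript)  # the model decides only the next move
        transcript.append(f"Thought: {step.thought}")
        if step.action is None:
            transcript.append(f"Answer: {step.argument}")
            return step.argument, transcript
        observation = tools[step.action](step.argument)  # act
        transcript.append(f"Action: {step.action}[{step.argument}]")
        transcript.append(f"Observation: {observation}")  # observe
    return None, transcript  # step budget exhausted without an answer

# Hypothetical scripted stand-in for an LLM: look something up once, then answer.
def scripted_model(transcript: list) -> Step:
    observations = [line for line in transcript if line.startswith("Observation: ")]
    if not observations:
        return Step("I should look up the capital.", "lookup", "France")
    return Step("I have what I need.", None, observations[-1].removeprefix("Observation: "))

tools = {"lookup": lambda country: {"France": "Paris"}.get(country, "unknown")}
answer, trace = react_loop(scripted_model, tools, "What is the capital of France?")
```

Here `answer` is `"Paris"`, and `trace` holds the visible thought/action/observation sequence. That sequence shows each step, but it does not state an overall rationale for the task.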

While these methods offer various capabilities, effective AI agents for complex tasks go beyond mere prompting and require planning [00:01:24].

The Necessity of Planning in AI Agents

Planning in the context of AI agents involves figuring out the sequence of steps needed to achieve a goal [00:03:20]. It becomes essential for:

  • Complex tasks: When problems are not straightforward [00:02:26].
  • Parallelization: Enabling concurrent execution of steps [00:03:31].
  • Explainability: With ReAct, the flow of thought is visible but the overall reasoning remains unclear. Planning makes it easier to understand why certain steps were taken [00:03:36].

Planners can be categorized as text-based (e.g., Microsoft’s Magentic-One) or code-based (e.g., Hugging Face’s smolagents) [00:03:43].

Dynamic Planning and Smart Execution

Dynamic planning involves the ability to replan during execution [00:03:57]. Instead of adhering to a single, rigid plan, an agent can reassess and adjust its strategy mid-way [00:04:06].
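One way to picture dynamic planning is as an execution loop that hands control back to the planner whenever a step fails. The sketch below assumes hypothetical `execute` and `replan` callables (the latter standing in for an LLM planner) and illustrative step names. It is not Maestro's or any library's implementation.

```python
from typing import Callable

def run_with_replanning(
    plan: list,
    execute: Callable[[str], bool],
    replan: Callable[[list, str], list],
    max_replans: int = 3,
) -> list:
    """Run steps in order; when one fails, ask the planner for a new remaining plan.

    Returns the steps that completed successfully.
    """
    done = []
    replans = 0
    queue = list(plan)
    while queue:
        step = queue.pop(0)
        if execute(step):
            done.append(step)
            continue
        if replans == max_replans:
            raise RuntimeError(f"gave up after {replans} replans at step {step!r}")
        replans += 1
        # Reassess mid-way: the planner sees what succeeded and which step failed.
        queue = replan(done, step)
    return done

# Hypothetical usage: web search fails, so the planner switches to internal docs.
def execute(step: str) -> bool:
    return step != "search_web"

def replan(done: list, failed: str) -> list:
    return ["search_internal_docs", "summarize"]

result = run_with_replanning(["search_web", "summarize"], execute, replan)
```

Under these assumptions, `result` is `["search_internal_docs", "summarize"]`. The original plan was abandoned mid-way rather than followed rigidly.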

For efficiency, every planner needs an execution engine [00:04:19]. An execution engine’s capabilities include:

  • Dependency Analysis: Analyzing dependencies between steps [00:04:24].
  • Parallel Execution: Enabling concurrent processing of tasks [00:04:26].
  • Trade-off Management: Balancing speed against cost, for example by using techniques such as branch prediction to make the system faster [00:04:29].
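The first two capabilities, dependency analysis and parallel execution, can be illustrated with the standard library's `graphlib.TopologicalSorter` and `asyncio`. The following sketch makes several assumptions: the plan is a dict mapping each step to the steps it depends on, `run_step` is a hypothetical async stand-in for an LLM or tool call, and the step names are illustrative. For simplicity it runs the plan in batches, so every step whose dependencies are finished runs concurrently before the next batch starts. Trade-off management is not modeled here.

```python
import asyncio
from graphlib import TopologicalSorter

async def run_plan(deps: dict, run_step) -> list:
    """Run a dependency graph batch by batch; steps within a batch run concurrently.

    deps maps each step to the set of steps it depends on.
    Returns the batches of steps that were launched together.
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the plan has a circular dependency
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # steps whose dependencies are all done
        batches.append(ready)
        await asyncio.gather(*(run_step(s) for s in ready))  # parallel execution
        sorter.done(*ready)
    return batches

async def run_step(name: str) -> None:
    await asyncio.sleep(0.01)  # hypothetical stand-in for an LLM or tool call

plan = {
    "outline": set(),
    "research_a": {"outline"},
    "research_b": {"outline"},
    "draft": {"research_a", "research_b"},
}
batches = asyncio.run(run_plan(plan, run_step))
```

Here `batches` comes out as `[["outline"], ["research_a", "research_b"], ["draft"]]`. The two research steps have no dependency on each other, so they run at the same time. A more aggressive engine could start each step the moment its own dependencies finish, instead of waiting for the whole batch.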

AI21 Maestro: A System for Planning and Execution

AI21 Maestro exemplifies a system that combines a planner and a smart execution engine [00:04:40].

Simplified Instruction Following

In a simplified scenario for instruction following, a prompt contains context, task details, and requirements (e.g., paragraph limits, tone, brand mentions). These requirements are separated from the main prompt for easier validation [00:04:53].
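Keeping requirements separate from the prompt means each one can carry its own check. The sketch below assumes a hypothetical `Requirement` type that pairs a human-readable description with a programmatic validator. The paragraph-limit and brand-mention checks are simple string tests, and the brand "Acme" is illustrative. A requirement such as tone would likely need an LLM-based judge instead, which is not shown.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Requirement:
    description: str              # text that can also be shown to the LLM
    check: Callable[[str], bool]  # programmatic validator

def max_paragraphs(n: int) -> Requirement:
    # Paragraphs are assumed to be separated by blank lines.
    return Requirement(
        f"At most {n} paragraphs",
        lambda text: len([p for p in text.split("\n\n") if p.strip()]) <= n,
    )

def mentions(brand: str) -> Requirement:
    return Requirement(f"Mention {brand}", lambda text: brand.lower() in text.lower())

def failed_requirements(text: str, reqs: list) -> list:
    """Return the descriptions of every requirement the text does not satisfy."""
    return [r.description for r in reqs if not r.check(text)]

reqs = [max_paragraphs(2), mentions("Acme")]
draft = "Acme ships faster.\n\nOur team loves it.\n\nTry it today."
```

With this setup, `failed_requirements(draft, reqs)` returns `['At most 2 paragraphs']`. That list can be fed back into a targeted fix step instead of re-prompting with everything at once.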

The system uses an execution tree or graph [00:05:18]:

  1. The planner and execution engine select several candidate solutions at each step [00:05:23].
  2. Only promising candidates are pursued for further refinement [00:05:28].
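Expanding several candidates per step and keeping only the promising ones behaves much like beam search. The sketch below uses beam search as an assumed stand-in for the execution tree; it is not Maestro's actual algorithm. In a real system, `expand` would sample candidates from an LLM and `score` would run validators. Here both are toy string functions, labeled as hypothetical, so the search is deterministic.

```python
import heapq
from typing import Callable

def tree_search(
    root: str,
    expand: Callable[[str], list],
    score: Callable[[str], float],
    depth: int,
    keep: int,
) -> str:
    """Expand several candidates per node at each step and pursue only the `keep` best."""
    frontier = [root]
    for _ in range(depth):
        children = [child for node in frontier for child in expand(node)]
        if not children:
            break
        frontier = heapq.nlargest(keep, children, key=score)  # prune weak branches
    return max(frontier, key=score)

# Hypothetical toy problem: each step appends a letter; the score counts "b"s.
best = tree_search("", lambda s: [s + c for c in "abc"], lambda s: s.count("b"), depth=3, keep=2)
```

Under this toy scoring, `best` is `"bbb"`. Only two branches survive each level, so the work grows with `keep` rather than with the full size of the tree.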

Techniques used include:

  • Best of N: Sampling multiple generations from an LLM (potentially with high temperature or from different LLMs) instead of just one [00:05:36].
  • Candidate Discarding: Discarding unpromising candidates and focusing on the best ones based on a predefined budget [00:05:49].
  • Validation and Iteration: Continuous validation and iterative fixing [00:05:59].
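The three techniques compose naturally: sample N candidates, keep the best few, then validate and repair them until they pass or the budget runs out. In the minimal sketch below, `generate` stands in for an LLM sampling call and `fix` for an LLM repair call. The score is simply the fraction of validators passed. All names, drafts, and checks are hypothetical.

```python
from typing import Callable

def best_of_n(
    generate: Callable[[], str],
    score: Callable[[str], float],
    fix: Callable[[str], str],
    n: int,
    keep: int,
    max_fix_rounds: int,
    target: float = 1.0,
) -> str:
    candidates = [generate() for _ in range(n)]                       # best of N
    candidates = sorted(candidates, key=score, reverse=True)[:keep]   # discard weak candidates
    for _ in range(max_fix_rounds):                                   # validate and iterate
        if score(candidates[0]) >= target:
            break
        candidates = sorted((fix(c) for c in candidates), key=score, reverse=True)
    return candidates[0]

# Hypothetical drafts and validators: mention "Acme" and end with "!".
drafts = iter(["Fast shipping.", "Acme offers fast shipping.", "Shipping is fast and cheap."])
checks = [lambda t: "Acme" in t, lambda t: t.endswith("!")]
score = lambda t: sum(c(t) for c in checks) / len(checks)
fix = lambda t: t.rstrip(".") + "!"  # stand-in for an LLM repair call
result = best_of_n(lambda: next(drafts), score, fix, n=3, keep=2, max_fix_rounds=2)
```

Here `result` is `"Acme offers fast shipping!"`. The budget is explicit: at most `n` generations plus `keep * max_fix_rounds` repair calls.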

Complex Task Execution

For more complex tasks, the system’s input leads to various “tracks” [00:06:07]. The execution engine estimates the expected cost, latency, and success probability for each track [00:06:15]. The planner then chooses the optimal path [00:06:22]. Ultimately, results are reduced or combined to form a complete answer to the original query [00:06:30].
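The talk does not specify how the planner weighs the estimates, so the selection rule below is an assumption. It keeps only the tracks that fit cost and latency limits, then picks the one with the highest estimated success probability, breaking ties by lower cost. The `Track` fields mirror the three estimates mentioned above. The track names and numbers are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Track:
    name: str
    expected_cost: float     # e.g. dollars per query
    expected_latency: float  # seconds
    success_prob: float      # estimated probability of a valid answer

def choose_track(tracks: list, max_cost: float, max_latency: float) -> Track:
    """Among tracks within budget, prefer the highest success probability, then lower cost."""
    feasible = [
        t for t in tracks
        if t.expected_cost <= max_cost and t.expected_latency <= max_latency
    ]
    if not feasible:
        raise ValueError("no track fits the cost and latency budget")
    return max(feasible, key=lambda t: (t.success_prob, -t.expected_cost))

tracks = [
    Track("single_llm_call", 0.01, 2, 0.60),
    Track("plan_and_verify", 0.08, 15, 0.90),
    Track("exhaustive_search", 0.50, 120, 0.95),
]
chosen = choose_track(tracks, max_cost=0.10, max_latency=30)
```

With these numbers, `chosen.name` is `"plan_and_verify"`. The exhaustive track is slightly more likely to succeed but exceeds the budget. Other policies, such as a weighted utility over all three estimates, would fit the same structure.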

Using a planner and smart execution engine significantly improves instruction following and requirement satisfaction, even for strong models like GPT-4o and Claude 3.5 Sonnet [00:06:42]. The approach can increase runtime and cost, but it delivers substantially higher-quality outputs [00:07:10].

Key Takeaways

  • LLMs Alone Are Insufficient: Even for “simple” instruction following, LLMs by themselves are often not enough [00:07:21].
  • Start Simple: Begin with the simplest solution that works. Use raw LLMs if they suffice [00:07:30].
  • Leverage Tools: Integrate tools, or use frameworks like ReAct, when basic LLMs aren’t enough [00:07:35].
  • Adopt Planning for Complexity: For truly complex tasks, planning combined with an execution engine is necessary [00:07:43].