From: aidotengineer
The landscape of AI models has evolved significantly since OpenAI’s release of InstructGPT in 2022, a model lauded for its ability to follow instructions effectively [00:00:00]. However, even with advanced models like GPT-4.1 in 2025, challenges persist in reliably following complex instructions [00:00:19]. This limitation extends to nearly every language model [00:00:29].
Initially, simple instructions like “Explain the moon landing to a six-year-old” demonstrated the power of early LLMs [00:00:41]. However, as developers began integrating language models into more sophisticated applications, prompts grew in complexity, often attempting to cram all context, constraints, and requirements into a single input [00:00:51]. Even for seemingly simple instruction following, LLMs alone are no longer sufficient [00:01:11].
Role of AI Agents in Planning and Executing Tasks
This is where AI agents become crucial [00:01:21]. Agents go beyond mere prompting and necessitate planning [00:01:24].
While the philosophical definition of an “agent” in AI is debated [00:01:41], from an engineering perspective, it refers to any system that works to achieve a goal [00:01:45]. Various interpretations of agents exist:
- LLM as a Router: A routing model that directs queries to specialized LLMs [00:01:55].
- Function Calling: Providing an LLM with external tools and APIs (e.g., Google Search) that it can use to interact with the world [00:02:08]. This approach has been standardized by MCP (the Model Context Protocol) [00:02:24].
- ReAct (Reason and Act): A popular framework in which a language model thinks, acts upon that thought, and observes the result in a step-by-step manner [00:02:39]. However, ReAct does not “look ahead” to an entire plan; it focuses only on the next immediate step [00:03:00].
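To make the think/act/observe cycle concrete, here is a minimal ReAct-style loop in Python. It is only an illustrative sketch: the prompt format, the `FINAL:` stop marker, and the tool-matching logic are assumptions, not the API of any particular framework.

```python
# Minimal ReAct-style think/act/observe loop (an illustrative sketch, not the
# API of any specific framework). `call_llm` stands in for a real model call.

from typing import Callable, Dict

def react_loop(
    goal: str,
    call_llm: Callable[[str], str],
    tools: Dict[str, Callable[[str], str]],
    max_steps: int = 5,
) -> str:
    transcript = f"Goal: {goal}"
    for _ in range(max_steps):
        # Think: the model proposes the next thought and (optionally) an action.
        step = call_llm(transcript + "\nNext thought and action?")
        transcript += f"\n{step}"
        if "FINAL:" in step:                       # model signals completion
            return step.split("FINAL:", 1)[1].strip()
        # Act + observe: run any tool the model named and record the result.
        for name, tool in tools.items():
            if name in step.lower():
                transcript += f"\nObservation: {tool(step)}"
    return transcript  # no final answer within the step budget

# Toy usage with a canned "model" that searches once, then answers.
replies = iter(["I should use search for this.", "FINAL: July 20, 1969."])
print(react_loop(
    "When did Apollo 11 land?",
    call_llm=lambda prompt: next(replies),
    tools={"search": lambda q: "Apollo 11 landed on July 20, 1969."},
))
```

Note that the loop only ever decides the next step; nothing in it reasons about the plan as a whole, which is exactly the limitation the talk highlights.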
Task Planning and Decision-Making in AI Agents
Planning in AI agents involves figuring out the necessary steps to reach a specific goal [00:03:15]. It is particularly useful for complex tasks that are not straightforward [00:03:24], especially those requiring parallelization and explainability [00:03:31]. Unlike ReAct, where the individual thoughts are visible but the reason behind the overall progression is not, planning aims for greater transparency [00:03:36].
Planners come in different forms: text-based planners (such as Magentic-One from Microsoft) or code-based planners (such as smolagents from Hugging Face) [00:03:43].
Dynamic Planning
Dynamic planning refers to the ability to re-plan in the middle of a task [00:03:56]. Instead of rigidly adhering to one initial plan, a dynamic planner can evaluate its progress and decide to re-plan if a better path emerges [00:04:00].
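As a rough illustration of this idea, the sketch below executes a plan step by step and asks a planner for a fresh plan whenever progress looks unsatisfactory. `make_plan`, `execute_step`, and `progress_is_acceptable` are hypothetical placeholders, not part of any real library.

```python
# Sketch of a dynamic planner: execute a plan step by step and ask for a new
# plan whenever progress looks unsatisfactory. All helper callables are
# hypothetical placeholders supplied by the caller.

from typing import Callable, List

def run_with_replanning(
    goal: str,
    make_plan: Callable[[str, List[str]], List[str]],
    execute_step: Callable[[str], str],
    progress_is_acceptable: Callable[[List[str]], bool],
) -> List[str]:
    history: List[str] = []
    plan = make_plan(goal, history)
    while plan:
        step = plan.pop(0)
        history.append(execute_step(step))
        if not progress_is_acceptable(history):
            # Abandon the remaining steps and plan again from the current
            # state instead of sticking rigidly to the original plan.
            plan = make_plan(goal, history)
    return history

# Toy usage: the "planner" simply returns the goal's letters not yet handled.
print(run_with_replanning(
    goal="abc",
    make_plan=lambda goal, hist: [c for c in goal if c not in hist],
    execute_step=lambda step: step,
    progress_is_acceptable=lambda hist: True,
))  # -> ['a', 'b', 'c']
```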
Execution Engines and Smart Execution
For efficiency, every planner requires an execution engine [00:04:14]. An execution engine plays a vital role by:
- Analyzing dependencies between steps, which enables parallel execution [00:04:22] (a concurrency sketch follows this list).
- Managing trade-offs between speed and cost [00:04:29], potentially using techniques like branch prediction to make the system faster [00:04:34].
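The dependency analysis can be pictured with a small asyncio sketch: each step declares which steps it depends on, and independent steps run concurrently. The step names and the dependency map are illustrative assumptions, not a real execution engine.

```python
# Sketch: run plan steps concurrently once their dependencies have finished.
# Step names and the stand-in `step` coroutine are illustrative placeholders.

import asyncio
from typing import Awaitable, Callable, Dict, List

async def execute_plan(
    steps: Dict[str, Callable[[], Awaitable[str]]],
    deps: Dict[str, List[str]],
) -> Dict[str, str]:
    results: Dict[str, str] = {}
    done = {name: asyncio.Event() for name in steps}

    async def run_step(name: str) -> None:
        # Wait for every dependency of this step to complete first.
        await asyncio.gather(*(done[d].wait() for d in deps.get(name, [])))
        results[name] = await steps[name]()
        done[name].set()

    await asyncio.gather(*(run_step(name) for name in steps))
    return results

# Usage: steps "a" and "b" run in parallel; "c" waits for both.
async def main() -> None:
    async def step(text: str) -> str:
        await asyncio.sleep(0.1)  # stand-in for an LLM or tool call
        return text

    out = await execute_plan(
        steps={"a": lambda: step("A"), "b": lambda: step("B"), "c": lambda: step("C")},
        deps={"c": ["a", "b"]},
    )
    print(out)

asyncio.run(main())
```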
AI21 Maestro: An Example of Planning and Smart Execution
AI21’s Maestro system exemplifies the integration of a planner and a smart execution engine [00:04:40]. For instruction following, Maestro separates the prompt’s context, task, and requirements (e.g., paragraph limits, tone, brand mentions) [00:04:53]. This separation simplifies validation [00:05:12].
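The talk does not show Maestro’s actual schema, but the idea of separating context, task, and independently checkable requirements might be sketched like this (all field names and checks are hypothetical):

```python
# Illustrative separation of a prompt into context, task, and explicit
# requirements that can each be checked on its own. Field names and checks
# are hypothetical, not Maestro's actual schema.

from dataclasses import dataclass, field
from typing import Callable, Dict

@dataclass
class StructuredRequest:
    context: str                                  # background documents, data
    task: str                                     # what to produce
    requirements: Dict[str, Callable[[str], bool]] = field(default_factory=dict)

    def validate(self, output: str) -> Dict[str, bool]:
        # Each requirement is checked individually, which is what makes the
        # separation useful for validation.
        return {name: check(output) for name, check in self.requirements.items()}

req = StructuredRequest(
    context="Product facts ...",
    task="Write a launch announcement.",
    requirements={
        "max_two_paragraphs": lambda out: out.count("\n\n") <= 1,
        "mentions_brand": lambda out: "Acme" in out,
    },
)
print(req.validate("Acme launches today.\n\nDetails inside."))
```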
The system operates using an execution tree or graph [00:05:15]:
- At each step, the planner and execution engine select several candidate solutions [00:05:20].
- Only promising candidates are pursued, fixed, and improved [00:05:26].
- Techniques like “best of N” are used, where several completions are sampled at high temperature from one LLM or from several different LLMs [00:05:32] (see the sketch after this list).
- Unpromising candidates are discarded, and only the best ones are pursued based on a predefined budget [00:05:49].
- Validation and iterative refinement (fixing) are integral parts of the process [00:05:59].
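A rough sketch of the best-of-N step described above, with validation scores used to prune candidates down to a fixed budget. `generate` and `score` are hypothetical stand-ins for high-temperature sampling and requirement checking, not Maestro’s internals.

```python
# Sketch of best-of-N candidate generation with validation and budget-based
# pruning. `generate` and `score` are hypothetical stand-ins for LLM sampling
# (at high temperature) and requirement validation.

import random
from typing import Callable, List, Tuple

def best_of_n(
    task: str,
    generate: Callable[[str], str],   # one high-temperature sample per call
    score: Callable[[str], float],    # e.g. fraction of requirements satisfied
    n: int = 8,
    keep: int = 2,
) -> List[Tuple[float, str]]:
    candidates = [generate(task) for _ in range(n)]
    ranked = sorted(((score(c), c) for c in candidates), reverse=True)
    # Keep only the most promising candidates within the budget; the rest
    # are discarded rather than refined further.
    return ranked[:keep]

# Toy usage with random stand-ins for generation and scoring.
best = best_of_n(
    "Write a two-paragraph summary.",
    generate=lambda t: f"candidate-{random.randint(0, 999)}",
    score=lambda c: random.random(),
    n=4,
    keep=2,
)
print(best)
```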
For more complex scenarios, the execution engine can estimate expected cost, latency, and success probability for different execution tracks, allowing the planner to choose the optimal path [00:06:07]. Finally, a “reduce” step combines or selects the best results for a complete answer [00:06:28].
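One way to picture choosing among execution tracks and then reducing the surviving results is sketched below; the cost, latency, and success estimates, the scoring weights, and the length-based reduce are all illustrative assumptions rather than Maestro’s actual logic.

```python
# Sketch: pick an execution track by comparing estimated cost, latency, and
# success probability, then "reduce" the surviving results into one answer.
# The estimates, weights, and length-based reduce are illustrative assumptions.

from dataclasses import dataclass
from typing import List

@dataclass
class Track:
    name: str
    est_cost: float      # e.g. dollars per run
    est_latency: float   # e.g. seconds
    est_success: float   # probability in [0, 1]

def pick_track(tracks: List[Track], w_cost: float = 1.0, w_latency: float = 0.01) -> Track:
    # Higher expected success is better; cost and latency count against a track.
    return max(
        tracks,
        key=lambda t: t.est_success - w_cost * t.est_cost - w_latency * t.est_latency,
    )

def reduce_results(results: List[str]) -> str:
    # Simplest possible reduce: keep the longest result. A real system might
    # merge candidates or ask an LLM to select among them.
    return max(results, key=len)

tracks = [
    Track("single-call", est_cost=0.01, est_latency=2.0, est_success=0.6),
    Track("best-of-8-with-validation", est_cost=0.10, est_latency=8.0, est_success=0.9),
]
print(pick_track(tracks).name)  # -> best-of-8-with-validation
print(reduce_results(["short draft", "a longer, requirement-complete draft"]))
```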
Results and Trade-offs
Using this planning and smart execution approach, systems like Maestro achieve significantly better results across various LLMs (e.g., GPT-4o, Claude Sonnet 3.5, o3-mini) [00:06:42]. On requirement satisfaction, measured on an internal dataset built from customer data, this method shows a clear improvement over single LLM calls [00:07:01]. The trade-off is increased runtime and cost in exchange for higher quality [00:07:10].
Conclusion
The key takeaways for developing and optimizing AI agents are:
- LLMs alone are often insufficient, even for tasks like instruction following [00:07:21].
- Start simple: if LLMs suffice, use them. If tools are needed, use function calling. If more structured reasoning is required, consider ReAct [00:07:27].
- For complex tasks that demand advanced capabilities, a planning and execution engine is essential to achieve high quality and reliability [00:07:42].