From: aidotengineer
While Large Language Models (LLMs) like InstructGPT marked a significant leap in language understanding and instruction following in 2022, even 2025-era models like GPT-4.1 still struggle with complex instructions [00:00:00]. The limitation stems from users trying to pack all context, constraints, and requirements into a single prompt, which often proves insufficient even for seemingly simple tasks like instruction following [00:01:04]. This is where AI agents come into play [00:01:21].
Agents do not solely rely on prompting; they incorporate planning [00:01:24].
Defining an AI Agent
From an engineering perspective, the precise definition of an AI agent is less critical than its functional effectiveness [00:01:45]. Several concepts are often referred to as agents or agentic workflows:
- LLM as a Router: This involves a routing model that directs a query to a specialized LLM [00:01:55].
- Function Calling: LLMs are provided with a list of external tools to interact with APIs, the internet (like Google search), or other systems [00:02:08]. The Model Context Protocol (MCP), introduced by Anthropic, standardizes this concept [00:02:24].
- ReAct (Reason and Act): A popular agentic framework that works with any language model. It involves a cycle of “thought, then act upon that thought, and observe” [00:02:39]. While effective, ReAct processes steps one at a time, without a look-ahead to the entire plan [00:02:58] (see the sketch below).
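To make the loop concrete, here is a minimal, framework-agnostic sketch of a ReAct-style agent. The `call_llm` completion function and the `search` tool are hypothetical placeholders, not part of any framework named above; the point is that each step is decided only from the transcript so far, with no look-ahead.

```python
def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in your model provider here")

def search(query: str) -> str:
    raise NotImplementedError("plug in a real search API here")

TOOLS = {"search": search}

def react(question: str, max_steps: int = 5) -> str:
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        # The model emits a Thought plus either an Action or a final Answer.
        step = call_llm(transcript + "Thought:")
        transcript += f"Thought:{step}\n"
        if "Answer:" in step:
            return step.split("Answer:", 1)[1].strip()
        if "Action:" in step:
            # Expected format: "Action: search[some query]"
            name, arg = step.split("Action:", 1)[1].strip().split("[", 1)
            observation = TOOLS[name.strip()](arg.rstrip("]"))
            transcript += f"Observation: {observation}\n"
    return "No answer within the step budget."
```

Note that there is no plan object anywhere: the model only ever sees history, which is exactly the limitation planning-based agents address.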
The Necessity of Planning in Agentic Workflows
Every AI agent ultimately needs to incorporate planning [00:03:15]. Planning is the process of figuring out the steps required to achieve a specific goal [00:03:20]. It becomes essential for complex tasks that are not straightforward and that require parallelization and explainability [00:03:24]. This contrasts with ReAct, where the sequence of thoughts is visible but the overall “why” behind the progression is not [00:03:36].
Planning can be implemented using:
- Form-based planners: such as TextBL or Magentic-One by Microsoft [00:03:43] (a sketch of the idea follows this list).
- Code-based planners: Examples include smolagents from Hugging Face [00:03:50].
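As a rough illustration of the form-based flavor, the sketch below asks the model once, up front, for a whole plan as structured JSON. The schema, and the reuse of the hypothetical `call_llm` from the ReAct sketch, are illustrative assumptions rather than the API of any framework named above.

```python
import json

PLAN_SCHEMA = '{"steps": [{"id": 1, "action": "...", "depends_on": []}]}'

def plan(goal: str) -> list[dict]:
    # One up-front call yields the entire plan, unlike ReAct's
    # interleaved thought/act/observe loop.
    prompt = (
        "Break the goal into steps. Reply with JSON matching this schema:\n"
        f"{PLAN_SCHEMA}\nGoal: {goal}"
    )
    return json.loads(call_llm(prompt))["steps"]
```

Code-based planners like smolagents go a step further and let the model write the plan directly as executable code.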
Dynamic Planning and Smart Execution
A key aspect of effective planning is dynamic planning, which allows for replanning in the middle of a task [00:03:57]. This lets the system reassess whether the current plan is still optimal or whether a new path should be taken [00:04:09]. A minimal replanning loop is sketched below.
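Here is one hedged way to express that loop, building on the `plan` sketch above; `execute` and `plan_is_stale` are hypothetical helpers standing in for a real step executor and a real plan-quality check.

```python
def execute(step: dict) -> str:
    raise NotImplementedError("run the step's action here")

def plan_is_stale(goal: str, done: list, remaining: list) -> bool:
    # Hypothetical check (e.g. an LLM judge or a rule-based validator)
    # deciding whether the remaining plan still fits the goal.
    return False

def run_with_replanning(goal: str) -> list:
    steps = plan(goal)  # initial plan, from the earlier sketch
    done = []
    while steps:
        step = steps.pop(0)
        done.append((step, execute(step)))
        # Mid-task reassessment: replan if the rest of the plan looks wrong.
        if plan_is_stale(goal, done, steps):
            steps = plan(f"{goal}\nAlready completed: {done}")
    return done
```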
For efficiency, agents integrate a “smart execution engine” alongside the planner [00:04:17]. An execution engine can:
- Analyze dependencies between steps, enabling parallel execution [00:04:24] (sketched after this list).
- Manage trade-offs between speed and cost, for instance by using branch prediction to make the overall system faster [00:04:29].
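As a minimal sketch of the dependency-analysis idea, assume plan steps carry the `depends_on` field from the planner sketch above; then every step whose dependencies are satisfied can run concurrently. The scheduling below (plain `asyncio`) is an illustrative stand-in for a real execution engine's machinery.

```python
import asyncio

async def run_step(step: dict) -> str:
    # Placeholder for actually executing a step's action.
    await asyncio.sleep(0)
    return f"result of step {step['id']}"

async def run_plan(steps: list[dict]) -> dict[int, str]:
    results: dict[int, str] = {}
    pending = list(steps)
    while pending:
        # Every step whose dependencies are all done can run now, in parallel.
        ready = [s for s in pending if all(d in results for d in s["depends_on"])]
        if not ready:
            raise ValueError("dependency cycle in plan")
        outs = await asyncio.gather(*(run_step(s) for s in ready))
        for s, out in zip(ready, outs):
            results[s["id"]] = out
        pending = [s for s in pending if s["id"] not in results]
    return results

# Example: step 3 waits for 1 and 2, which run in parallel.
steps = [
    {"id": 1, "depends_on": []},
    {"id": 2, "depends_on": []},
    {"id": 3, "depends_on": [1, 2]},
]
print(asyncio.run(run_plan(steps)))
```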
AI21 Maestro: An Agentic Framework Example
AI21 Labs has developed an agentic framework called AI21 Maestro, which combines a planner and a smart execution engine [00:01:32] [00:04:40].
In a simplified instruction-following task, Maestro separates the prompt (context and task) from explicit requirements (e.g., length, tone, brand mentions) [00:04:53]. This separation makes validation easier, since each requirement can be checked on its own [00:05:12]; a toy illustration follows.
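Once requirements live outside the prompt as structured data, each one can be verified mechanically against a draft. The requirement set and checkers below are invented for illustration and are not Maestro's actual schema.

```python
# Each explicit requirement becomes an independent, checkable predicate.
requirements = {
    "max_words": lambda text: len(text.split()) <= 150,
    "mentions_brand": lambda text: "Acme" in text,
    "no_exclamations": lambda text: "!" not in text,
}

def unmet(text: str) -> list[str]:
    """Return the names of requirements the draft fails."""
    return [name for name, check in requirements.items() if not check(text)]

draft = "Acme makes reliable widgets for every workshop."
print(unmet(draft))  # -> [] when all requirements pass
```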
The process involves an execution tree or graph:
- At each step, the planner and execution engine select several candidate solutions [00:05:15].
- Only the most promising candidates are pursued, fixed, and improved [00:05:26].
Techniques used within the execution engine include (combined in the sketch after this list):
- Best of N: Sampling multiple generations from an LLM (often with high temperature) or using different LLMs [00:05:36].
- Candidate Discarding: Ditching unpromising candidates early and focusing on the best ones based on a predefined budget [00:05:49].
- Validation and Iteration: Iteratively fixing and refining outputs [00:05:59].
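Putting the three techniques together in one hedged sketch: sample N candidates, discard all but the top few under a budget, then iteratively repair the survivors against the `unmet` checker from the earlier sketch. The `generate` and `fix` functions are hypothetical LLM calls.

```python
def generate(prompt: str, temperature: float = 1.0) -> str:
    raise NotImplementedError("sample one completion from an LLM")

def fix(draft: str, failures: list[str]) -> str:
    raise NotImplementedError("ask an LLM to repair the listed failures")

def best_of_n(prompt: str, n: int = 8, keep: int = 2, rounds: int = 3) -> str:
    # Best of N: many high-temperature samples (or samples from different LLMs).
    candidates = [generate(prompt, temperature=1.0) for _ in range(n)]
    # Candidate discarding: keep only the most promising, per the budget.
    candidates.sort(key=lambda c: len(unmet(c)))  # fewer failed requirements is better
    candidates = candidates[:keep]
    # Validation and iteration: repair survivors until they pass or budget runs out.
    for _ in range(rounds):
        candidates = [c if not unmet(c) else fix(c, unmet(c)) for c in candidates]
        if any(not unmet(c) for c in candidates):
            break
    return min(candidates, key=lambda c: len(unmet(c)))
```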
The execution engine can also track expected cost, latency, and success probability, allowing the planner to choose the most appropriate path [00:06:15]. Finally, a “reduce” step combines or selects the best results into a complete answer [00:06:30]. A toy version of both ideas follows.
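In this hedged toy version, each candidate execution path carries estimated cost, latency, and success probability, the planner picks one under a success target, and a final reduce step selects the best result. The numbers and the selection policy are invented for illustration.

```python
from dataclasses import dataclass

@dataclass
class Path:
    name: str
    cost: float        # expected dollars
    latency: float     # expected seconds
    p_success: float   # expected chance of meeting all requirements

paths = [
    Path("single_call", cost=0.01, latency=2.0, p_success=0.55),
    Path("best_of_8",   cost=0.08, latency=6.0, p_success=0.90),
]

def choose(paths: list[Path], target: float = 0.8) -> Path:
    # One possible policy: cheapest path that meets the success target.
    viable = [p for p in paths if p.p_success >= target]
    return min(viable or paths, key=lambda p: p.cost)

def reduce_results(results: list[str]) -> str:
    # "Reduce": select (or combine) the best result into one final answer;
    # reuses unmet() from the validation sketch above as the quality signal.
    return min(results, key=lambda r: len(unmet(r)))

print(choose(paths).name)  # -> "best_of_8"
```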
Benefits and Challenges
AI agents employing planning and smart execution engines demonstrate significant improvements in instruction following and requirement satisfaction compared to single LLM calls [00:06:42] [00:07:02]. The higher quality, however, comes at the price of increased runtime and spend [00:07:10].
Conclusion
LLMs alone are not always sufficient, even for basic tasks like instruction following [00:07:21]. The approach to building effective AI agents should be incremental:
- Start simple with single LLM calls if they suffice [00:07:30].
- Incorporate tools or ReAct if the task complexity increases [00:07:35].
- For highly complex tasks, adopting a planning and execution engine framework is necessary to achieve desired quality and performance [00:07:43].