From: aidotengineer
The Evolution of Instruction Following in LLMs
In 2022, OpenAI released InstructGPT, a model praised for its ability to follow instructions effectively [00:00:00]. Initially, simple prompts like “Explain the moon landing to a six-year-old” demonstrated its capabilities [00:41:40]. However, as developers began to use language models for more complex tasks, prompts evolved to include extensive context, constraints, and requirements [00:51:00]. By 2025, even advanced models like GPT-4.1 still exhibited difficulties with instruction following, indicating that large language models (LLMs) alone are often insufficient even for seemingly simple tasks [00:24:24]. This challenge highlights the need for AI agents [01:21:00].
Defining an AI Agent
From an engineering perspective, the definition of an “agent” is pragmatic: “Whatever works, just make it work” [01:47:00]. Several architectural patterns and system designs are commonly referred to as AI agents:
- LLM as a Router: A routing model directs a query to a specialized LLM [01:53:00].
- Function Calling: The LLM is provided with a list of external tools to interact with APIs or the internet (e.g., Google search) [02:08:00].
- ReAct (Reason and Act): A widely used framework that allows any language model to think, act upon that thought, and observe the result in an iterative loop. The process unfolds step by step, without looking ahead to an entire plan [02:39:00]. A minimal sketch of this loop follows the list.
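To make the loop concrete, here is a minimal ReAct-style sketch in Python. It is illustrative only: `call_llm` stands in for any chat-completion call, and the `search` tool is a stub, not a real API.

```python
from typing import Callable

# Hypothetical tool registry; "search" is a stub standing in for e.g. a web search API.
TOOLS: dict[str, Callable[[str], str]] = {
    "search": lambda query: f"(search results for {query!r})",
}

def react(question: str, call_llm: Callable[[str], str], max_steps: int = 5) -> str:
    """Think, act, observe, one step at a time, until the model emits a final answer."""
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        # Think: ask only for the next step; no look-ahead over a full plan.
        step = call_llm(
            transcript
            + "Reply with 'Thought: ...' and then either "
            "'Action: <tool>: <input>' or 'Final: <answer>'."
        )
        transcript += step + "\n"
        if "Final:" in step:
            return step.split("Final:", 1)[1].strip()
        if "Action:" not in step:
            continue  # the model only produced a thought; loop again
        # Act: run the requested tool with its input.
        tool_name, _, tool_input = step.split("Action:", 1)[1].partition(":")
        # Observe: append the tool result so the next iteration can use it.
        transcript += f"Observation: {TOOLS[tool_name.strip()](tool_input.strip())}\n"
    return "No final answer within the step budget."
```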
The Importance of Planning in Agentic Systems
For complex tasks that are not straightforward, planning becomes essential for AI agents [03:15:00]. Planning involves figuring out the sequence of steps needed to achieve a goal [03:20:00].
Key aspects of planning:
- Addressing Complex Tasks: Planning is crucial when tasks require parallelization and explainability [03:28:00]. Unlike ReAct, which exposes intermediate thoughts but not the rationale behind the overall approach, planning aims to make that reasoning explicit [03:36:00].
- Dynamic Planning (Replan): A plan is not static. A system can re-evaluate its plan midway through execution and adjust it if necessary [03:57:00].
- Smart Execution Engine: Every planner requires an execution engine to analyze dependencies between steps, enabling parallel execution [04:19:00]. It also manages trade-offs between speed and cost, potentially using techniques like branch prediction for faster systems [04:29:00]. A planner-plus-executor sketch follows the planner examples below.
Planners come in different forms, such as text-based planners (e.g., Microsoft’s Magentic-One) and code-based planners (e.g., Hugging Face’s smolagents) [03:43:00].
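The planner/execution-engine split can be sketched as follows. The `Step` and `execute_plan` names are invented for illustration (this is not Maestro’s or any framework’s actual API): a plan is a dependency graph of steps, and the engine runs every step whose dependencies are satisfied in parallel on a thread pool.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Step:
    name: str
    run: Callable[[dict], str]                   # receives the results of finished steps
    deps: set[str] = field(default_factory=set)  # names of steps that must finish first

def execute_plan(steps: list[Step], max_workers: int = 4) -> dict[str, str]:
    """Run each step as soon as its dependencies are done, in parallel where possible."""
    results: dict[str, str] = {}
    pending = {s.name: s for s in steps}
    futures: dict = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while pending or futures:
            # Submit every step whose dependencies are already satisfied.
            ready = [s for s in pending.values() if s.deps <= set(results)]
            if not ready and not futures:
                raise ValueError("plan has unsatisfiable or cyclic dependencies")
            for s in ready:
                futures[pool.submit(s.run, dict(results))] = s.name
                del pending[s.name]
            # Wait for at least one running step to finish, then re-check readiness.
            done, _ = wait(futures, return_when=FIRST_COMPLETED)
            for fut in done:
                results[futures.pop(fut)] = fut.result()
    return results

# Example: "outline" and "tone" both depend only on "research", so the engine
# runs them in parallel before "draft" combines their results.
plan = [
    Step("research", run=lambda done: "facts ..."),
    Step("outline", run=lambda done: "outline ...", deps={"research"}),
    Step("tone", run=lambda done: "tone notes ...", deps={"research"}),
    Step("draft", run=lambda done: done["outline"] + " " + done["tone"],
         deps={"outline", "tone"}),
]
print(execute_plan(plan))
```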
AI21 Labs’ Maestro: A Practical Application
AI21 Labs developed Maestro, a system that combines a planner and a smart execution engine to improve instruction following [04:40:00].
How Maestro Works:
- Separation of Concerns: Instead of shoving all information (context, task, requirements like paragraph limits, tone, brand mentions) into one prompt, Maestro separates them [04:53:00]. This makes validation easier [05:12:00]. A sketch of this separation appears after the list.
- Execution Tree/Graph: At each step, the planner and execution engine evaluate several candidates, pursuing and improving only those that seem promising [05:15:00].
- Techniques for Improvement (see the best-of-N sketch after this list):
- Best of N: Sampling multiple generations from an LLM with high temperature, or using different LLMs [05:36:00].
- Candidate Pruning: Ditching non-promising candidates and pursuing only the best ones based on a predefined budget [05:49:00].
- Validation and Iteration: Continuous validation and iterative fixing [05:59:00].
- Decision Making: For more complex scenarios, the execution engine tracks expected cost, latency, and success probability, allowing the planner to choose the optimal path [06:07:00].
- Reduction: At the end, results are reduced by selecting the best one or combining them for a complete answer [06:28:00].
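To illustrate the separation of concerns, here is a minimal sketch, assuming invented `Task` and `Requirement` classes rather than Maestro’s actual data model: the instruction, the context, and each requirement are kept apart, so every requirement can be checked programmatically instead of being buried in one prompt.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Requirement:
    description: str
    check: Callable[[str], bool]   # programmatic validator for this one requirement

@dataclass
class Task:
    instruction: str
    context: str
    requirements: list[Requirement]

def unmet_requirements(task: Task, draft: str) -> list[str]:
    """List the requirements a draft fails, so the planner knows exactly what to fix."""
    return [r.description for r in task.requirements if not r.check(draft)]

task = Task(
    instruction="Write a product announcement.",
    context="(background documents go here)",
    requirements=[
        Requirement("at most 3 paragraphs", lambda t: t.count("\n\n") <= 2),
        Requirement("mentions the brand", lambda t: "AI21" in t),
    ],
)
```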
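The best-of-N, pruning, and reduction steps can be sketched in the same spirit. `generate` and `score` are hypothetical placeholders: `generate` would draw one high-temperature sample from an LLM (or call a different LLM each time), and `score` would fold in requirement validation along with expected cost, latency, and success probability.

```python
from typing import Callable

def refine(
    prompt: str,
    generate: Callable[[str], str],  # placeholder: one high-temperature LLM sample
    score: Callable[[str], float],   # placeholder: requirements met + expected cost/success
    n: int = 8,
    budget: int = 3,
    rounds: int = 2,
) -> str:
    # Best of N: sample several candidate generations for the same step.
    candidates = [generate(prompt) for _ in range(n)]
    for _ in range(rounds):
        # Candidate pruning: keep only the most promising drafts, up to the budget.
        candidates = sorted(candidates, key=score, reverse=True)[:budget]
        # Validation and iteration: regenerate each survivor with a fix-it instruction.
        candidates = [generate(f"{prompt}\n\nImprove this draft:\n{c}") for c in candidates]
    # Reduction: pick the single best result (or combine the survivors).
    return max(candidates, key=score)
```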
Performance and Trade-offs
Using a planner and smart execution engine significantly improves results for instruction following and requirement satisfaction, even with models like GPT-4o or Claude 3.5 Sonnet [06:42:00]. This comes at the cost of increased runtime and financial expenditure, but yields demonstrably higher quality outputs [07:10:00].
Key Takeaways for Agentic Frameworks
- LLMs alone are not always sufficient, even for “simple” tasks like instruction following [07:21:00].
- Always start simple: If an LLM alone suffices, use it [07:29:00].
- Progress to tools: If needed, integrate tools with your model [07:35:00].
- Consider ReAct: A good next step for structured thought and action [07:41:00].
- Implement planning and execution engines: For highly complex tasks, this approach is necessary to achieve desired results [07:43:00].