From: aidotengineer
In the evolution of AI, and of large language models (LLMs) in particular, a key challenge has been getting models to consistently follow instructions. Early models such as InstructGPT in 2022 were hailed for their instruction-following capabilities [00:00:03], yet even in 2025 these issues persist across many language models [00:00:26]. This highlights a fundamental distinction between simple prompting and the more sophisticated planning mechanisms that complex tasks require.
The Limitations of Direct Prompting
Initially, users were impressed by models’ ability to handle straightforward requests, such as “Explain the moon landing to a six-year-old” [00:00:41]. However, as developers began to leverage LLMs for more intricate applications, instructions evolved to include extensive information, context, constraints, and requirements, often all “shoved in one prompt” [00:01:02].
Even for seemingly “simple” tasks like instruction following, relying on LLMs alone is often insufficient [00:01:14]. This limitation motivates AI agents and planning.
The Rise of AI Agents and Planning
AI agents go beyond mere prompting by incorporating planning [00:01:21]. While the philosophical definition of an “agent” can be debated, from an engineering perspective, it refers to whatever mechanism effectively achieves a desired outcome [00:01:45].
Common interpretations or components that contribute to agentic behavior include:
- LLMs as Routers: A routing model directs queries to specialized LLMs [00:01:53].
- Function Calling: LLMs are provided with external tools and APIs (e.g., Google search) to interact with the world [00:02:08].
- ReAct (Reason + Act): A popular framework in which an LLM thinks, then acts on that thought, observes the result, and repeats the process [00:02:39]. While effective, ReAct processes steps one at a time and has no look-ahead over the entire plan [00:02:58]; a minimal loop is sketched after this list.
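To make the ReAct pattern concrete, here is a minimal sketch of such a loop. It assumes a hypothetical `llm` callable that returns text and a placeholder `google_search` tool; it illustrates the Reason-Act-Observe cycle only, not any particular framework's implementation.

```python
import re

def google_search(query: str) -> str:
    # Placeholder tool: a real implementation would call a search API.
    return f"<results for '{query}'>"

TOOLS = {"google_search": google_search}

def react_loop(task: str, llm, max_steps: int = 5) -> str:
    """Reason -> Act -> Observe, one step at a time, with no look-ahead."""
    transcript = f"Task: {task}\n"
    for _ in range(max_steps):
        # Ask the model for the next thought plus either an Action or a Final answer.
        step = llm(transcript + "Next (Thought, then Action: tool[input] or Final: answer):")
        transcript += step + "\n"
        if "Final:" in step:
            return step.split("Final:", 1)[1].strip()
        match = re.search(r"Action:\s*(\w+)\[(.*)\]", step)
        if not match:
            continue  # no usable action this step; ask again
        tool, arg = match.group(1), match.group(2)
        observation = TOOLS.get(tool, lambda _: "unknown tool")(arg)
        transcript += f"Observation: {observation}\n"  # feed the result back in
    return transcript  # step budget exhausted without a final answer
```

Note how each iteration only decides the very next action from the transcript so far, which is exactly the limitation the planning approaches below address.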
What is Planning?
Planning in AI involves “figuring out the steps that you have to take to reach your goal” [00:03:20]. It is crucial for complex tasks that are not straightforward [00:03:26] and require:
- Parallelization: Performing multiple steps simultaneously [00:03:31].
- Explainability: Understanding why certain actions were taken, unlike ReAct, where you see individual thoughts but not the rationale behind the overall sequence [00:03:36].
Planning can be implemented through form-based planners (e.g., text-based) or code-based planners [00:03:43].
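As an illustration of what a code-based plan might look like, the sketch below represents a plan as plain data: named steps with explicit dependencies. The `Step` class and the example plan are hypothetical, but they show how such a structure supports both explainability (the dependencies are visible) and parallelization (independent steps can run together).

```python
from dataclasses import dataclass, field

@dataclass
class Step:
    name: str
    action: str                       # what this step should do
    depends_on: list[str] = field(default_factory=list)

# A code-based plan is plain data: steps plus their dependencies, so the
# system can inspect it (explainability) and run independent steps in parallel.
plan = [
    Step("research", "gather facts about the product"),
    Step("draft",    "write a first draft", depends_on=["research"]),
    Step("tone",     "check the tone requirement", depends_on=["draft"]),
    Step("length",   "check the paragraph limit", depends_on=["draft"]),  # can run alongside "tone"
]
```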
Dynamic Planning and Smart Execution
A key aspect of advanced planning is dynamic planning, which allows for replanning mid-process [00:03:57]. Instead of rigidly following a single plan, the system can evaluate if the current plan is optimal or if a new one is needed [00:04:09].
For efficiency, a planner needs an execution engine [00:04:19]. This engine can:
- Analyze dependencies between steps to enable parallel execution [00:04:22] (see the sketch after this list).
- Manage trade-offs between speed and cost [00:04:29].
- Utilize techniques like branch prediction for faster systems [00:04:34].
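Building on the hypothetical `Step` objects from the planning sketch above, the following is one way an execution engine could analyze dependencies and run independent steps concurrently. It is a simplified illustration under those assumptions, not a description of any specific engine.

```python
import asyncio

async def run_step(step, results: dict) -> None:
    # Hypothetical executor: in practice this would call an LLM or a tool.
    await asyncio.sleep(0.1)  # stand-in for real work
    results[step.name] = f"output of: {step.action}"

async def execute(plan) -> dict:
    results: dict[str, str] = {}
    remaining = {s.name: s for s in plan}
    while remaining:
        # Any step whose dependencies are all satisfied can run now, in parallel.
        ready = [s for s in remaining.values() if all(d in results for d in s.depends_on)]
        if not ready:
            raise ValueError("cyclic or unsatisfiable dependencies")
        await asyncio.gather(*(run_step(s, results) for s in ready))
        for s in ready:
            del remaining[s.name]
    return results

# asyncio.run(execute(plan)) runs "tone" and "length" concurrently once "draft" is done.
```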
AI21 Maestro: An Example of Planning and Execution
AI21’s Maestro system exemplifies the combination of a planner and a smart execution engine [00:04:40]. For instruction following, it separates the core prompt (context and task) from specific requirements (e.g., paragraph limit, tone, brand mentions) [00:04:53]. This separation makes requirements easier to validate [00:05:12].
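The snippet below is a hypothetical illustration of that separation, not AI21's actual API: the core prompt is kept apart from a set of individually checkable requirements (the brand name and the tone check are made up for the example).

```python
# Hypothetical illustration: keep the requirements separate from the core
# prompt so each one can be validated on its own.
core_prompt = "Write a short announcement for our new headphones."

requirements = {
    "max_3_paragraphs": lambda text: text.count("\n\n") + 1 <= 3,
    "mentions_brand":   lambda text: "AcmeSound" in text,  # made-up brand
    "friendly_tone":    lambda text: "!" in text,          # crude stand-in for a tone check
}

def unmet(text: str) -> list[str]:
    """Names of the requirements a candidate answer still violates."""
    return [name for name, check in requirements.items() if not check(text)]
```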
In each step, the planner and execution engine evaluate several candidate solutions, continuing only with those that appear promising [00:05:20]. Strategies for effective implementation include:
- Best of N: Sampling multiple generations from an LLM (potentially at high temperature) or from different LLMs [00:05:36]; a combined sketch of these three strategies follows this list.
- Candidate Discarding: Discarding unpromising candidates and pursuing only the best ones based on a predefined budget [00:05:49].
- Validation and Iteration: Iteratively fixing and improving solutions [00:05:59].
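Combining these three strategies, a loop like the following is one plausible shape for the idea; it assumes a hypothetical `llm` callable that accepts a `temperature` argument and reuses the `unmet()` validator from the previous sketch.

```python
def best_of_n(llm, prompt: str, n: int = 4, keep: int = 2, rounds: int = 2) -> str:
    """Sample N candidates, keep the most promising, and iterate on the survivors."""
    # Best of N: several high-temperature samples (different LLMs would also work).
    candidates = [llm(prompt, temperature=1.0) for _ in range(n)]
    for _ in range(rounds):
        # Rank by how many requirements are still unmet (fewer is better) ...
        candidates.sort(key=lambda text: len(unmet(text)))
        # ... and discard the unpromising ones to stay within the budget.
        candidates = candidates[:keep]
        if not unmet(candidates[0]):
            return candidates[0]  # every requirement is satisfied
        # Validation and iteration: ask the model to repair what is still missing.
        candidates = [
            llm(f"{prompt}\n\nDraft:\n{c}\n\nFix these issues: {', '.join(unmet(c))}")
            for c in candidates
        ]
    return min(candidates, key=lambda text: len(unmet(text)))  # best effort within budget
```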
The execution engine in such systems can track expected cost, latency, and success probability, allowing the planner to choose the most appropriate path [00:06:15]. At the end, results are reduced or combined to form a complete answer [00:06:28]. This approach significantly improves quality, even for models like GPT-4, Claude Sonnet, or Mini, albeit with increased runtime and cost [00:06:42].
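One simple way to picture that bookkeeping, again purely as an assumption rather than Maestro's implementation, is an estimate per candidate path that the planner can filter by budget and rank by success probability.

```python
from dataclasses import dataclass

@dataclass
class PathEstimate:
    # Hypothetical bookkeeping the execution engine could maintain per candidate path.
    expected_cost_usd: float
    expected_latency_s: float
    success_probability: float

def choose_path(paths: dict[str, PathEstimate], budget_usd: float) -> str:
    """Among the paths we can afford, pick the one most likely to succeed."""
    affordable = {name: est for name, est in paths.items()
                  if est.expected_cost_usd <= budget_usd}
    if not affordable:
        raise ValueError("no path fits the budget")
    return max(affordable, key=lambda name: affordable[name].success_probability)
```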
Conclusion
While direct prompting with LLMs can suffice for simple tasks, more complex scenarios or stringent instruction following necessitate advanced methods. The key takeaway is that LLMs alone are not always enough [00:07:21].
For optimal AI implementation, consider a progressive approach:
- Start Simple: If LLMs alone or LLMs with basic tools (like function calling) work, use them [00:07:30].
- Employ ReAct: For tasks requiring iterative thought and action, the ReAct framework can be beneficial [00:07:41].
- Advanced Planning and Execution: For truly complex tasks, a full planning and execution engine that allows for dynamic planning and smart execution is the most robust solution [00:07:43]. This represents a significant shift from simple prompting to a multi-step, intelligent problem-solving approach in AI.