Challenges with instruction following in AI

From: aidotengineer

Despite advancements, large language models (LLMs) continue to struggle with instruction following, even years after models like InstructGPT were introduced specifically for this purpose [00:00:24]. The problem is not unique to one model; “every language model has some problems with following instructions” [00:00:29].

The Evolution of Instruction Complexity

In 2022, InstructGPT was lauded for its ability to follow instructions, exemplified by simple queries like “Explain the moon landing to a six-year-old” [00:00:37]. However, as more developers began using language models, instructions evolved into complex prompts packed with context, constraints, and requirements, with developers often hoping for the best outcome from a single, lengthy prompt [00:01:02].

This shift revealed that for even seemingly “simple tasks such as instruction following,” LLMs alone are “no longer enough” [00:01:14].

AI Agents: A Solution to Instruction Following Challenges

To overcome these limitations, AI agents have emerged as a crucial component [00:01:21]. Agents go beyond simple prompting; they require “planning” [00:01:24].

While the definition of an “AI agent” can be a philosophical debate, for engineers, it refers to “whatever works” to achieve a goal [00:01:41]. This can include various approaches:

LLM as a Router: A routing model directs queries to specialized LLMs [00:01:55].
Function Calling: Providing an LLM with external tools and APIs (like Google search) to interact with the world [00:02:08].
ReAct (Reason and Act): A popular framework where any language model can follow a “thought, then act upon that thought, and observe” loop, step-by-step [00:02:36]. A key challenge with ReAct is that it doesn’t “look ahead to the entire plan” [00:03:00].
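
The “thought, act, observe” loop can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not any framework's API: `TOOLS`, `scripted_model`, and `react_loop` are hypothetical names, and the “model” is a hard-coded stub standing in for a real LLM call.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical tool registry; a real agent would wrap search APIs, calculators, etc.
TOOLS: Dict[str, Callable[[str], str]] = {
    "add": lambda arg: str(sum(int(x) for x in arg.split("+"))),
}


def scripted_model(history: List[str]) -> Tuple[str, str, str]:
    """Stand-in for an LLM call: returns (thought, action, argument)."""
    observations = [line for line in history if line.startswith("Observation:")]
    if not observations:
        return ("I need to compute the sum first.", "add", "2+3")
    return ("I have the result.", "finish", observations[-1].split(": ", 1)[1])


def react_loop(question: str, model=scripted_model, max_steps: int = 5) -> str:
    history = [f"Question: {question}"]
    for _ in range(max_steps):
        # Each iteration decides only the next action; nothing looks ahead
        # to the rest of the task.
        thought, action, arg = model(history)
        history.append(f"Thought: {thought}")
        if action == "finish":
            return arg
        observation = TOOLS[action](arg)
        history.append(f"Observation: {observation}")
    raise RuntimeError("step limit reached without an answer")


answer = react_loop("What is 2 + 3?")
```

Because the loop commits to one step at a time, it has no view of the overall plan, which is the limitation noted above.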

The Critical Role of Planning

Planning is defined as “figuring out the steps that you have to take to reach your goal” [00:03:20]. It becomes essential for:

“Complex tasks” [00:02:50]
Non-straightforward problems [00:02:26]
Tasks requiring parallelization [00:03:31]
Ensuring explainability, unlike ReAct, which shows thoughts but not the “why” [00:03:33]

Planners can be text-based (e.g., Microsoft’s Magentic-One) or code-based (e.g., smolagents from Hugging Face) [00:03:43].

Dynamic Planning and Smart Execution

Dynamic planning allows the system to “replan” mid-process: it can re-evaluate and change course if the current plan isn’t optimal [00:03:57].
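
One way to picture replanning is a loop that swaps out the remaining steps when a step fails. The sketch below is an assumption-laden toy: `run_with_replanning`, `replan`, and the three step functions are hypothetical, and a real planner would ask a model for the new plan rather than return a fixed list.

```python
from typing import Callable, List

Step = Callable[[dict], bool]  # a step updates shared state and reports success


def run_with_replanning(plan: List[Step], replan, state: dict, max_replans: int = 2) -> dict:
    queue = list(plan)
    replans = 0
    while queue:
        step = queue.pop(0)
        if step(state):
            continue
        if replans == max_replans:
            raise RuntimeError("gave up after too many replans")
        replans += 1
        # Discard the rest of the old plan and continue with a new one.
        queue = replan(state, step)
    return state


def fetch_cached(state: dict) -> bool:
    state["log"].append("fetch_cached")
    return False  # simulate a cache miss


def fetch_source(state: dict) -> bool:
    state["log"].append("fetch_source")
    state["data"] = "raw text"
    return True


def summarize(state: dict) -> bool:
    state["log"].append("summarize")
    state["summary"] = state["data"].upper()
    return True


def replan(state: dict, failed_step: Step) -> List[Step]:
    # Hypothetical fixed fallback; a planner model would generate this.
    return [fetch_source, summarize]


final = run_with_replanning([fetch_cached, summarize], replan, {"log": []})
```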

For efficiency, every planner needs an execution engine [00:04:19]. An execution engine can:

Analyze dependencies between steps, enabling “parallel execution” [00:04:24].
Manage trade-offs between speed and cost [00:04:29].
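
Dependency analysis of this kind can be sketched as grouping plan steps into levels whose members do not depend on each other, then running each level concurrently. The step names, `parallel_levels`, and `execute` are illustrative assumptions; the talk does not describe a specific implementation.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Set


def parallel_levels(deps: Dict[str, Set[str]]) -> List[List[str]]:
    """Group steps into levels; steps within one level are independent."""
    remaining = {step: set(d) for step, d in deps.items()}
    levels: List[List[str]] = []
    while remaining:
        ready = sorted(s for s, d in remaining.items() if not d)
        if not ready:
            raise ValueError("cycle or unknown step in plan dependencies")
        levels.append(ready)
        for s in ready:
            del remaining[s]
        for d in remaining.values():
            d.difference_update(ready)
    return levels


def execute(deps: Dict[str, Set[str]], actions: Dict[str, Callable[[dict], str]]) -> dict:
    results: dict = {}
    with ThreadPoolExecutor() as pool:
        for level in parallel_levels(deps):
            # Every step in a level is submitted at once and may run in parallel.
            futures = {s: pool.submit(actions[s], results) for s in level}
            outputs = {s: f.result() for s, f in futures.items()}
            results.update(outputs)
    return results


# Hypothetical plan: two independent searches feed a merge, which feeds the answer.
deps = {"search_a": set(), "search_b": set(), "merge": {"search_a", "search_b"}, "answer": {"merge"}}
actions = {
    "search_a": lambda r: "A",
    "search_b": lambda r: "B",
    "merge": lambda r: r["search_a"] + r["search_b"],
    "answer": lambda r: "final:" + r["merge"],
}
levels = parallel_levels(deps)
results = execute(deps, actions)
```

Running more steps in parallel lowers latency but can raise cost when speculative work is discarded, which is one form of the speed and cost trade-off mentioned above.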

AI21 Maestro: An Example of Planning and Execution

AI21’s Maestro system combines a planner with a smart execution engine to address complex instruction following [00:04:40]. It separates context, task, and requirements, making validation easier [00:04:54].
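
To see why separating the three parts eases validation, consider the sketch below. It is not Maestro's API, which the source does not show; `Request`, `validate`, and the example requirements are hypothetical. The point is only that explicit requirements can each be checked on their own.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class Request:
    context: str
    task: str
    # Each requirement is a named, independently checkable predicate.
    requirements: Dict[str, Callable[[str], bool]] = field(default_factory=dict)


def validate(output: str, request: Request) -> Dict[str, bool]:
    """Report which requirements the output satisfies."""
    return {name: check(output) for name, check in request.requirements.items()}


request = Request(
    context="Release notes for the Acme planner.",
    task="Write a one-line announcement.",
    requirements={
        "at most 10 words": lambda out: len(out.split()) <= 10,
        "mentions Acme": lambda out: "Acme" in out,
    },
)

good = validate("Acme ships a faster planner this quarter.", request)
bad = validate("We ship soon.", request)
```

A per-requirement report like `bad` tells a later fixing step exactly which constraint failed, instead of a single pass or fail for the whole prompt.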

In each step of the process, the planner and execution engine:

Choose several “candidates” [00:05:22].
Continue to refine and improve only the most “promising” ones [00:05:28].
Utilize techniques like “best of n” (sampling multiple generations from LLMs or different LLMs) [00:05:32].
Discard unpromising candidates and pursue the best ones based on a predefined budget [00:05:49].
Incorporate validation steps with iterations for fixing [00:05:59].
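
The steps above can be sketched as a budgeted search: sample n candidates, keep the most promising, and spend the remaining budget refining them. Everything here is a toy under stated assumptions: `best_of_n_search`, `generate`, `refine`, `score`, and the `REQUIRED` keywords are hypothetical, and the stubs stand in for LLM sampling, fixing, and validation.

```python
from typing import Callable, List, Tuple

REQUIRED = ["title", "summary", "sources", "date"]  # hypothetical requirements


def generate(i: int) -> str:
    """Stub for sampling: candidate i satisfies the first i requirements."""
    return " ".join(REQUIRED[:i])


def score(candidate: str) -> int:
    """Stub validator: number of requirements the candidate satisfies."""
    words = candidate.split()
    return sum(1 for r in REQUIRED if r in words)


def refine(candidate: str) -> str:
    """Stub fixing step: repair the first unmet requirement, if any."""
    words = candidate.split()
    missing = [r for r in REQUIRED if r not in words]
    return " ".join(words + missing[:1])


def best_of_n_search(generate, refine, score, n: int = 4, keep: int = 2, budget: int = 8) -> Tuple[str, int]:
    if n > budget:
        raise ValueError("budget must cover the initial n samples")
    candidates: List[str] = [generate(i) for i in range(n)]
    spent = n  # one budget unit per generate or refine call
    while True:
        # Keep only the most promising candidates; the rest are discarded.
        candidates = sorted(candidates, key=score, reverse=True)[:keep]
        if spent + len(candidates) > budget:
            break
        candidates = [refine(c) for c in candidates]
        spent += len(candidates)
    return candidates[0], spent


best, spent = best_of_n_search(generate, refine, score)
```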

The execution engine also tracks expected cost, latency, and success probability, allowing the planner to choose the optimal path [00:06:15]. Finally, a “reduce” step combines or selects the best results for a complete answer [00:06:30].
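
A simple way to picture path selection is to filter candidate paths by cost and latency limits and pick the one with the highest estimated success probability. This is a sketch only: `PathEstimate`, `choose_path`, the path names, and all numbers are made up for illustration and do not come from the talk.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class PathEstimate:
    name: str
    cost_usd: float      # expected cost
    latency_s: float     # expected latency
    success_prob: float  # estimated probability of meeting all requirements


def choose_path(paths: List[PathEstimate], max_cost_usd: float, max_latency_s: float) -> PathEstimate:
    feasible = [p for p in paths if p.cost_usd <= max_cost_usd and p.latency_s <= max_latency_s]
    if not feasible:
        raise ValueError("no path fits the given limits")
    return max(feasible, key=lambda p: p.success_prob)


# Illustrative, invented estimates.
paths = [
    PathEstimate("single_call", cost_usd=0.01, latency_s=2.0, success_prob=0.60),
    PathEstimate("best_of_4", cost_usd=0.04, latency_s=3.0, success_prob=0.80),
    PathEstimate("plan_and_validate", cost_usd=0.20, latency_s=20.0, success_prob=0.95),
]

tight = choose_path(paths, max_cost_usd=0.05, max_latency_s=10.0)
loose = choose_path(paths, max_cost_usd=1.00, max_latency_s=60.0)
```

With tight limits the cheaper path wins; with looser limits the slower, more expensive path is chosen because its estimated success probability is higher.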

Results

This approach significantly improves instruction following and requirement satisfaction, with strong results reported for models such as GPT-4o and Claude 3.5 Sonnet. While it may incur “more runtime and more money,” it yields “higher quality” [00:07:08].

Key Takeaways

LLMs alone are often insufficient for even “simple tasks such as just instruction following” [00:07:21].
It’s advisable to “start simple” with LLMs or tool-augmented models, like those using ReAct [00:07:29].
For “complex” tasks, planning and execution engines are necessary for robust AI agents [00:07:43].
