From: aidotengineer
Despite early advancements, large language models (LLMs) continue to face difficulties in consistently following instructions, even for what appear to be simple tasks 00:00:24.
The Evolution of the Problem
In 2022, OpenAI released InstructGPT, which was a significant step forward as the first model capable of taking and following instructions effectively 00:00:00. Early demonstrations, such as explaining the moon landing to a six-year-old, impressed many 00:00:37.
However, as developers began to integrate LLMs into more complex applications, the nature of instructions evolved 00:00:51. Prompts grew to encompass extensive information, context, constraints, and requirements, all typically crammed into a single input 00:01:02. This increased complexity highlighted the limitations of standalone LLMs for instruction following 00:01:14.
The Role of AI Agents
The need to overcome these challenges has led to the development and increased importance of AI agents 00:01:21. Unlike simple prompting, agents require more sophisticated approaches, particularly planning 00:01:24.
Various interpretations and implementations of AI agents exist:
- LLM as a Router: Some systems route queries to specialized LLMs based on an initial routing model 00:01:53.
- Function Calling: This involves providing an LLM with a list of external tools it can use to interact with APIs, the internet (e.g., Google search), or other systems 00:02:08.
- ReAct Framework: A popular framework that allows any language model to "reason and act" by thinking, acting on that thought, and observing the results iteratively 00:02:39. However, ReAct processes steps one at a time without an overall look-ahead plan 00:02:58.
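The ReAct-style loop described above can be sketched as follows. This is a minimal illustration with a stubbed "model" and a toy `search` tool (both hypothetical, not part of any real framework); a real agent would call an actual LLM at each thought step.

```python
def search(query):
    # Stand-in for a real web-search tool.
    return {"moon distance": "about 384,400 km"}.get(query, "no result")

TOOLS = {"search": search}

def react_loop(goal, model_step, max_steps=5):
    """Iterate thought -> action -> observation until the model answers.

    model_step(goal, history) must return either
    ("act", tool_name, tool_input) or ("finish", answer).
    Note: each step sees only the history so far; there is no
    look-ahead plan, which is the limitation noted above.
    """
    history = []
    for _ in range(max_steps):
        step = model_step(goal, history)
        if step[0] == "finish":
            return step[1]
        _, tool, tool_input = step
        observation = TOOLS[tool](tool_input)   # act, then observe
        history.append((tool, tool_input, observation))
    return None

# Stubbed "model": search once, then answer from the last observation.
def toy_model(goal, history):
    if not history:
        return ("act", "search", "moon distance")
    return ("finish", history[-1][2])

print(react_loop("How far is the moon?", toy_model))
```

The key point is that the loop reacts to one observation at a time rather than planning all steps up front.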
The Necessity of Planning
For complex tasks that are not straightforward, AI agents need planning 00:03:15. Planning involves figuring out all the necessary steps to reach a goal, which enables parallelization and improves explainability, both often lacking in reactive approaches like ReAct 00:03:20.
Planning can take various forms:
- Text-based planners, such as Microsoft's Magentic-One 00:03:43.
- Code-based planners, used by agents like Hugging Face's smolagents 00:03:50.
Dynamic Planning and Smart Execution
A key aspect of advanced planning is dynamic planning, which allows for replanning mid-process if the initial plan is no longer optimal 00:03:57.
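A minimal sketch of this replanning idea, under the assumption that a failed step triggers a rebuild of the remaining plan (the `execute` and `replan` functions are illustrative stubs; a real system would consult a planner LLM):

```python
def execute(step):
    # Stub: pretend steps named "flaky-*" fail on their first attempt.
    execute.attempts[step] = execute.attempts.get(step, 0) + 1
    return not (step.startswith("flaky") and execute.attempts[step] == 1)

execute.attempts = {}

def replan(failed_step, remaining):
    # Stub: swap the failed step for a fallback, keep the rest of the plan.
    return [f"fallback-for-{failed_step}"] + remaining

def run_with_replanning(plan):
    execute.attempts = {}          # reset the stub's state per run
    done, queue = [], list(plan)
    while queue:
        step = queue.pop(0)
        if execute(step):
            done.append(step)
        else:
            queue = replan(step, queue)   # replan mid-process
    return done

print(run_with_replanning(["fetch", "flaky-parse", "summarize"]))
```

When `flaky-parse` fails, the remaining plan is rebuilt around a fallback step instead of aborting, which is the essence of dynamic planning.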
For efficiency, every planner requires an execution engine 00:04:14. An execution engine is crucial because it can:
- Analyze dependencies between steps, facilitating parallel execution 00:04:22.
- Manage trade-offs between speed and cost, potentially using techniques like branch prediction for faster systems 00:04:29.
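The dependency analysis in the first point can be sketched as a small DAG runner: steps whose prerequisites are all finished run concurrently in the same "wave". The step functions and thread pool here are stand-ins for real LLM or tool calls.

```python
from concurrent.futures import ThreadPoolExecutor

def run_dag(steps, deps):
    """steps: name -> callable; deps: name -> set of prerequisite names.
    Runs each wave of ready (dependency-free) steps in parallel."""
    finished, results = set(), {}
    with ThreadPoolExecutor() as pool:
        while len(finished) < len(steps):
            ready = [s for s in steps
                     if s not in finished and deps.get(s, set()) <= finished]
            futures = {s: pool.submit(steps[s]) for s in ready}
            for s, f in futures.items():
                results[s] = f.result()
                finished.add(s)
    return results

steps = {
    "fetch_a": lambda: "A",
    "fetch_b": lambda: "B",          # independent of fetch_a -> same wave
    "merge":   lambda: "A+B",        # must wait for both fetches
}
deps = {"merge": {"fetch_a", "fetch_b"}}
print(run_dag(steps, deps))
```

Here `fetch_a` and `fetch_b` execute in parallel because neither depends on the other, while `merge` waits for both, mirroring how an execution engine exploits step independence.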
AI21 Maestro: A Case Study
AI21 Maestro is an example of a system that integrates both a planner and a smart execution engine to tackle instruction following 00:04:40.
In this system:
- The prompt, context, task, and requirements are separated, making validation easier 00:05:53.
- At each step, the planner and execution engine evaluate several candidate solutions, pursuing only the most promising ones for refinement 00:05:15.
- Techniques used include:
- Best-of-N: Sampling multiple generations from an LLM with high temperature or using different LLMs 00:05:32.
- Candidate Discarding: Eliminating unpromising candidates early to focus resources on the best options within a predefined budget 00:05:49.
- Validation and Iteration: Continuously validating results and iteratively fixing them 00:05:59.
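The first two techniques can be sketched together: sample N candidates, score them, and discard all but the top few within a fixed budget. The `generate` and `score` functions are toy stand-ins (assumptions, not AI21's actual implementation) for high-temperature LLM sampling and a validator.

```python
import random

def generate(prompt, temperature):
    # Stand-in for sampling one generation from an LLM at high temperature.
    return f"{prompt} (draft {random.random():.3f}, T={temperature})"

def score(candidate):
    # Stand-in for a validator that scores a candidate; toy heuristic here.
    return len(candidate) % 7

def best_of_n(prompt, n=8, keep=2):
    # Best-of-N: sample several candidates with high temperature.
    candidates = [generate(prompt, temperature=1.0) for _ in range(n)]
    # Candidate discarding: keep only the top `keep` for refinement.
    ranked = sorted(candidates, key=score, reverse=True)
    return ranked[:keep]

random.seed(0)
finalists = best_of_n("Summarize the report")
print(len(finalists))   # only the finalists go on to refinement/validation
```

Only the surviving candidates are refined and validated further, which is how the budget stays bounded even with many initial samples.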
The execution engine can forecast expected cost, latency, and success probability for different paths, allowing the planner to choose the optimal route 00:06:15. Finally, a “reduce” step combines or selects the best results for a complete answer 00:06:30.
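A toy sketch of this forecast-driven choice plus the final reduce step. All numbers and the scoring rule are illustrative assumptions, not measured values:

```python
# Hypothetical forecasts for three execution paths.
paths = [
    {"name": "single-call",  "cost": 1.0, "latency": 1.0, "p_success": 0.60},
    {"name": "best-of-4",    "cost": 4.0, "latency": 1.2, "p_success": 0.85},
    {"name": "plan+execute", "cost": 6.0, "latency": 2.0, "p_success": 0.95},
]

def utility(path, cost_weight=0.05, latency_weight=0.05):
    # Trade success probability off against forecast cost and latency.
    return (path["p_success"]
            - cost_weight * path["cost"]
            - latency_weight * path["latency"])

best = max(paths, key=utility)     # planner picks the optimal route

def reduce_step(candidates):
    # "Reduce": select the best candidate (a real system might also merge).
    return max(candidates, key=lambda c: c["score"])

answer = reduce_step([{"text": "v1", "score": 0.7},
                      {"text": "v2", "score": 0.9}])
print(best["name"], answer["text"])
```

With these particular weights the mid-cost path wins; shifting the weights toward cost or latency would push the choice toward cheaper, faster routes, which is exactly the trade-off the engine manages.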
Results from AI21’s internal customer data show that integrating a planner and smart execution engine significantly improves performance and requirement satisfaction compared to single LLM calls (e.g., with GPT-4o, Claude Sonnet 3.5, or o3-mini) 00:06:42. While this approach may incur higher runtime and cost, it delivers substantially higher quality 00:07:10.
Conclusion
LLMs alone are often insufficient for instruction following, especially with complex tasks 00:07:21. For simpler scenarios, direct LLM calls, tool integration, or the ReAct framework may suffice 00:07:30. However, for highly complex tasks, adopting a planning and execution engine framework becomes essential to achieve reliable, high-quality instruction adherence 00:07:43.