From: aidotengineer

Reasoning models behave differently from other large language models (LLMs) and need to be prompted differently [00:06:51].

Key Prompting Differences

When interacting with reasoning models, specific strategies can lead to better performance:

Encouraging More Reasoning

A crucial aspect of prompting reasoning models is to encourage more internal reasoning. Research has shown that the more a model reasons, the better its output can be [00:07:35].

For instance, studies with the Medprompt framework on GPT-4 found that prompting the model to “think more” resulted in better outcomes [00:07:46]. Similarly, during the training of DeepSeek’s R1 model, accuracy improved as the model’s reasoning length increased [00:07:55].
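As a minimal sketch of this idea, assuming the official `openai` Python SDK and a model that exposes OpenAI’s `reasoning_effort` parameter (the model name and prompt wording here are illustrative, not from the talk):

```python
# Minimal sketch: nudging a reasoning model to reason more.
# Assumes the official `openai` Python SDK and an API key in the
# environment; the model name is an illustrative placeholder.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="o3-mini",          # placeholder reasoning model
    reasoning_effort="high",  # request more internal reasoning
    messages=[
        {
            "role": "user",
            # Prompt-level nudge in the spirit of Medprompt's "think more":
            "content": "Take your time and reason carefully before "
                       "answering: how many prime numbers are there "
                       "between 1 and 50?",
        }
    ],
)
print(response.choices[0].message.content)
```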

Minimal Prompting and Clear Task Descriptions

For reasoning models, “minimal prompting” can be highly effective [00:08:09]. A simple, clear task description often yields good results [00:08:14].
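As a sketch of what “minimal prompting” can look like in practice (the task and model name are invented for illustration):

```python
# Minimal prompting sketch: one clear task description, no role-play
# preamble, no step-by-step scaffolding, no examples.
from openai import OpenAI

client = OpenAI()

task = (
    "Classify the sentiment of this review as positive, negative, or "
    "neutral, and answer with the label only.\n\n"
    "Review: The battery lasts two days, but the screen scratches easily."
)

response = client.chat.completions.create(
    model="o3-mini",  # placeholder reasoning model
    messages=[{"role": "user", "content": task}],
)
print(response.choices[0].message.content)
```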

Avoiding Few-Shot Prompting

Unlike with other LLMs, where few-shot prompting is often beneficial, adding examples can degrade the performance of reasoning models [00:07:11].

Microsoft’s Medprompt framework found that adding examples led to worse performance with GPT-4 [00:07:06]. Researchers at DeepSeek also observed that few-shot prompting degraded performance when building their R1 model [00:07:13]. OpenAI themselves cautioned that providing additional context can “over-complicate things and confuse the model” [00:07:24].

If examples are deemed necessary, it is recommended to start with only one or two [00:08:26].
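To make this concrete, the sketch below contrasts a zero-shot prompt with a one-shot fallback (both prompts are hypothetical examples, not from the talk):

```python
# Sketch: prefer zero-shot with reasoning models; if the output format
# is unclear without a demonstration, fall back to a single example
# rather than a long few-shot block.
zero_shot = "Extract the invoice number and total from this email:\n{email}"

one_shot = (
    "Extract the invoice number and total from an email.\n"
    "Example input: 'Invoice INV-0042 for $310 is attached.'\n"
    'Example output: {{"invoice": "INV-0042", "total": "$310"}}\n\n'
    "Input:\n{email}"
)

# Usage: fill the template with the actual email text.
prompt = zero_shot.format(email="Hi, invoice INV-0117 for $95 is due Friday.")
```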

Redundancy in Reasoning Instructions

Reasoning models typically have chain-of-thought reasoning built in [00:02:53]. Explicitly instructing the model on how to reason is therefore generally unnecessary and can actually hurt performance [00:08:32].
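A small before/after sketch of this point (both prompts are invented for illustration):

```python
# Sketch: reasoning models supply their own chain of thought, so leave
# out explicit reasoning scaffolding.

# Likely redundant for a reasoning model, and potentially harmful:
over_specified = (
    "Think step by step. First restate the problem, then list the "
    "knowns, then work through each case before answering.\n"
    "Question: A train leaves at 3:40 pm and arrives at 6:05 pm. "
    "How long is the trip?"
)

# Usually sufficient:
minimal = (
    "A train leaves at 3:40 pm and arrives at 6:05 pm. "
    "How long is the trip?"
)
```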