From: aidotengineer
Hybrid thinking mode is a significant feature introduced in Qwen3 that lets a single model exhibit both “thinking” and “non-thinking” behavior [00:05:31]. It combines two distinct ways of operating a model [00:05:39].
Components of Hybrid Thinking Mode
- Thinking Mode
  - In this mode, the model first reflects on the problem, explores possible approaches, and decides whether it is ready to answer before giving a detailed response [00:05:42].
  - Models such as OpenAI’s o1 and DeepSeek-R1 exhibit this “thinking behavior” [00:06:04].
- Non-thinking Mode
  - This mode behaves like a traditional instruction-tuned model or chatbot [00:06:09].
  - It answers without an explicit thinking process, so responses are near-instant [00:06:16].
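To make the difference between the two modes concrete from the caller’s side, the sketch below separates a reasoning trace from the final answer. It assumes the convention used by Qwen3 and DeepSeek-R1 of wrapping reasoning in `<think>...</think>` tags, and it treats an empty or missing block as a non-thinking response. `split_thinking` and `ParsedResponse` are hypothetical helper names, not part of any library.

```python
from typing import NamedTuple

THINK_OPEN = "<think>"
THINK_CLOSE = "</think>"


class ParsedResponse(NamedTuple):
    thinking: str  # reasoning trace; empty for a non-thinking response
    answer: str    # the user-facing reply


def split_thinking(raw: str) -> ParsedResponse:
    """Separate a <think>...</think> block from the final answer.

    Assumes at most one thinking block, placed before the answer.
    If no closing tag is present, the whole text is treated as the answer.
    If the closing tag appears without an opening tag (e.g. the opening
    tag was part of the prompt), everything before it counts as thinking.
    """
    close = raw.find(THINK_CLOSE)
    if close == -1:
        return ParsedResponse(thinking="", answer=raw.strip())
    open_idx = raw.find(THINK_OPEN)
    start = open_idx + len(THINK_OPEN) if 0 <= open_idx < close else 0
    thinking = raw[start:close].strip()
    answer = raw[close + len(THINK_CLOSE):].strip()
    return ParsedResponse(thinking, answer)
```

For example, `split_thinking("<think>\n\n</think>\n\nHello!")` yields an empty `thinking` field and `"Hello!"` as the answer, which is how a non-thinking reply would look under this convention.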
Qwen3 is described as possibly the first release in the open-source community to combine these two modes in a single model [00:06:23]. Users can switch between the modes through prompts or through hyperparameters [00:06:30].
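Prompt-based control can be sketched as a small transformation of the chat history. This assumes the `/think` and `/no_think` soft switches described in the Qwen3 model card, which, as we understand it, also documents an `enable_thinking` argument to the chat template as the hyperparameter-style switch. The `{"role": ..., "content": ...}` message shape is the common chat-completion format, and `with_mode_switch` is a hypothetical helper.

```python
from typing import Literal

Mode = Literal["think", "no_think"]


def with_mode_switch(messages: list[dict[str, str]], mode: Mode) -> list[dict[str, str]]:
    """Return a copy of a chat history with a /think or /no_think tag
    appended to the most recent user turn.

    The original list and its dicts are left unmodified.
    """
    out = [dict(m) for m in messages]  # copy each turn so the caller's history is untouched
    for msg in reversed(out):
        if msg["role"] == "user":
            msg["content"] = f"{msg['content']} /{mode}"
            return out
    raise ValueError("no user message to attach the mode switch to")
```

Appending the tag per turn, rather than setting it once globally, lets a conversation move between quick replies and deliberate reasoning from one message to the next.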
Dynamic Thinking Budget
A key feature enabled by the hybrid thinking mode is the “dynamic thinking budget” [00:06:36]. This budget defines the maximum number of thinking tokens an AI model can use for a given task [00:06:46].
- Impact of Budget: If a task requires more thinking tokens than the allotted budget, the thinking process will be truncated at the budget limit [00:07:20].
- Performance: Performance generally increases with larger thinking budgets [00:07:45]. For example, on the AIME 24 benchmark, a small thinking budget yields a score of just over 40%, while a large budget of 32,000 tokens achieves over 80% [00:07:52].
- Optimization: Users can tailor the thinking budget to meet specific accuracy requirements, avoiding wasted tokens if a lower budget still achieves the desired outcome (e.g., 8,000 tokens for over 95% accuracy in a task) [00:08:21].
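The truncation rule and the budget-selection trade-off can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Qwen3’s decoding code: tokens are modelled as plain strings, `apply_thinking_budget` and `smallest_sufficient_budget` are hypothetical names, and the accuracy-per-budget figures passed to the selector are assumed to come from the user’s own evaluation runs.

```python
from typing import Iterable, Optional


def apply_thinking_budget(thinking_tokens: Iterable[str], budget: int) -> tuple[list[str], bool]:
    """Consume thinking tokens until the budget is reached.

    Returns the kept tokens and whether the trace was truncated.
    A real decoder would then force the end-of-thinking marker and let
    the model write its answer from the partial trace.
    """
    if budget < 0:
        raise ValueError("budget must be non-negative")
    kept: list[str] = []
    for tok in thinking_tokens:
        if len(kept) == budget:
            return kept, True  # budget exhausted mid-thought: truncate here
        kept.append(tok)
    return kept, False  # the model finished thinking within the budget


def smallest_sufficient_budget(results: dict[int, float], target: float) -> Optional[int]:
    """Pick the cheapest budget whose measured accuracy meets the target.

    `results` maps budget (in tokens) -> measured accuracy.
    Returns None if no budget reaches the target.
    """
    for budget in sorted(results):
        if results[budget] >= target:
            return budget
    return None
```

Because the token source is consumed lazily, generation stops as soon as the budget is hit rather than after the full trace has been produced. The selector returns the cheapest budget that clears the target, or `None` if even the largest budget falls short.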
Application in Multimodal AI Systems
The concept of thinking capabilities and a dynamic thinking budget has also been explored for vision-language models such as QVQ [00:13:18]. As with language-only models, a larger thinking budget in vision-language models leads to better performance, especially on reasoning tasks such as mathematics [00:13:32].
Agent Capabilities
The ability to combine thinking with environmental interaction is a key aspect of developing productive AI agents [00:10:14]. Models with hybrid thinking mode can use tools, receive feedback from the environment, and continue thinking, which is beneficial for inference-time scaling [00:10:00]. This capability is crucial for models to evolve beyond simple chatbots into highly productive agents across a range of real-world work scenarios [00:11:06].
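The think, act, observe cycle described above can be sketched as a small loop. Everything here is hypothetical: `Step`, `run_agent`, and the string-based context are illustrative stand-ins, and `model` is any callable that returns either a tool request or a final answer. Real agent frameworks, and Qwen3’s own tool-calling format, differ in their details.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Step:
    """One model turn: either a tool request or a final answer (hypothetical schema)."""
    thought: str
    tool: Optional[str] = None    # name of the tool to call, if any
    tool_input: str = ""
    answer: Optional[str] = None  # set when the model is done


def run_agent(
    model: Callable[[list[str]], Step],
    tools: dict[str, Callable[[str], str]],
    task: str,
    max_steps: int = 8,
) -> Optional[str]:
    """Alternate thinking and tool use until the model answers or steps run out."""
    context = [f"task: {task}"]
    for _ in range(max_steps):
        step = model(context)
        context.append(f"thought: {step.thought}")
        if step.answer is not None:
            return step.answer
        if step.tool not in tools:
            # Report the bad call back to the model instead of crashing.
            context.append(f"observation: unknown tool {step.tool!r}")
            continue
        observation = tools[step.tool](step.tool_input)  # feedback from the environment
        context.append(f"observation: {observation}")    # the model keeps thinking with it
    return None  # step limit reached without a final answer
```

Here `max_steps` plays a role loosely analogous to the thinking budget: it bounds how much inference-time compute a single task can consume.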