From: aidotengineer
Qwen is a series of large language models (LLMs) and large multimodal models (LMMs) with a vision of building a generalist model and agent [00:00:21]. The latest release, Qwen 3, represents significant progress in AI model capabilities [00:01:19].
Qwen 3 Models and Performance
Qwen 3 is the latest generation of large language models [00:03:19], featuring multiple sizes of dense and Mixture-of-Experts (MoE) models [00:03:26]. The development team believes that MoE models represent a future trend in AI [00:12:18].
Key Models
- Flagship MoE Model: A 235 billion parameter model that activates only 22 billion parameters, making it computationally efficient yet effective [00:03:33]. It achieves performance competitive with other top-tier models and sits only slightly behind Gemini 2.5 Pro [00:03:52].
- Fast MoE Model: A smaller MoE model with 30 billion total parameters, activating only 3 billion [00:04:10]. It can even outperform the Qwen 32 billion parameter dense model on some tasks [00:04:21].
- 4 Billion Parameter Model: A very small model that utilizes advanced distillation techniques to transfer knowledge from larger models [00:04:34]. Despite its size, it exhibits strong thinking capabilities and can be competitive with previous flagship models like Qwen 2.5 72B [00:04:52]. This model is capable of deployment on mobile devices [00:05:16].
- 32 Billion Parameter Dense Models: These models are strong and competitive, suitable for local deployment and reinforcement learning tasks [00:11:53].
Benchmarking
In multiple benchmarks, Qwen 2.5 Max, a precursor, achieved competitive performance against state-of-the-art models such as Claude 3.5 Sonnet, GPT-4o, and DeepSeek V3 [00:01:53]. The integration of reinforcement learning significantly boosts performance, especially in reasoning tasks such as math and coding, showing consistent improvement [00:02:17].
Core Features of Qwen 3
Hybrid Thinking Mode
A standout feature of Qwen 3 is its hybrid thinking mode, which allows a single model to exhibit both “thinking” and “non-thinking” behaviors [00:05:22].
- Thinking Mode: The model reflects on itself and explores possibilities before providing a detailed answer, in the manner of dedicated reasoning models [00:05:42].
- Non-Thinking Mode: Functions as a traditional, near-instant chatbot without the reflective delay [00:06:09].
This is presented as potentially the first time in the open-source community that these two modes are combined in a single model, controllable via prompts or hyperparameters [00:06:23].
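As a minimal sketch of how this switch can look in practice with Hugging Face transformers (the checkpoint name Qwen/Qwen3-4B and the enable_thinking template flag follow the public Qwen 3 model cards; adjust to your own setup):

```python
# Sketch: toggling Qwen 3 between thinking and non-thinking mode with transformers.
# Assumes the Qwen/Qwen3-4B checkpoint and the `enable_thinking` flag from the model card.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen3-4B"  # any Qwen 3 checkpoint should behave the same way
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto")

messages = [{"role": "user", "content": "How many prime numbers are there below 50?"}]

# Thinking mode: the chat template lets the model emit a <think>...</think> scratchpad
# before the final answer.
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, enable_thinking=True
)
# Non-thinking mode: set enable_thinking=False for near-instant chatbot behavior.

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=2048)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```

The model cards also describe prompt-level soft switches (appending /think or /no_think to a user turn), which corresponds to the prompt-based control mentioned above.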
Dynamic Thinking Budget
Within the hybrid thinking mode, Qwen 3 introduces a dynamic thinking budget, which is the maximum number of thinking tokens the model is allowed to spend [00:06:41]. Performance increases significantly with a larger budget, especially on complex tasks such as AIME 24, where accuracy can jump from around 40% to over 80% as the budget grows [00:07:41]. This allows users to balance accuracy needs against token usage [00:08:21].
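The talk does not detail the exact budget mechanism, but one way to approximate it is a two-phase generation: let the model think up to a token cap, then close the scratchpad and force a final answer. A rough sketch under those assumptions (the Qwen/Qwen3-4B checkpoint and the </think> closing tag follow the public chat template):

```python
# Sketch: enforcing a thinking budget by capping scratchpad tokens, then forcing an answer.
# The two-phase approach is an illustration, not necessarily Qwen Chat's exact mechanism.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen3-4B"   # assumed checkpoint
thinking_budget = 1024         # maximum tokens the model may spend thinking

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto")

messages = [{"role": "user", "content": "How many integers from 1 to 1000 are divisible by 3 or 5?"}]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, enable_thinking=True
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Phase 1: let the model think, but only up to the budget.
draft = model.generate(**inputs, max_new_tokens=thinking_budget)
completion = tokenizer.decode(draft[0][inputs["input_ids"].shape[-1]:])

if "</think>" not in completion:
    # Budget exhausted: close the scratchpad ourselves and generate only the final answer.
    forced = prompt + completion + "\n</think>\n\n"
    inputs = tokenizer(forced, return_tensors="pt").to(model.device)
    final = model.generate(**inputs, max_new_tokens=512)
    completion = tokenizer.decode(final[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True)

print(completion)
```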
Multilingual Support
Qwen 3 dramatically expands its language coverage, supporting 119 languages and dialects, a significant increase from Qwen 2.5’s 29 languages [00:08:52]. This broad support aims to make large language models more accessible globally [00:09:17].
Enhanced Agent and Coding Capabilities
Qwen 3 shows specific improvements in agent and coding tasks, with enhanced support for MCP (Model Context Protocol), which has recently gained popularity [00:09:39]. The model can use tools during its thinking process, making function calls, receiving environmental feedback, and continuing to think [00:09:56]. Examples include:
- Tool Use in Thinking: The model can integrate tool usage with its thinking capabilities, receiving feedback and continuously refining its process [00:10:00]. This benefits inference-time scaling [00:10:24].
- Desktop Organization: The model can access file systems, think about which tools to use, execute them, and adapt based on feedback to complete tasks like organizing a desktop [00:10:29] (a minimal agent-loop sketch follows this list). These advancements aim to transform the model from a simple chatbot into a productive agent in daily working life [00:11:04].
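A minimal sketch of such an agent loop against an OpenAI-compatible endpoint (for example, a local vLLM server hosting a Qwen 3 checkpoint); the endpoint URL, the model name, and the list_files tool are illustrative assumptions, not part of the talk:

```python
# Sketch: a tool-calling loop in which the model requests a tool, receives feedback,
# and keeps reasoning until it can answer. Targets any OpenAI-compatible server
# (base_url, model name, and the list_files tool are illustrative assumptions).
import json
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

tools = [{
    "type": "function",
    "function": {
        "name": "list_files",  # hypothetical tool for the desktop-organization example
        "description": "List the files currently on the user's desktop.",
        "parameters": {"type": "object", "properties": {}, "required": []},
    },
}]

messages = [{"role": "user", "content": "Organize my desktop by file type."}]

for _ in range(5):  # cap the number of think/act rounds
    reply = client.chat.completions.create(
        model="Qwen/Qwen3-30B-A3B", messages=messages, tools=tools
    ).choices[0].message

    if not reply.tool_calls:  # no further tool calls: this is the final answer
        print(reply.content)
        break

    messages.append(reply)    # keep the assistant turn that requested the tools
    for call in reply.tool_calls:
        # Stubbed environmental feedback; a real agent would execute the tool here.
        result = json.dumps(["report.pdf", "photo.png", "notes.txt"])
        messages.append({"role": "tool", "tool_call_id": call.id, "content": result})
```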
Multimodal Advancements
Qwen 2.5 VL
Qwen 2.5 VL, released in January, achieved very competitive performance on vision-language benchmarks, including understanding, math, and general VQA tasks [00:12:49]. The team also explored thinking capabilities for vision-language models, finding a similar inference-time scaling effect, with performance improving at longer thinking lengths [00:13:16].
Qwen Omni Model
A significant step towards building a truly omni model is Qwen’s 7 billion parameter model, capable of accepting multiple modalities as input and generating multiple modalities as output [00:14:08].
- Inputs: Accepts text, vision (images and videos), and audio [00:14:18].
- Outputs: Capable of generating text and audio [00:14:31].
- Performance: Achieves state-of-the-art performance in audio tasks for its size and surprisingly better performance in vision-language understanding compared to Qwen 2.5 VL 7B [00:14:49]. Future goals include generating high-quality images and videos to become a truly omni model [00:14:33]. While there’s some performance drop in language and agent tasks, the team believes this can be recovered through data quality and training method improvements [00:15:20].
Open-Source Philosophy and Accessibility
The Qwen team is committed to open-sourcing its models, which fosters community feedback and encourages further development [00:15:54].
- Resources: Code is available on GitHub [00:01:18], and checkpoints are on Hugging Face [00:01:22]. Technical details are shared via their blog [00:01:01].
- Model Variety: Offers many model sizes, from 0.6 billion to 235 billion parameters, catering to a wide range of users and applications [00:16:43].
- Quantized Models: Provides quantized models in various formats (GGUF, GGML, AWQ, and MLX for Apple devices) to facilitate deployment [00:17:05]; a local-inference sketch follows this list.
- Licensing: Most models are released under Apache 2.0 license, allowing free use and modification for business purposes without requiring special permissions [00:17:13].
- Third-Party Support: Qwen models are supported by a wide array of third-party frameworks and API platforms due to their growing popularity [00:17:47].
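As a small example of how the quantized releases lower the deployment barrier, here is a sketch of running a GGUF build locally with llama-cpp-python; the repository id Qwen/Qwen3-4B-GGUF and the quantization filename pattern are assumptions, so check the actual artifact names on Hugging Face:

```python
# Sketch: running a quantized Qwen 3 GGUF build locally with llama-cpp-python.
# Repo id and filename pattern are assumptions; check the published artifacts.
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="Qwen/Qwen3-4B-GGUF",   # assumed Hugging Face repo for the GGUF release
    filename="*Q4_K_M.gguf",        # 4-bit quantization; other levels are usually published too
    n_ctx=8192,
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Summarize Qwen 3 in one sentence."}]
)
print(out["choices"][0]["message"]["content"])
```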
Product Features on Qwen Chat
Qwen also develops products to allow users to interact with their models and agents [00:17:59].
- WebDev: Allows users to create websites or cards with simple prompts. For instance, inputting “create a Twitter website” generates the code and artifacts for deployment [00:18:13]. Users can deploy the sites and share URLs [00:18:37]. This feature enables quick creation of product introduction websites or visually appealing cards based on links [00:18:46].
- Deep Research: Users can ask the model to write comprehensive reports on specific industries or topics [00:19:43]. The model plans, searches step-by-step, writes parts, and provides a downloadable PDF report [00:20:00]. The team is continuously improving its quality through reinforcement learning [00:20:32].
Future Developments and Roadmap
Qwen’s future efforts are geared towards achieving Artificial General Intelligence (AGI) and building robust foundation models and agents [00:21:01].
Core Areas of Focus
- Improved Training: Continued belief in significant room for improvement in training, including better data cleaning, integration of multimodal data, and exploration of new training methods beyond next-token prediction, possibly incorporating reinforcement learning in pre-training [00:21:17].
- Scaling Laws: Shifting focus to scaling compute in reinforcement learning for long-horizon reasoning with environmental feedback [00:22:12]. Models capable of interacting with the environment and continuously thinking will become smarter through inference-time scaling [00:22:33].
- Context Scaling: Aiming to scale context window to at least 1 million tokens this year for most models, with a long-term goal of 10 million and eventually infinite context [00:22:56].
- Modality Scaling: Increasing model capabilities and productivity by scaling modalities, particularly in vision-language understanding for creating GUI agents and enabling computer use [00:23:25]. The goal is to unify understanding and generation, similar to models that generate high-quality images from text [00:24:05].
Training Agents
The overall strategic shift is from training models to training agents [00:24:36]. This involves scaling not only pre-training but also reinforcement learning, especially in interaction with the environment, ushering in an “era of agents” [00:24:44].