Qwen is a series of large language models and large multimodal models aimed at building a generalist model and agent [00:00:21]. The Qwen team emphasizes open sourcing its models and applying them to real-world tasks and development.
Accessing Qwen Models and Resources
Users can interact with Qwen’s latest models through various platforms:
- Qwen Chat: A chat interface at chat.qwen.ai that is easy to use [00:00:39]. It supports interaction with multimodal models by uploading images and videos, and with omni models using voice and video chat [00:00:44]. Features include webdev and deep research [00:00:55].
- Blog: Technical details about new releases are available on the blog at qwen.github.io [00:01:03].
- GitHub and Hugging Face: Qwen’s code is available on GitHub, and model checkpoints can be downloaded from Hugging Face, allowing developers to experiment with the models [00:01:17] (a minimal loading sketch follows this list).
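Because the checkpoints live on Hugging Face, getting started typically takes only a few lines with the `transformers` library. Below is a minimal sketch; the model ID is one of the published Qwen3 sizes, and any other size can be substituted:

```python
# Minimal sketch: loading a Qwen checkpoint from Hugging Face with `transformers`.
# "Qwen/Qwen3-8B" is one published size; swap in whatever fits your hardware.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen3-8B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [{"role": "user", "content": "Give me a short introduction to Qwen."}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([text], return_tensors="pt").to(model.device)

output = model.generate(**inputs, max_new_tokens=256)
# Decode only the newly generated tokens, not the echoed prompt.
print(tokenizer.decode(output[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))
```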
Key Features and Capabilities Enabling Applications
Hybrid Thinking Mode
Qwen 3 introduces a hybrid thinking mode, combining thinking and non-thinking behaviors within a single model [00:05:27].
- Thinking Mode: Before answering, the model reflects and explores possibilities, then provides a detailed answer, similar to models like OpenAI’s o1 and DeepSeek-R1 [00:05:42].
- Non-thinking Mode: Functions like a traditional instruction-tuned chatbot, providing near-instant answers without delay [00:06:09].
This dual-mode capability is controllable via prompts or hyperparameters [00:06:30]. It also allows a dynamic thinking budget, defined as the maximum number of tokens the model may spend thinking [00:06:41]. Performance increases significantly with larger thinking budgets, especially on math and coding tasks [00:07:45]. For example, a 32,000-token thinking budget achieves over 80% on AIME 24, compared with just over 40% under a very small budget [00:07:59]. This lets users balance accuracy against token usage based on the requirements of a specific task [00:08:21].
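As documented on the Qwen3 Hugging Face model cards, the mode switch is exposed through the chat template; a sketch of both the hard and the soft switch:

```python
# Sketch: toggling Qwen3's hybrid thinking mode via the chat template
# (following the usage shown on the Qwen3 model cards).
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-8B")
messages = [{"role": "user", "content": "How many r's are in 'strawberry'?"}]

# Thinking mode: the model emits a <think>...</think> block before answering.
thinking_prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, enable_thinking=True
)

# Non-thinking mode: behaves like a traditional instruction-tuned chatbot.
instant_prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, enable_thinking=False
)

# Soft switch: "/think" and "/no_think" inside a user turn override the
# default on a per-message basis in multi-turn conversations.
messages = [{"role": "user", "content": "How many r's are in 'strawberry'? /no_think"}]
```

Enforcing a hard thinking budget (cutting off the `<think>` block at N tokens) is typically the serving layer's job rather than a `generate()` flag.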
Multilingual Support
Qwen 3 supports 119 languages and dialects, a significant increase from Qwen 2.5’s 29 languages [00:08:52]. This expanded linguistic coverage benefits global applications, particularly for users of open-source models, which previously often lacked strong support for many languages [00:09:17].
Agentic Capabilities and Coding
Qwen models have enhanced capabilities in agents and coding, including improved support for MCP (Model Context Protocol) [00:09:41]. The models can use tools during their thinking process: they make function calls, receive environmental feedback, and continue thinking [00:09:56]. This capability is crucial for multi-agent systems and makes the model genuinely useful in real-world work [00:11:08]. One example is organizing a desktop: the model accesses the file system, determines which tools to use, and iteratively thinks and executes [00:10:29]. A minimal tool-calling round trip is sketched below.
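The sketch below shows one function-calling round trip against an OpenAI-compatible endpoint serving Qwen (e.g., a local vLLM server). The base URL, tool name, and `get_weather` stub are illustrative assumptions, not Qwen APIs:

```python
# One tool-call round trip: model requests a call, we execute it, feed the
# result back, and the model continues with the tool output in context.
import json
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")  # assumed local server
MODEL = "Qwen/Qwen3-8B"

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

def get_weather(city: str) -> str:
    return f"Sunny, 22°C in {city}"  # stub standing in for a real weather API

messages = [{"role": "user", "content": "What's the weather in Singapore?"}]
resp = client.chat.completions.create(model=MODEL, messages=messages, tools=tools)
call = resp.choices[0].message.tool_calls[0]

# Append the assistant's tool request and our tool result, then ask again.
messages.append(resp.choices[0].message)
messages.append({
    "role": "tool",
    "tool_call_id": call.id,
    "content": get_weather(**json.loads(call.function.arguments)),
})
final = client.chat.completions.create(model=MODEL, messages=messages, tools=tools)
print(final.choices[0].message.content)
```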
Multimodal Models
Beyond large language models, Qwen also develops multimodal models, focusing on vision-language models [00:13:37].
- Qwen 2.5 VL: Released in January, it achieves competitive performance on vision-language benchmarks such as MMMU, MathVista, and general VQA [00:12:49] (see the sketch after this list).
- QVQ: Explores thinking capabilities for vision-language models, showing improved performance on reasoning tasks (such as mathematics) with larger thinking budgets [00:13:16].
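A sketch of image question answering with Qwen 2.5 VL, following the pattern on the model card; it requires `transformers` and the Qwen team's `qwen-vl-utils` helper package, and the image path is a placeholder:

```python
# Sketch: single-image VQA with Qwen2.5-VL via `transformers`.
from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor
from qwen_vl_utils import process_vision_info

model_id = "Qwen/Qwen2.5-VL-7B-Instruct"
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

messages = [{"role": "user", "content": [
    {"type": "image", "image": "path/to/chart.png"},  # local path or URL
    {"type": "text", "text": "What trend does this chart show?"},
]}]

text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
images, videos = process_vision_info(messages)  # extracts image/video inputs
inputs = processor(text=[text], images=images, videos=videos,
                   padding=True, return_tensors="pt").to(model.device)

output = model.generate(**inputs, max_new_tokens=128)
print(processor.batch_decode(output[:, inputs.input_ids.shape[1]:],
                             skip_special_tokens=True)[0])
```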
The ultimate goal is to build an “omni model” that accepts multiple modalities as inputs (text, vision including images and videos, and audio) and generates multiple modalities as outputs (text and audio) [00:13:51]. While not yet perfect, a 7-billion-parameter omni model can already handle voice, video, and text chat, and shows strong performance on audio tasks and even on vision-language understanding relative to Qwen 2.5 VL [00:14:13]. Future work aims to recover the remaining performance gap in language and agent tasks [00:15:20].
Qwen’s Open Sourcing Philosophy and Benefits
Qwen is committed to open sourcing their models [00:15:52].
- Feedback and Improvement: Open sourcing helps gather feedback from developers, which aids in improving the models [00:15:57].
- Community Engagement: Interaction with the open-source community encourages the development of better models [00:16:09].
- Diverse Model Sizes: Qwen provides many model sizes, from very small (0.6 billion parameters) to large (235 billion parameters), catering to various user needs [00:16:43].
- Quantized Models: Quantized models are provided in different formats (GGUF, GGML, AWQ, and MLX for Apple devices) to support diverse deployment scenarios [00:17:05] (a GGUF example follows this list).
- Permissive Licensing: Most models use the Apache 2.0 license, allowing free use and modification for business purposes without needing permission [00:17:15]. This supports commercial and enterprise application of open AI models.
- Third-Party Framework Support: Qwen models are widely supported by various third-party frameworks and API platforms [00:17:47].
- Qwen Coders: Qwen 2.5 Coder models are popular for local development, and Qwen 3 Coder models are currently being built [00:16:25]. This directly supports the use of open models in coding agents and assistants like Copilot, and their integration into development environments and editors.
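For the quantized releases, a minimal local-inference sketch with `llama-cpp-python` and one of Qwen's published GGUF repos; the filename glob is an assumption, so pick whichever quantization level fits your memory budget:

```python
# Sketch: running a quantized GGUF build of Qwen locally with llama-cpp-python.
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="Qwen/Qwen2.5-7B-Instruct-GGUF",
    filename="*q4_k_m.gguf",  # 4-bit quantization: common size/quality trade-off
    n_ctx=4096,               # context window for this session
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Summarize what GGUF quantization does."}]
)
print(out["choices"][0]["message"]["content"])
```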
Real-World Applications and Products
Qwen is building products to enable interaction with their models and to create agents [00:17:59].
- Webdev: This feature allows users to generate and deploy websites from simple prompts [00:18:12]. Examples include creating a Twitter-style website or a product introduction page for sunscreen [00:18:21]. It can also generate visually appealing cards from provided links [00:19:03].
- Deep Research: Users can ask the model to write comprehensive reports on topics of interest, such as the healthcare industry or artificial intelligence [00:19:43]. The model first makes a plan, then searches step by step, writes each part, and finally delivers a downloadable PDF report [00:20:12]. Reinforcement learning is being used to fine-tune models specifically for deep research, aiming to boost productivity in everyday work [00:20:36]. This addresses a need many AI startups are also chasing: efficient information gathering and synthesis.
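The plan-then-search-then-write loop described above can be sketched as follows. This is an illustrative reconstruction, not Qwen's actual deep-research pipeline: `web_search` is a stub standing in for a real search backend, and the endpoint is assumed to be an OpenAI-compatible server hosting a Qwen model.

```python
# Illustrative plan -> search -> write loop for report generation.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")  # assumed server
MODEL = "Qwen/Qwen3-8B"

def ask(prompt: str) -> str:
    resp = client.chat.completions.create(
        model=MODEL, messages=[{"role": "user", "content": prompt}]
    )
    return resp.choices[0].message.content

def web_search(query: str) -> str:
    return f"(stub) top results for: {query}"  # replace with a real search API

topic = "the healthcare industry"

# 1. Plan: break the topic into concrete research questions.
plan = ask(f"List 3 research questions for a report on {topic}, one per line.")

# 2. Search and write each section step by step.
sections = []
for question in plan.splitlines():
    if not question.strip():
        continue
    evidence = web_search(question)
    sections.append(ask(
        f"Using these notes:\n{evidence}\n\n"
        f"Write a short report section answering: {question}"
    ))

# 3. Combine sections into the final report (PDF rendering omitted here).
report = ask("Combine these sections into one coherent report:\n\n" + "\n\n".join(sections))
print(report)
```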
Future Directions for Real-World Impact
Qwen’s future efforts are geared towards achieving Artificial General Intelligence (AGI) and building better foundation models and agents [00:21:01].
- Training Improvements: Continued focus on training methods, including incorporating more and better quality multimodal and synthetic data [00:21:17]. Exploring new pre-training methods beyond next-token prediction, possibly using reinforcement learning in pre-training [00:21:52].
- Scaling Laws: Shifting focus from scaling model sizes and pre-training data to scaling compute in reinforcement learning [00:22:16]. The emphasis is on long-horizon reasoning with environment feedback, enabling models to become smarter through continuous interaction and thinking (inference-time scaling) [00:22:28].
- Context Scaling: Aiming to scale context window to at least 1 million tokens this year, with aspirations for 10 million tokens and eventually infinite context [00:23:02].
- Modality Scaling: Increasing capability by scaling modalities on both the input and output side, even if this does not directly increase “intelligence” [00:23:25]. This includes unifying understanding and generation, such as simultaneous image understanding and generation in the style of GPT-4o [00:24:05]. Vision capability, for instance, is essential for building GUI agents for computer use [00:23:44].
These advancements signify a shift from training models to training agents, especially by integrating reinforcement learning with environment interaction [00:24:36]. This highlights the ongoing evolution toward multi-agent systems and the broader adoption of AI in enterprises.