From: aidotengineer
The landscape of artificial intelligence (AI) models is rapidly expanding, with over 50,000 AI models uploaded to Hugging Face per month, a rate exceeding one model per minute [00:00:08]. Open-source models like DeepSeek-R1 have demonstrated their capability by catching up to and even surpassing closed-source models such as GPT-4, proving that significant budgets are not a prerequisite for competitive AI development [00:00:22].
Fedus AI: Democratizing Access to Open Models
Fedus AI offers a platform providing unlimited API requests to over 3,700 truly open AI models, including DeepSeek-R1, for a flat monthly fee of $25 for individuals [00:01:00]. The company’s objective is to make every truly open AI model accessible, with plans to continuously expand the catalog to cover all Hugging Face models [00:01:25]. This approach lets users choose models based on preference and perceived quality rather than token-based pricing [00:02:28].
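In practice, platforms like this typically expose an OpenAI-compatible API, so switching between catalog models is a one-line change. The sketch below assumes such an endpoint; the base URL is hypothetical, and the model ID is just one example from the catalog.

```python
# Minimal sketch, assuming an OpenAI-compatible chat completions API
# (a common convention for model-hosting platforms). The base URL is
# hypothetical; swap the model string to try any other catalog model.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.example-open-models.ai/v1",  # hypothetical endpoint
    api_key="YOUR_API_KEY",
)

response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1",  # flat monthly fee, so no per-token cost math
    messages=[{"role": "user", "content": "Explain linear attention in two sentences."}],
)
print(response.choices[0].message.content)
```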
A key insight from Fedus AI’s data is the “staying power” of models once they enter production, particularly for commercial users [00:03:06]. Developers prioritize consistency and prefer to update models on their own terms, not when a provider decides to update [00:03:19]. Smaller models, like the Mistral Nemo models (eight months old), continue to dominate commercial usage due to their cost-effectiveness at scale and established presence in production environments and tutorials [00:03:50].
For many commercial and enterprise users, once a system works reliably at scale, with metrics in place to observe changes in reliability and prompts tuned to 99%-plus accuracy, they really do not want to change it and risk having it break overnight [00:05:15].
Similarly, Llama 2 remains a go-to model in AI safeguard tutorials online, despite newer versions, because of its stability and active use in production [00:06:06].
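One practical habit that follows from this preference for consistency is pinning an exact model ID in application configuration, rather than tracking a provider-controlled “latest” alias. A minimal sketch, with illustrative model IDs:

```python
# Minimal sketch: pin exact model revisions in application config so the
# system only changes when the team deliberately bumps a string and
# re-validates. Model IDs are illustrative.
MODEL_CONFIG = {
    # Pinned revision: behavior stays fixed, on the team's own schedule.
    "support_drafts": "mistralai/Mistral-Nemo-Instruct-2407",
    # Anti-pattern: a floating alias can change underneath you overnight.
    # "support_drafts": "some-provider/latest",
}

def model_for(task: str) -> str:
    """Resolve the pinned model ID for a given task."""
    return MODEL_CONFIG[task]
```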
AI Usage Trends
Based on data from Fedus AI and collaborations with platforms like OpenRouter, the primary uses of open AI models are:
- AI for Creativity or Companionship: Representing 30-40% of all AI traffic, this includes applications for creative writing (e.g., NovelCrafter) and AI role-play/companionship (e.g., Spicy Chat, Soul Haven) [00:07:35]. This segment also includes therapy and journaling apps [00:09:57].
- AI for Code (Copilot & Agents): This category accounts for 20-30% of all traffic [00:13:07].
- ComfyUI and Friends: About 5% of traffic for personal agentic workflows, used by non-developers for complex generation tasks [00:16:39].
- Write and Check (ChatGPT Clones): Roughly 20% of requests, offering similar UI and features to popular chat models [00:17:54].
- AI for Agents and Work: 10-20% of traffic, focusing on workflow automation [00:18:44].
AI for Coding: Copilots and Agents
The “coding co-pilot and coding agents” segment is a significant and rapidly growing area for open AI models [00:13:07].
Autocompletion Tools
Auto-completion tools for code, similar to GitHub Copilot, or chat-based editing within Integrated Development Environments (IDEs), are widely available through plugins and native integrations [00:13:14]. These features are now considered essential for modern IDEs [00:13:28]. Many smaller AI models (3 billion to 12 billion parameters) are considered “good enough” for this task, suggesting that auto-completion for code is largely a solved problem [00:13:36].
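Under the hood, editor autocomplete is usually a fill-in-the-middle (FIM) completion: the model sees the code before and after the cursor and returns the missing span. The sketch below assumes an OpenAI-compatible completions endpoint and a model using the `<fim_prefix>`/`<fim_suffix>`/`<fim_middle>` token convention; the exact special tokens vary by model family, so check the model card.

```python
# Minimal sketch of a fill-in-the-middle (FIM) autocomplete request.
# Assumes an OpenAI-compatible /completions endpoint and a code model
# trained with the <fim_prefix>/<fim_suffix>/<fim_middle> convention.
from openai import OpenAI

client = OpenAI(base_url="https://api.example-open-models.ai/v1", api_key="YOUR_API_KEY")

before_cursor = "def mean(xs):\n    return "
after_cursor = "\n"

prompt = f"<fim_prefix>{before_cursor}<fim_suffix>{after_cursor}<fim_middle>"

completion = client.completions.create(
    model="bigcode/starcoder2-7b",  # illustrative small (3B-12B range) code model
    prompt=prompt,
    max_tokens=32,
    temperature=0.0,  # deterministic output suits autocomplete
    stop=["\n\n"],
)
print(completion.choices[0].text)  # e.g. "sum(xs) / len(xs)"
```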
Evolution to Agentic Code
The focus of AI in code has shifted toward more agentic workflows: nearly autonomous agents that ask plenty of clarifying questions but still involve human-in-the-loop interventions [00:14:07]. This phenomenon is termed “Vibe Coding,” where developers primarily steer the agent through chat and prompts rather than editing code directly [00:14:25].
These agents can be “token hungry,” generating a thousand times more input and output tokens than a single person chatting with a companion model [00:14:31]. Despite a smaller user base (tens of thousands of coders), the traffic volume generated by coding agents is growing rapidly [00:14:54]. While closed-source models like Claude Sonnet currently dominate this traffic by a 10-to-1 ratio, the growth of open models in this space, especially since the DeepSeek-R1 wave, has been significant [00:15:08].
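A back-of-envelope comparison makes the scale concrete. The numbers below are illustrative assumptions, not figures from the talk; only the 1,000x multiplier and the “tens of thousands of coders” come from the source.

```python
# Illustrative back-of-envelope: a small population of "token hungry"
# coding agents can out-consume a far larger chat user base.
chat_tokens_per_user_per_day = 20_000                                  # assumed
agent_tokens_per_user_per_day = chat_tokens_per_user_per_day * 1_000  # per the talk

chat_users = 1_000_000   # assumed
agent_users = 50_000     # "tens of thousands of coders"

print(f"chat traffic:  {chat_users * chat_tokens_per_user_per_day:.2e} tokens/day")
print(f"agent traffic: {agent_users * agent_tokens_per_user_per_day:.2e} tokens/day")
# Under these assumptions the agents generate ~50x the chat traffic
# despite a user base 20x smaller.
```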
Notable open-source projects for AI in coding include:
- Cline: Focuses on the chat-based agentic workflow [00:15:57].
- Continue: Provides auto-completion integration [00:16:02]. Combined, these projects offer an experience comparable to commercial IDE platforms like Cursor [00:16:09].
AI for Agents and Workflow Automation
This category, representing 10-20% of traffic, focuses on workflow automation within enterprises [00:18:44]. The priority for enterprises is to maximize ROI by getting agents into production while minimizing negative impact [00:19:10].
Workflow Automation with Human Oversight
A common strategy is to build automation systems with “human escape hatches” from day one [00:19:37]. For example, AI agents can draft responses for inbound emails, check inventories, and apply rules, with a human platform for editing and finalizing the response before sending [00:20:04]. At launch, such systems can successfully draft 80-90% of responses, with humans handling the remainder [00:20:42]. Over time, as confidence builds, specific reliable use cases can be fully automated [00:21:07].
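A minimal sketch of that pattern, with hypothetical names and thresholds: every reply is drafted by the agent, but only vetted, high-confidence categories are sent automatically; everything else falls through the escape hatch to a human queue.

```python
# Minimal sketch of a "human escape hatch", with hypothetical names and
# thresholds. The agent drafts every reply, but only vetted, high-confidence
# cases are sent automatically; everything else lands in a human review queue.
from dataclasses import dataclass

AUTO_SEND_CATEGORIES = {"order_status", "shipping_eta"}  # proven-reliable cases
CONFIDENCE_THRESHOLD = 0.9                               # assumed calibration bar

@dataclass
class Draft:
    category: str
    confidence: float
    text: str

def route(draft: Draft) -> str:
    """Decide whether a drafted reply is sent automatically or escalated."""
    if draft.category in AUTO_SEND_CATEGORIES and draft.confidence >= CONFIDENCE_THRESHOLD:
        send_reply(draft.text)          # fully automated path
        return "auto_sent"
    enqueue_for_human_review(draft)     # escape hatch: a human edits and finalizes
    return "human_review"

def send_reply(text: str) -> None: ...               # integration stubs
def enqueue_for_human_review(draft: Draft) -> None: ...
```

Promoting a category into the auto-send set once it has proven reliable is exactly the incremental automation path described above.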
For companies in production that are trying to incrementally improve, step by step, toward 99%-plus reliability, it is nearly impossible to do so if the model changes every week [00:25:41].
Challenges of Fully Automated Agents
Attempting 100% full automation without human oversight often backfires: bad automated responses create angry customers, which can kill the entire AI project [00:21:28]. The mythical, fully 100%-reliable autonomous agent does not currently exist in production environments [00:22:41].
The recommended approach for building AI for production is to aim for solving 80% of problems with escape hatches [00:23:05]. This pragmatic approach, similar to reliability engineering, involves building a streamlined, reliable system for 80-90% of scenarios, then iterating for the remaining failure scenarios to achieve high reliability incrementally [00:24:11]. This strategy, though less “sexy,” ensures project survival, significant ROI, and continuous improvement [00:24:42].
Quirky: A New Open Model for Reliability
Quirky is introduced as a 72-billion-parameter hybrid of linear attention and standard transformer attention that runs at less than half the GPU compute cost of comparable transformer models [00:26:22]. Built for approximately $10 million, Quirky represents a move toward more efficient and reliable AI model architectures [00:26:52].
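The compute claim is consistent with standard attention scaling arguments: softmax attention costs grow quadratically with context length, while linear-attention layers maintain a running state with per-token cost independent of context length. As a sketch, for sequence length $n$ and model width $d$ (standard complexity results, not figures from the talk):

```latex
\underbrace{O(n^{2} d)}_{\text{softmax attention}}
\qquad \text{vs.} \qquad
\underbrace{O(n\, d^{2})}_{\text{linear attention}}
```

For long contexts ($n \gg d$) the linear layers' cost grows far more slowly; the realized savings in a hybrid depend on the mix of layer types and the context lengths actually served.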
The focus for future AI models should be on persistent memory, customization, and improved reliability to create truly useful AI agents, rather than solely chasing benchmark scores like MMLU, which are losing meaning as models become highly capable [00:27:01].