From: aidotengineer

The landscape of Artificial Intelligence (AI) models is rapidly evolving, with “open model warriors” increasingly challenging the traditional large laboratories [00:00:03]. Over 50,000 AI models are uploaded to Hugging Face every month, a rate of more than one new model per minute [00:00:08]. This rapid proliferation is exemplified by DeepSeek-R1, the first open-source model to surpass GPT-4, which demonstrated that massive investment isn’t necessary to compete with the major labs [00:00:22]. DeepSeek-R1 alone saw over 4 million downloads of its 685 GB weights in a single month, translating to over 2.74 exabytes of data moved across the internet [00:00:32].

Featherless.AI, led by CEO Eugene, provides unlimited API requests to over 3,700 truly open AI models, including DeepSeek-R1, at a flat rate of $25 per month for individuals, with larger plans for scaled-up business users [00:00:51]. The goal is to make all open AI models accessible and to continuously expand the catalog toward covering all Hugging Face models, which yields unique data and insights into how these models are actually used [00:01:16].
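
Flat-rate platforms like this typically expose an OpenAI-compatible API, so switching an application between open models is a one-line change. A minimal sketch, assuming an OpenAI-compatible endpoint (the base URL and model identifier below are illustrative assumptions, not confirmed by the talk):

```python
# Minimal sketch: querying an open model through an OpenAI-compatible
# endpoint. The base URL and model identifier are illustrative assumptions.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.featherless.ai/v1",  # assumed endpoint
    api_key="YOUR_API_KEY",
)

response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1",  # assumed Hugging Face-style model id
    messages=[{"role": "user", "content": "Explain flat-rate inference in one sentence."}],
)
print(response.choices[0].message.content)
```

Because the interface is OpenAI-compatible, moving a workload across the catalog of 3,700+ models amounts to changing the `model` string.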

Preferences and Stability in Open Model Adoption

Individual users on Featherless.AI primarily use DeepSeek-R1, followed by models like Llama 3, Mistral Nemo, and Qwen [00:01:54]. Individuals are not charged per token; instead, plans limit users to one concurrent request against a large model, so users choose models based on preference and “vibes” rather than MMLU (Massive Multitask Language Understanding) scores or price [00:02:10].

However, among scaled commercial users, the landscape changes dramatically [00:03:38]. Smaller models, such as the 8-month-old Mistral Nemo, often dominate usage even when newer, larger models exist [00:03:50]. This phenomenon is driven by several factors:

  • Cost-effectiveness at scale: Small models are generally cheaper [00:04:02].
  • Production “stickiness”: Models in production environments or tutorials tend to remain unchanged for quarters or even years [00:04:14].
  • Licensing and early adoption: Mistral Nemo was an early, truly open-source model with Apache 2.0 licensing, avoiding the Llama license restrictions that made enterprises uncomfortable [00:04:30]. This led to its adoption as a default model on cloud platforms like AWS and GCP, which drove extensive fine-tuning tutorials and the replacement of existing GPT-3.5 workloads [00:04:50].
  • Reliability over updates: Commercial and enterprise users prioritize reliable, consistent systems that work; they often prefer not to update models as long as the current version meets their accuracy goals (e.g., 99%) and has established metrics for observation [00:05:14]. The adage “if it ain’t broke, don’t fix it” applies [00:05:47]. (A configuration sketch of this version-pinning practice follows this list.)
  • Llama 2 as an example: Llama 2 models, despite newer versions, still account for about 2% of the workload due to their use in AI Safeguard tutorials and active production deployments [00:05:53].
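
In practice, this “stickiness” often takes the form of version pinning: the model identifier is fixed in one place and only changed after re-validation against the team’s accuracy metrics. A minimal sketch, assuming an OpenAI-compatible client (the pinned model id is an illustrative assumption):

```python
# Hypothetical production configuration illustrating model "stickiness":
# the model identifier is pinned to one exact release, so behavior stays
# stable until the team deliberately re-validates a newer version.
MODEL_CONFIG = {
    "model": "mistralai/Mistral-Nemo-Instruct-2407",  # pinned release (assumed id)
    "temperature": 0.2,
    "max_tokens": 512,
}

def make_request(client, prompt: str) -> str:
    """All call sites share the pinned config, so upgrading the model is a
    single, auditable change rather than one scattered across the codebase."""
    resp = client.chat.completions.create(
        model=MODEL_CONFIG["model"],
        temperature=MODEL_CONFIG["temperature"],
        max_tokens=MODEL_CONFIG["max_tokens"],
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content
```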

Main Use Cases of Open Models

By inferring usage from model metadata and direct customer feedback, Featherless.AI and partners like OpenRouter categorize AI usage by volume [00:06:25]:

AI for Creativity or Friendship

This category represents 30-40% of all AI traffic for non-coding requests [00:07:38].

  • Creative Writing: Apps like Novelcrafter assist authors and the fanfiction community in outlining, managing, and drafting novels, often as collaborative human-and-AI writing [00:07:44]. Users also leverage AI for creative content in games like Dungeons & Dragons [00:08:00].
  • Role-Playing and Companionship: Apps such as Chai and Soulhaven are popular for AI role-playing and companionship; this is the number one use case outside of coding and agent workflows, and it has the most active users [00:08:12]. Contrary to popular belief, over 60% of users in this segment are women, mirroring the romance novel market [00:08:55]. These apps are often used for long conversations, de-stressing, and talking through daily life, particularly when a real-life partner is non-existent or emotionally unavailable [00:09:31].
  • Therapy and Journaling: Dedicated commercial apps exist, but users also adapt ChatGPT clones or companion apps with therapy characters for journaling and emotional support [00:09:57].
  • “Vibes” Models: This segment prioritizes “vibes” over MMLU scores, meaning models are chosen for their personality and approach rather than for simply providing direct answers [00:10:48]. For example, in therapy use, the AI guides and empowers rather than dictates [00:11:21]. In story writing and role-play, a “slow burn” approach is preferred, focusing on the journey rather than just the ending [00:11:35]. This community is highly dynamic, with new top models constantly emerging, yet users often return to old favorites [00:11:51]. The user base, including that of closed-source apps like Character.ai, is in the tens of millions [00:12:47].

AI for Code / Coding Agents

This segment accounts for 20-30% of all AI traffic [00:13:05].

  • Auto-completion and Editing: Tools similar to the original GitHub Copilot, along with chat-based editing integrated into IDEs, are widely used [00:13:14]. Small, good-enough models (3-12 billion parameters) from various AI labs have already largely solved code auto-completion [00:13:36].
  • Agentic Coding: The focus is shifting to nearly autonomous agents that still involve human intervention and clarifying questions (“vibe coding”), where developers primarily chat and prompt changes rather than touching the code directly [00:13:51]. These agents are “token hungry,” generating a thousand times more input/output traffic than a single person chatting with a companion model [00:14:31]. While closed models like Claude Sonnet currently dominate this space (by a 10:1 ratio), use of open models with these tools has grown rapidly since the DeepSeek-R1 wave [00:15:06]. Open-source projects like Cline (for chat-based agentic flow) and Continue (for auto-completion) provide an experience similar to the commercial tools [00:15:53].
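
A minimal sketch of such a loop, assuming an OpenAI-compatible client and a JSON action protocol of my own invention (the `apply_edit` and `run_tests` callbacks are hypothetical placeholders):

```python
# Toy sketch of an agentic coding loop ("vibe coding"): the model either
# proposes an edit, asks a clarifying question, or declares itself done.
# The JSON action protocol and the callbacks are illustrative assumptions.
import json

def agentic_session(client, model: str, task: str, apply_edit, run_tests):
    messages = [
        {"role": "system", "content": (
            "You are a coding agent. Reply ONLY with JSON: "
            '{"action": "edit" | "ask" | "done", "body": "..."}')},
        {"role": "user", "content": task},
    ]
    while True:
        resp = client.chat.completions.create(model=model, messages=messages)
        raw = resp.choices[0].message.content
        messages.append({"role": "assistant", "content": raw})
        reply = json.loads(raw)
        if reply["action"] == "ask":        # clarifying question for the human
            answer = input(f"Agent asks: {reply['body']}\n> ")
            messages.append({"role": "user", "content": answer})
        elif reply["action"] == "edit":     # apply the change, report test results
            apply_edit(reply["body"])
            messages.append({"role": "user",
                             "content": f"Test results: {run_tests()}"})
        else:                               # "done"
            return reply["body"]
```

Every iteration re-sends the entire growing transcript plus tool output, which is exactly why such agents consume orders of magnitude more tokens than a single companion-chat conversation.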

AI for ComfyUI and Friends

This category represents about 5% of traffic and covers personal agentic workflows [00:16:39]. While graph-style UIs are best known from diffusion-based image generation, they are increasingly used by professionals such as reporters, lawyers, musicians, and influencers to chain complicated text-generation workflows [00:16:44]. These power users, though few, generate significant token traffic [00:17:11]. Unlike the sudden growth in coding agents, this space has seen a slow, gradual build-up of its user base [00:17:32].
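
In code, such a graph workflow reduces to a small pipeline of prompt nodes, with each edge passing one node’s output to the next. A minimal sketch, assuming an OpenAI-compatible client (the node instructions are illustrative):

```python
# Toy version of a graph-style text workflow: each node is one prompt
# step; a visual ComfyUI-like tool would wire these nodes graphically.
def node(client, model: str, instruction: str, payload: str) -> str:
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "system", "content": instruction},
                  {"role": "user", "content": payload}],
    )
    return resp.choices[0].message.content

def report_pipeline(client, model: str, transcript: str) -> str:
    facts = node(client, model, "Extract the key facts as bullet points.", transcript)
    draft = node(client, model, "Write a short report from these facts.", facts)
    return node(client, model, "Edit the report for tone and concision.", draft)
```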

AI for Write and Check / ChatGPT Clones

Accounting for about 20% of requests, these are general-purpose AI chat interfaces [00:17:53]. Many platforms offer their own internal UIs (e.g., Phoenix for Featherless.AI) [00:18:02]. TypingMind is noted for its UI polish, one-time-fee model, and ability to run locally with any API provider [00:18:10].

AI for Agents and Work (Workflow Automation)

This segment represents 10-20% of traffic [00:18:44].

  • Workflow Automation with Human Oversight: The primary goal when introducing AI in enterprises is to maximize ROI while minimizing negative impact, especially in risk-averse sectors like finance [00:19:07]. The recommended approach builds in “human escape hatches” [00:19:37]. For example, automating an email process (e.g., for insurance claims or a logistics company) has an AI agent draft responses and check them against inventories and rules, while a human reviews and finalizes each submission before it is sent [00:20:04] (a code sketch of this pattern follows the list).
    • This setup can achieve 80-90% AI-drafted responses at launch, with humans handling the remainder [00:20:41].
    • This boosts productivity, ensures management satisfaction, and facilitates AI adoption [00:20:51].
    • Over time, as confidence builds, specific reliable use cases can be fully automated [00:21:04].
  • Risks of Full Automation: Teams that attempt 100% automation without human escape hatches often face burnout from bad automated responses, potentially killing the entire AI project and postponing adoption for a year or more [00:21:28]. The goal is to move beyond the Proof of Concept (PoC) phase into scaled adoption as quickly as possible, iterating from there [00:22:25].
  • Myth of 100% Reliable Agents: Truly 100% reliable, fully automated agents do not exist in production [00:22:39]. The right mindset for building AI into production is to solve the 80% with escape hatches, as even this can yield millions in productivity gains and savings for large corporations [00:23:05]. Humans are not 100% reliable either [00:23:29]. Full automation is only acceptable when the eventual mistake rate is tolerable, such as in cold calling (a lost lead might never have been gained otherwise) or basic customer support (a human can apologize later) [00:23:44]. The approach should be incremental: build an extremely streamlined, reliable system for 80-90% of scenarios, then repeat the process on the remaining failure scenarios to reach high reliability (e.g., 99.998%) [00:24:00].
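
A minimal sketch of the escape-hatch pattern from the email example above, assuming an OpenAI-compatible client (the CONFIDENT marker convention and queue names are illustrative assumptions):

```python
# Toy escape-hatch workflow: the model drafts a reply and self-assesses
# against business rules; at launch, every draft still passes a human
# before sending. The CONFIDENT marker convention is an assumption.
from dataclasses import dataclass

@dataclass
class Draft:
    email_id: str
    reply: str
    confident: bool

def draft_reply(client, model: str, email_id: str, email_text: str, rules: str) -> Draft:
    resp = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": (
                f"Draft a reply that follows these rules:\n{rules}\n"
                "End with exactly 'CONFIDENT: yes' or 'CONFIDENT: no'.")},
            {"role": "user", "content": email_text},
        ],
    )
    text = resp.choices[0].message.content
    return Draft(email_id, text, text.strip().endswith("CONFIDENT: yes"))

def route(draft: Draft, approve_queue: list, manual_queue: list) -> None:
    # At launch nothing is auto-sent: confident drafts get a quick human
    # approval; the rest are rewritten by a person. Once a category proves
    # reliable, its confident branch can be flipped to auto-send.
    (approve_queue if draft.confident else manual_queue).append(draft)
```

This matches the talk’s rollout posture: 80-90% of responses AI-drafted on day one, humans finalizing everything, and full automation granted per category only as confidence builds.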

Recommendations for Enterprise AI Adoption

  • Upgrade models: Enterprises still using older models like Llama 2 are encouraged to upgrade, as newer versions can offer significant improvements [00:25:06].
  • Consistent model versions: Closed-source labs need to provide stable, consistent model versions to enable incremental improvement for companies in production [00:25:31]. Frequent model changes hinder the ability to achieve high reliability [00:25:48].

Qwerky Model and the Future of AI

Qwerky is a 72-billion-parameter hybrid of linear attention and standard Transformer attention, a new architecture that runs at less than half the GPU compute cost of comparable Transformer models [00:26:21]. It was built for only 10 million [00:26:52].
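
To illustrate where the compute saving comes from (a generic linear-attention toy, not Qwerky’s actual architecture): softmax attention re-reads the whole sequence at every step, costing O(n²) overall, while linear attention folds history into a fixed-size running state, costing O(n):

```python
# Toy causal linear attention: history is compressed into a fixed-size
# state, so per-token cost is constant regardless of sequence length.
# Real variants apply a positive feature map to q and k (omitted here),
# which also keeps the normalizer well-behaved.
import numpy as np

def linear_attention(q, k, v):
    """q, k, v: arrays of shape (seq_len, d). Returns (seq_len, d)."""
    d = q.shape[1]
    state = np.zeros((d, d))          # running sum of outer(k_t, v_t)
    norm = np.zeros(d)                # running sum of k_t
    outs = []
    for q_t, k_t, v_t in zip(q, k, v):
        state += np.outer(k_t, v_t)   # O(d^2) per token, independent of t
        norm += k_t
        outs.append((q_t @ state) / (q_t @ norm + 1e-8))
    return np.stack(outs)
```

The fixed-size state is also what makes the “persisted memories” direction below natural: the state can be saved and restored between sessions rather than recomputed from the full transcript.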

As AI models increasingly surpass the MMLU scores of average office workers, traditional benchmarks are losing their meaning [00:27:01]. The focus shifts to exploring a future where linear Transformer models persist memories, enable customization, and improve reliability for useful AI agents, addressing issues like hallucination and failure [00:27:33].