From: redpointai
Unsupervised Learning, a podcast by Redpoint Ventures, recorded a crossover episode with Latent Space, a technical newsletter and podcast for AI engineers [00:00:00]. Latent Space had over 2 million downloads in 2024 and is a go-to resource for understanding the cutting edge of AI infrastructure, tooling, and product [00:00:06]. This special episode covers what surprised the hosts, what they are paying attention to, defensibility at the app layer, and their outlook on public companies [00:00:27].
Surprises in AI
Model Evolution and Open Source
One significant surprise of the past year was the rapid advancement of models [00:01:12]. The release of new models right after Ilya Sutskever's "scaling is dead" talk created a sense that AI was "so back" within a short period [00:01:27]. The timing, with pre-training seemingly "dying" just as inference-time compute became the "new scaling law," was noted as "suspiciously neat" [00:02:07].
A negative surprise was how little relevance and adoption open-source models have found in enterprises [00:02:28]. While the local Llama community and hobbyists appreciate them, enterprise adoption of open-source models was estimated at only about 5% and decreasing [00:03:01]. Enterprises are primarily in "use case discovery mode," opting for what they perceive as the most powerful models [00:03:14].
Conversely, another surprise was how quickly open source caught up on reasoning models, particularly with DeepSeek [00:03:36]. That speed suggested the window during which a closed-source model holds a compounding advantage is much shorter than anticipated [00:04:04]. However, it was clarified that this was specific to DeepSeek's rapid catch-up; there isn't a "team open source," only individual companies choosing to open source [00:04:14]. Replicating what has already been done is cheaper than creating something fundamentally new [00:04:48], and DeepSeek was considered overhyped because it primarily executed well on existing concepts rather than introducing fundamental innovations [00:04:54]. DeepSeek's main unique contribution was providing full reasoning traces for an open-source model [00:05:15].
Application Layer Developments
Another surprise was that low-code builders did not capture the AI builder market [00:07:51]. Companies like Zapier, Airtable, Retool, and Notion, despite having the DNA and distribution, largely missed this shift [00:08:11]. They focused on adding AI features to their existing baselines rather than building from scratch around new AI paradigms [00:09:09]. This left room for the rise of "GPT wrappers," companies that build directly on top of powerful models [00:07:03]. The consensus shifted from making fun of "GPT wrappers" to recognizing them as the most interesting development [00:07:07].
Overhyped vs. Underhyped
Overhyped
- Agent Frameworks: These are seen as overhyped, likened to the "jQuery era" rather than the "React era" of development [00:11:25]. The underlying workloads are too fluid for stable frameworks [00:10:55]. Instead of frameworks, the focus should be on protocols such as MCP, in the same way that it was the underlying protocol, not jQuery, that enabled Ajax [00:11:55] (a minimal protocol sketch follows this list).
- New Model Training Companies: There is surprise at the continued emergence of new companies training models, despite the competitive landscape [00:16:14]. Many pursue AGI, but it’s unclear what unique gap they fill [00:16:39]. Historically, general-purpose models tend to outperform hyper-specific ones in quality [00:19:18].
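To make the frameworks-versus-protocols point concrete, here is a minimal sketch of the shape of an MCP-style tool call on the wire. MCP is built on JSON-RPC 2.0 and defines methods such as `tools/call`; the tool name, arguments, and helper functions below are illustrative and not part of any official SDK.

```python
import json

# Rough sketch of an MCP-style tool call: client and server exchange
# JSON-RPC 2.0 messages, so any framework (or none at all) can sit on
# either side. Method and field names follow the public MCP spec; the
# tool itself is made up for illustration.

def make_tool_call(request_id: int, tool_name: str, arguments: dict) -> str:
    """Build a JSON-RPC 2.0 request asking the server to run a tool."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    })

def handle_response(raw: str) -> dict:
    """Parse the server's reply; a real client would also handle errors."""
    msg = json.loads(raw)
    return msg.get("result", {})

if __name__ == "__main__":
    print(make_tool_call(1, "search_docs", {"query": "agent memory"}))
```

The stable piece here is the message shape, not any framework code wrapped around it, which is the sense in which protocols tend to outlast frameworks.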
Underhyped
- Memory (Stateful AI): Memory, beyond conversational memory, such as storing knowledge graphs that exceed context length, is significantly underhyped [00:15:00]. A better memory abstraction could lead to smarter agents that learn on the job [00:15:28]. This is an interesting space for VCs due to its resemblance to databases [00:16:07].
- Private Cloud Compute (PCC): Apple’s Private Cloud Compute (PCC) is under the radar but has potential [00:12:45]. It brings on-device security to the cloud, which is crucial as large enterprises need multi-tenant architectures to access enough GPUs while maintaining single-tenant guarantees [00:13:50].
- Application Areas:
- Customer Support: Companies like Sierra are demonstrating significant product-market fit by tackling customer support, a major cost center for many businesses [00:30:40].
- Voice AI: In areas like scheduling and intake for home services, where businesses currently miss 50% of calls, an AI that is only 75% effective can still drive a significant revenue increase [00:33:30]. This highlights that 100% accuracy isn't always needed to deliver high value [00:33:19].
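A quick back-of-the-envelope calculation makes the Voice AI point concrete. The 50% missed-call and 75% effectiveness figures come from the discussion; the call volume, booking rate, and job value are illustrative assumptions.

```python
# Back-of-the-envelope for the Voice AI point above: if a home-services
# business misses half its inbound calls, an agent that successfully handles
# only 75% of those missed calls still recovers meaningful revenue.
# The call volume, booking rate, and job value below are assumptions.

calls_per_month = 1_000          # assumed inbound call volume
missed_rate = 0.50               # share of calls currently going unanswered
agent_success_rate = 0.75        # share of missed calls the AI handles well
booking_rate = 0.30              # assumed share of answered calls that book a job
avg_job_value = 400              # assumed revenue per booked job, in dollars

recovered_calls = calls_per_month * missed_rate * agent_success_rate
added_revenue = recovered_calls * booking_rate * avg_job_value
print(f"Recovered calls/month: {recovered_calls:.0f}")    # 375
print(f"Added revenue/month: ${added_revenue:,.0f}")      # $45,000
```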
Product Market Fit in AI
Current categories with strong product market fit include:
- Coding Agents (e.g., Cursor) [00:26:28]
- Customer Support Agents (e.g., Sierra, Decagon) [00:30:40]
- Deep Research/Search (e.g., Perplexity, OpenAI’s deep research) [00:26:42]
- Summarization of Voice and Conversation (e.g., Granola, Monterey, Bridge) [00:35:31]
Emerging areas showing promise include:
- Screen sharing assistance [00:35:56]
- Outbound sales [00:36:06]
- Hiring/recruiting [00:36:14]
- Personalized teaching/education (e.g., Duolingo, Khanmigo) [00:36:16]
- Finance-specific applications [00:36:31]
- Personal AI [00:36:35]
A shift is noted from AI being primarily about cost cutting to focusing more on revenue growth [00:32:02]. Initial AI apps targeted tasks that were easily outsourced (cost centers), but the next wave of apps may be more defensible by directly increasing top-line revenue [00:32:54].
Defensibility at the App Layer
Defensibility at the application layer primarily comes from:
- Network Effects: Prioritizing multiplayer experiences over single-player ones [00:39:16]. Chai Research, a Character.AI competitor, is cited as an example: it thrives on a network of people submitting models, which creates robustness against future changes [00:40:19].
- Brand: Companies can quickly become synonymous with an entire category, gaining preferential access to customers [00:41:20].
- Velocity and User Experience: The ability to quickly build a broad product and adapt to new models (every 3-6 months) is critical [00:42:12]. This involves “a thousand small things” in user experience and design that compound over time [00:42:06]. This approach is more analogous to traditional application SaaS defensibility, contrasting with the early “head fake” of relying on unique datasets or training custom models for defensibility [00:41:51].
AI Infrastructure Categories
The “LLM OS” concept encompasses several interesting AI infrastructure categories:
- Code Execution: A key area of focus for investment [00:43:12].
- Memory: As discussed, stateful AI is underhyped [00:43:19] (a minimal sketch of such a memory abstraction follows this list).
- Search: Critical for retrieval-augmented generation [00:43:24].
- Security: AI is needed on the defense side wherever it’s used offensively (e.g., email security, identity) [00:43:44]. This includes rethinking red teaming and leveraging models for semantic understanding in areas like binary inspection [00:44:00].
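As a rough illustration of the Memory item above, here is a minimal sketch of a memory abstraction that outlives the context window: facts stored as subject-relation-object triples (a tiny knowledge graph) and recalled by subject. All class and method names are hypothetical; a production system would add embeddings, relevance ranking, and expiry.

```python
from dataclasses import dataclass, field
from datetime import datetime

# Minimal sketch of memory beyond raw chat history: facts are stored as
# subject-relation-object triples and retrieved by subject, so an agent can
# recall things that no longer fit in the model's context window.
# All names here are hypothetical.

@dataclass
class Fact:
    subject: str
    relation: str
    obj: str
    recorded_at: datetime = field(default_factory=datetime.utcnow)

class MemoryStore:
    def __init__(self) -> None:
        self._facts: list[Fact] = []

    def remember(self, subject: str, relation: str, obj: str) -> None:
        self._facts.append(Fact(subject, relation, obj))

    def recall(self, subject: str) -> list[Fact]:
        """Return everything known about a subject, newest first."""
        matches = [f for f in self._facts if f.subject == subject]
        return sorted(matches, key=lambda f: f.recorded_at, reverse=True)

memory = MemoryStore()
memory.remember("acme_corp", "preferred_region", "eu-west-1")
for fact in memory.recall("acme_corp"):
    print(fact.subject, fact.relation, fact.obj)
```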
The value in AI infrastructure lies more in the "infra around the model" than in bare metal or capital-intensive model serving (like GPU clouds), which are considered less appealing for venture investment [00:44:51].
OpenAI and other large labs are increasingly encompassing parts of the developer and AI infrastructure categories. For instance, search, once a distinct API (e.g., Perplexity), is now offered as an OpenAI API [00:45:21]. This raises the question of whether model labs aim to be API companies or product companies [00:46:18]. The application layer is currently seen as significantly more interesting than the AI infrastructure layer, as it allows charging for utility rather than just cost-plus [00:47:10].
Specific areas struggling to gain significant traction as standalone businesses include:
- Finetuning Companies: Unless wrapped into a broader enterprise AI services offering, finetuning alone is not seen as a large standalone opportunity [00:47:56].
- AI DevOps/SRE (Site Reliability Engineering): While there is a lot of data, and self-healing apps are a real possibility, the technology isn't yet there for a meaningful autonomous SRE solution [00:48:20]. It's still largely an anomaly-detection problem rather than a transformer/LLM use case [00:49:25] (see the sketch after this list).
- Voice Real-time Infrastructure: While interesting, its ultimate market size is questioned [00:48:37].
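To illustrate the point that AI DevOps/SRE is still largely an anomaly-detection problem, here is a minimal sketch of classic statistical detection on a latency series. The threshold and sample data are illustrative assumptions, not a recommended production approach.

```python
import statistics

# Classic anomaly detection on metrics, no LLM involved: flag samples whose
# z-score exceeds a threshold. Threshold and data are illustrative.

def find_anomalies(samples: list[float], threshold: float = 2.5) -> list[int]:
    """Return indices of samples more than `threshold` std devs from the mean."""
    mean = statistics.fmean(samples)
    stdev = statistics.pstdev(samples)
    if stdev == 0:
        return []
    return [i for i, x in enumerate(samples) if abs(x - mean) / stdev > threshold]

latencies_ms = [102, 98, 105, 99, 101, 97, 100, 480, 103, 96]
print(find_anomalies(latencies_ms))   # -> [7], the 480 ms spike
```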
Unanswered Questions and Future Implications
A significant unanswered question is the applicability of Reinforcement Learning (RL) to non-verifiable domains [00:50:17]. While RL works in verifiable domains (like coding or mathematics), its success in areas like law (contracts), marketing, and sales is unclear [00:50:27]. If RL doesn’t succeed in these areas, it could lead to a future where autonomous agents excel in verifiable tasks, but non-verifiable tasks still require human “taste makers” [00:50:47].
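A small sketch of what makes a domain "verifiable" for RL: in coding, the reward can be computed mechanically by executing the model's output against test cases, whereas no comparable oracle exists for a contract or a marketing campaign. The task, function name, and tests below are illustrative assumptions.

```python
# Sketch of a verifiable reward: run the model's generated code against
# hidden test cases and score the pass rate. The expected function name
# ("solution") and the tests are made up for illustration.

def verifiable_reward(candidate_source: str, tests: list[tuple[int, int]]) -> float:
    """Reward = fraction of hidden test cases the generated function passes."""
    namespace: dict = {}
    try:
        exec(candidate_source, namespace)   # compile and load the model's code
        fn = namespace["solution"]
    except Exception:
        return 0.0                          # unrunnable code earns no reward
    passed = sum(1 for x, expected in tests if fn(x) == expected)
    return passed / len(tests)

model_output = "def solution(x):\n    return x * 2\n"
print(verifiable_reward(model_output, [(1, 2), (3, 6), (5, 10)]))  # -> 1.0
```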
Another critical question revolves around hardware scaling and GPU dominance [00:51:48]. OpenAI’s “rule of nines” suggests an order of magnitude increase in compute for each 9 (e.g., 90% to 99% reliability), occurring every 2-3 years [00:51:35]. The availability of GPUs and the dominance of Nvidia (and its CUDA ecosystem) are major concerns [00:52:00]. While AWS (Trainium), AMD, Microsoft, and Facebook are developing their own chips, the general-purpose nature of GPUs and the stability of the transformer architecture make them hard to displace [00:52:05].
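A quick worked example of the "rule of nines" as stated above, assuming roughly an order of magnitude more compute per additional nine of reliability; the baseline compute unit is arbitrary and purely illustrative.

```python
# "Rule of nines" arithmetic: each extra nine of reliability costs roughly
# 10x more compute. The baseline is an arbitrary illustrative unit.

baseline_compute = 1.0  # compute needed for ~90% reliability (one nine)

for nines in range(1, 5):
    reliability = 1 - 10 ** (-nines)               # 0.9, 0.99, 0.999, 0.9999
    compute = baseline_compute * 10 ** (nines - 1)  # 1x, 10x, 100x, 1000x
    print(f"{reliability:.4%} reliable -> ~{compute:,.0f}x baseline compute")
```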
Finally, agent authentication is an emerging question [00:54:41]. When an agent accesses a website on a user's behalf, how does it indicate that it is an agent and not the user? A new "SSO for agents" (Single Sign-On) is needed [00:55:06]. This also raises questions about proof-of-personhood approaches such as the eyeball scanning proposed by Sam Altman [00:55:21].
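One way to picture what an "SSO for agents" might involve, purely as a hypothetical sketch: the agent presents a signed, scoped token naming both the delegating user and the agent, and the site verifies the signature before serving the request. The token format, claim names, and shared-secret signing below are invented for illustration; no such standard exists today.

```python
import hashlib
import hmac
import json

# Hypothetical "SSO for agents": a signed, scoped token identifies both the
# delegating user and the agent. Everything here (claim fields, shared-secret
# scheme) is made up for illustration; a real design would use asymmetric
# keys and an agreed-upon standard.

SHARED_SECRET = b"demo-secret"

def mint_agent_token(user_id: str, agent_id: str, scope: str) -> str:
    claims = json.dumps({"sub": user_id, "agent": agent_id, "scope": scope})
    signature = hmac.new(SHARED_SECRET, claims.encode(), hashlib.sha256).hexdigest()
    return f"{claims}.{signature}"

def verify_agent_token(token: str) -> dict | None:
    claims, _, signature = token.rpartition(".")
    expected = hmac.new(SHARED_SECRET, claims.encode(), hashlib.sha256).hexdigest()
    return json.loads(claims) if hmac.compare_digest(signature, expected) else None

token = mint_agent_token("user-123", "shopping-agent", "read:catalog")
print(verify_agent_token(token))  # identifies the agent acting for user-123
```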