From: redpointai

Pinecone, a vector database company, has been valued at over $750 million. It has become a fundamental tool for building AI applications [00:00:11].

Early Challenges and Market Education

Before the generative AI “craze” and the launch of ChatGPT, vector databases were vastly underutilized for many use cases like semantic search, candidate generation, ranking, and recommendation [00:01:10]. While larger companies like Google and Amazon used this technology internally, educating the broader market was challenging [00:01:29]. Investors were often confused when the founder, who previously helped build Amazon SageMaker (an MLOps product), spoke about building a vector database, frequently mistaking it for an MLOps solution [00:02:20].

The “ChatGPT Effect” and Rapid Scaling

The launch of ChatGPT and OpenAI’s broader efforts significantly elevated the discussion around AI to a wider audience [00:02:30]. While the underlying technology didn’t change drastically for practitioners, there was a massive influx of capital and energy, and the technology became accessible to non-AI engineers [00:02:49].

This led to an “insane” surge in demand for Pinecone:

  • The company started rapidly ramping up, exhausting machines in GCP and AWS clouds [00:03:04].
  • They were spending millions of dollars a month on their free tier [00:03:07].
  • At peak, they experienced 10,000 signups daily [00:03:23].

A particular moment that dramatically spiked usage was the emergence of an open-source tool called AutoGPT [00:05:18]. This minimalist, agent-like gadget, often just a single Python file, “took off like crazy” [00:05:51]. The shift in user base was notable: instead of systems engineers building AI platforms, Pinecone was being onboarded by individuals like “the dentist that remembers Python from college and wants to play with AI” [00:06:13].

The Transition to Serverless Architecture

The overwhelming demand forced Pinecone to rethink scale and efficiency in a completely different way [00:03:30]. They had to go back to the drawing board and redesign their entire solution, resulting in their current serverless architecture [00:03:42]. This new architecture is two orders of magnitude more efficient and was born out of sheer necessity, as they literally couldn’t handle the load [00:03:51].

The serverless model significantly reduces costs for customers. For a company with millions of vectors, the cost might be as low as $100 per month [00:07:08]. Pinecone truly shines at the hundreds of millions to billions scale, where companies (often SaaS providers themselves like Notion or Gong) develop deep AI solutions for their own customers’ data [00:07:41]. These customers might have millions of documents each, and with 10,000 customers, they need a vector database that can handle 10 billion embeddings across thousands of partitions cost-effectively [00:08:02].
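
As an illustration of that multi-tenant pattern, here is a minimal sketch using the Pinecone Python client, with one namespace per end customer. The index name, dimension, and tenant ID are hypothetical placeholders, not details from the conversation.

```python
# Minimal multi-tenant sketch with the Pinecone Python client (v3+ style).
# Index name, dimension, and tenant IDs are hypothetical placeholders.
from pinecone import Pinecone, ServerlessSpec

pc = Pinecone(api_key="YOUR_API_KEY")

# One serverless index shared by all tenants.
if "saas-docs" not in pc.list_indexes().names():
    pc.create_index(
        name="saas-docs",
        dimension=768,  # must match the embedding model's output size
        metric="cosine",
        spec=ServerlessSpec(cloud="aws", region="us-east-1"),
    )

index = pc.Index("saas-docs")

# Each paying customer gets its own namespace (partition).
index.upsert(
    vectors=[{"id": "doc-1", "values": [0.1] * 768, "metadata": {"title": "Q3 notes"}}],
    namespace="tenant-acme",
)

# Queries are scoped to a single tenant's namespace.
results = index.query(
    vector=[0.1] * 768,
    top_k=5,
    namespace="tenant-acme",
    include_metadata=True,
)
```

Namespaces are what make the "50 cents a year per paying user" economics possible: thousands of tenants share one index rather than each paying for dedicated infrastructure.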

“Because of our new architecture, because we went through this insane spike a year and a half ago and re-architected everything and have now launched it as serverless, we can run this multi-tenant workload such that your cost for one of your paying customers on Pinecone might be a dollar a year, sometimes 50 cents a year per paying user, and then the unit economics really shine.” [00:08:50]

Business Implications and Strategic Decisions

The transition to a serverless model was “very painful” for Pinecone as a company [00:37:31]. It’s a “hard transition” for founders, as investors might see revenue flatten out even as workloads grow faster, due to the product being significantly cheaper [00:38:03]. The CEO advises founders to make such transitions earlier rather than later, taking the hit as soon as possible [00:38:37].

Pinecone estimates they are “leaving on the table more than half of our revenue” [00:38:45]. For some customers, especially in storage-heavy industries like pharma, the cost reduction could be 70%, 80%, or even 90% [00:39:07]. Some customers who used to pay around $2,000 a month now pay closer to $2,000 a year [00:39:13]. While investors may not be happy, it is considered the right thing for the customer and for those building solutions on the platform [00:39:40].

[!NOTE|Early Stage of AI Adoption] The AI stack is in its very early stages, with mostly trailblazers and early adopters in production [00:40:11]. The expectation is that almost every company will eventually utilize a vector database [00:40:19]. To fit “snugly” into the cost structure for tens or hundreds of thousands of different workloads, cost-effectiveness is paramount, even if it means short-term pain [00:40:38].

Challenges in AI Adoption for Enterprises

One of the biggest barriers to enterprise AI adoption is hallucination in models, leading to a lack of trustworthiness [00:13:09]. Models are designed to generate language, and when compelled to write about unfamiliar topics they will still produce fluent text, whether or not it is grounded in anything real [00:13:30]. This is a significant problem that needs both societal and technological solutions [00:14:03].

Solutions involve:

  • Measuring Hallucinations: The ability to accurately measure hallucinations, usefulness, correctness, and faithfulness to data is still a complex challenge [00:16:21].
  • Retrieval Augmented Generation (RAG): Using vector databases and knowledge layers makes data available to models in a secure, governed, and manageable way (e.g., satisfying data-deletion requirements like GDPR) [00:17:13]. Even today, a language model paired with fairly unsophisticated RAG can outperform large models like GPT-4 on their own [00:17:59]. A minimal sketch of the flow follows this list.
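
Here is that minimal RAG sketch. `embed()` and `generate()` are hypothetical stand-ins for an embedding model and an LLM call, not a specific product API; the index follows the query shape used by vector databases like Pinecone.

```python
# Minimal RAG sketch: retrieve relevant chunks, then ground the model's answer.
# embed() and generate() are hypothetical stand-ins, not a specific product API.

def embed(text: str) -> list[float]:
    """Hypothetical embedding call; replace with a real embedding model."""
    raise NotImplementedError

def generate(prompt: str) -> str:
    """Hypothetical LLM call; replace with a real model endpoint."""
    raise NotImplementedError

def answer(question: str, index, top_k: int = 5) -> str:
    # 1. Embed the question into the same vector space as the documents.
    q_vec = embed(question)

    # 2. Retrieve only the most relevant chunks instead of all company data.
    hits = index.query(vector=q_vec, top_k=top_k, include_metadata=True)
    context = "\n\n".join(h["metadata"]["text"] for h in hits["matches"])

    # 3. Constrain the model to the retrieved context; this grounding is the
    #    main lever RAG offers against hallucination.
    prompt = (
        "Answer using only the context below. "
        "If the answer is not in the context, say you don't know.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    return generate(prompt)
```

The governance point falls out of the same design: deleting a customer’s vectors from the index removes their data from every future answer, which is far harder to guarantee with data baked into model weights.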

Most companies are still focused on making basic AI implementations work [00:30:04]. Only a few have enough experience to iterate on embedding models, retrieval, re-ranking, and filtering [00:30:22]. Building a complete AI solution for specific use cases (like Q&A or chat support) requires significant work from scientists and engineers over multiple quarters, involving learning, iteration, and customization for different preferences in speed, cost, accuracy, and data freshness [00:31:53].
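
To make that iteration surface concrete, below is a hedged sketch of those stages, reusing the hypothetical `embed()` from the RAG sketch above; `cross_encoder_score()` is likewise a stand-in for a real re-ranking model, and the metadata filter follows Pinecone-style syntax.

```python
# Sketch of the pipeline stages teams iterate on: embedding, filtered
# retrieval, and re-ranking. embed() and cross_encoder_score() are
# hypothetical stand-ins for real models.

def retrieve(query: str, index, candidates: int = 50, final_k: int = 5):
    q_vec = embed(query)

    # Filtered retrieval: restrict candidates by metadata, e.g. data freshness.
    hits = index.query(
        vector=q_vec,
        top_k=candidates,
        filter={"year": {"$gte": 2023}},  # Pinecone-style metadata filter
        include_metadata=True,
    )["matches"]

    # Re-rank the candidate set with a slower but more accurate scorer,
    # trading latency and cost for accuracy.
    hits.sort(
        key=lambda h: cross_encoder_score(query, h["metadata"]["text"]),
        reverse=True,
    )
    return hits[:final_k]
```

Each stage embodies one of the trade-offs mentioned above: the embedding model and filter affect accuracy and freshness, the candidate count affects cost, and the re-ranker trades speed for quality.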

A common failure mode for companies attempting to build AI solutions is bad cost estimation. People often significantly overestimate the cost of using vector databases like Pinecone, and abandon projects that would otherwise be viable [00:36:26].
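
As a back-of-envelope sanity check against such overestimates, the raw footprint of an embedding corpus is easy to compute. The corpus size and dimension below are illustrative assumptions, not Pinecone pricing.

```python
# Back-of-envelope storage estimate for an embedding corpus.
# All numbers are illustrative assumptions.
num_vectors = 5_000_000   # "millions of vectors"
dimension = 768           # a common embedding size
bytes_per_float = 4       # float32

raw_gb = num_vectors * dimension * bytes_per_float / 1e9
print(f"Raw vector data: {raw_gb:.1f} GB")  # ~15.4 GB
```

A corpus of that scale is tens of gigabytes of raw data, which is more consistent with the roughly $100-per-month serverless figure quoted earlier than with the budgets people assume before abandoning a project.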

While “storing vectors” is the common shorthand, the uniqueness of vector databases lies in the numeric array itself becoming the key for organizing and searching data, closer to how the human brain retrieves information [00:20:18]. Bolting vector support onto traditional databases (the speaker’s joking “Bongo,” i.e., MongoDB) results in poor performance [00:20:33].
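
To ground the “vector as key” idea: a lookup is a similarity search over the embedding space rather than an exact match on an identifier, as in this minimal brute-force NumPy sketch. Real vector databases use approximate-nearest-neighbor indexes instead of a full scan, which is precisely what bolted-on vector support tends to lack.

```python
import numpy as np

# The "key" is a vector; lookup means nearest-neighbor search, not exact match.
vectors = np.random.rand(10_000, 768).astype(np.float32)  # indexed data
query = np.random.rand(768).astype(np.float32)

# Cosine similarity against every stored vector (brute force for clarity).
scores = vectors @ query / (np.linalg.norm(vectors, axis=1) * np.linalg.norm(query))
top_5 = np.argsort(scores)[-5:][::-1]  # row ids of the most similar vectors
```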

There is a market dynamic pushing towards more efficient and cost-effective AI solutions:

  • Smaller Models: There is no reason to spend a ton of money on large models; smaller, cheaper, and easier-to-manage open-source models often provide good performance [00:21:20].
  • Economic Trade-offs: The market will find a stable point with a good trade-off between cost, compute, infrastructure, and output quality [00:25:31]. Making models bigger on GPUs is often not reasonable or sustainable in the long term due to cost and environmental impact [00:26:07]. Smaller models, when used correctly, can yield comparable or better results at cents on the dollar [00:26:30].
  • Context Window Limitations: Larger context windows are available, and per-token pricing gives LLM providers an incentive to promote them, but sending all company data on every query is clearly not the right approach: it is costly, slow, and often of limited benefit [00:28:15]. Even for small cases, using a vector database to retrieve only the relevant documents (e.g., sending 3,000-10,000 tokens instead of 100,000) can deliver roughly 90% cost savings without performance loss; see the rough arithmetic after this list [00:29:16].
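
Here is that rough arithmetic. The per-token price is a made-up placeholder; the 10x ratio between the two context sizes is what drives the savings.

```python
# Rough per-query cost comparison. The per-token price is a hypothetical
# placeholder; the 10x ratio between the two context sizes drives the savings.
price_per_1k_tokens = 0.01  # USD, assumed input price

full_context = 100_000 / 1000 * price_per_1k_tokens  # $1.00 per query
retrieved = 10_000 / 1000 * price_per_1k_tokens      # $0.10 per query

print(f"{1 - retrieved / full_context:.0%} cost reduction")  # 90%
```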

Future Outlook for AI Infrastructure

The future of AI model training and deployment will see significant shifts:

  • Hardware: The current GPU-centric model is not sustainable. There will be changes in hardware, potentially involving more CPUs, specialized servers optimized for training or serving, and distributed infrastructure [00:48:45].
  • Data Pipelines: Current data pipeline and management tools are insufficient, leading to operational headaches, high costs, and slow processing [00:49:27].
  • Moderating Systems: There will be a need for moderating systems that offer governance, visibility, and control over the AI stack, which currently often runs open-loop for most companies [00:49:58].

While there is a “winner-take-all” phenomenon in AI infrastructure, limiting the space for new companies [00:41:51], the application and solution layer offers immense opportunities [00:42:04]. This layer is a “conveyor belt of innovation,” with numerous new applications emerging weekly [00:43:27].

For AI startups, the CEO finds coding assistance to be an “exceedingly useful” and exciting use case [00:51:24]. Looking ahead, he is particularly excited about low-level challenges such as high-performance optimization and compilation for AI models, since the current software stack that intertwines execution and training is, in his words, “insane” [00:53:57]. He believes agents are already functional to a decent extent, completing tasks at success rates close to human levels, although their mistakes can be “more embarrassing” [00:54:49].

[!INFO|Recommendation for Builders] For those looking to get started, the CEO advises focusing on building something exciting. If that endeavor leads to needing a vector database, then great; if not, you’ve still built something valuable. The most common mode of failure is “doing nothing” [00:55:49].