From: redpointai

Pinecone, a vector database company, has raised funding at a valuation of over $750 million [00:00:03]. It has become a core tool for building AI applications [00:00:11].

Evolution of the Vector Database Landscape

Before the generative AI craze and the launch of ChatGPT, vector databases were largely underutilized [00:00:54]. They have diverse use cases including semantic search, candidate generation, ranking, feed ranking, and recommendation [00:00:58]. Larger companies like Google and Amazon internally used vector databases for search, recommendation, feed ranking, and anomaly detection [00:01:18]. However, educating the broader market about this technology was challenging [00:01:27]. Investors were often confused when presented with the concept of a “vector database” [00:01:36].

The launch of ChatGPT significantly elevated the discussion about vector databases to a broader audience [00:02:27]. While the underlying technology for practitioners didn’t change dramatically, the increased capital and energy led to its adoption by non-AI engineers [00:02:40].

The AutoGPT Effect

A significant moment that spiked usage was the emergence of AutoGPT, an open-source "agent" project [00:05:15]. Though minimalistic, it took off "like crazy" [00:05:48]. Suddenly, Pinecone's onboarding users were not just systems engineers building AI platforms or RAG (Retrieval Augmented Generation) applications, but also individuals like dentists who remembered some Python from college [00:05:55]. The surge created immense scaling challenges, with Pinecone seeing up to 10,000 signups daily [00:03:18]. This forced a redesign of the product, resulting in a serverless architecture that is two orders of magnitude more efficient [00:03:32].

Applications and Scale

Today, a “million vectors” is considered a small scale [00:06:57]. Pinecone truly shines at handling hundreds of millions to billions of vectors [00:07:18]. This scale is crucial for SaaS companies and software providers like Notion and Gong, who develop deep AI solutions (e.g., AI Q&A, semantic search) for their customers’ data [00:07:21]. For instance, a provider with 10,000 customers, each with a million documents, would need a vector database to handle 10 billion embeddings across thousands of partitions [00:07:48].
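The multi-tenant pattern described above can be sketched in a few lines. This is a toy illustration only, not Pinecone's actual API or architecture: each customer gets its own namespace (partition), so a query scans one tenant's vectors rather than all 10 billion. All class and method names here are invented for the example.

```python
import math
from collections import defaultdict

class TenantVectorStore:
    """Toy multi-tenant vector store: each customer's vectors live in
    their own namespace (partition), so a query only scans one tenant."""

    def __init__(self):
        self._namespaces = defaultdict(dict)  # namespace -> {id: vector}

    def upsert(self, namespace, vec_id, vector):
        self._namespaces[namespace][vec_id] = vector

    def query(self, namespace, vector, top_k=3):
        def cosine(a, b):
            dot = sum(x * y for x, y in zip(a, b))
            na = math.sqrt(sum(x * x for x in a))
            nb = math.sqrt(sum(x * x for x in b))
            return dot / (na * nb) if na and nb else 0.0
        scored = [(cosine(vector, v), i)
                  for i, v in self._namespaces[namespace].items()]
        scored.sort(reverse=True)
        return [i for _, i in scored[:top_k]]

store = TenantVectorStore()
store.upsert("customer-a", "doc1", [1.0, 0.0])
store.upsert("customer-a", "doc2", [0.0, 1.0])
store.upsert("customer-b", "doc3", [1.0, 0.0])
# A query for customer-a never touches customer-b's partition.
print(store.query("customer-a", [0.9, 0.1], top_k=1))  # ['doc1']
```

Partitioning by tenant is also what makes the per-customer unit economics workable: idle tenants consume storage but no compute.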

The focus is now on unit economics for true product building, not just lab experimentation [00:08:10]. Pinecone’s serverless architecture allows for multi-tenant workloads, reducing the cost per paying customer to potentially a dollar or even 50 cents per year [00:08:31].

Common applications running on Pinecone include RAG (Retrieval Augmented Generation) applications, which are largely built on semantic search; the "magic sauce" happens at the semantic search layer [00:10:26].
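The retrieval step of RAG can be sketched as follows. This is a minimal illustration under stated assumptions: the `embed` function is a toy bag-of-words stand-in for a real embedding model, and the document strings are invented for the example.

```python
import math
from collections import Counter

def embed(text):
    """Toy stand-in for a real embedding model: a bag-of-words vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b.get(t, 0) for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, docs, top_k=2):
    """Semantic-search layer: rank documents by similarity to the query."""
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:top_k]

docs = [
    "Pinecone stores embeddings for semantic search",
    "The dentist remembered Python from college",
    "RAG grounds a language model in retrieved context",
]
context = retrieve("how does semantic search ground a model", docs)
# The retrieved chunks, not the whole corpus, go into the LLM prompt.
prompt = "Answer using only this context:\n" + "\n".join(context)
```

In a production system the embedding model and the vector index replace `embed` and the sorted scan, but the shape of the pipeline is the same: embed the query, retrieve the top matches, prompt the model with only those.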

Challenges in Enterprise Adoption

Despite rapid advancements, the average company still struggles to train even simple deep learning models [00:11:47]. A major barrier for effective AI app deployment is the issue of “hallucinations” in Large Language Models (LLMs) [00:13:04]. LLMs are designed to generate language, and when compelled to write on topics they “know nothing about,” they will produce text that may contain inaccuracies [00:13:18].

Addressing Hallucinations

Solutions are emerging to measure and mitigate hallucinations [00:15:18]. Key efforts include:

  • Measurement: Developing metrics to assess usefulness, correctness, and faithfulness to data [00:16:05].
  • Knowledge Layers: Using RAG and vector databases to make data securely available to models, ensuring data governance and compliance (e.g., GDPR deletion requirements) [00:16:31]. The community has made significant progress in providing robust solutions here [00:17:08].
  • Outperforming GPT-4: Experiments have shown that loading a good chunk of the internet into Pinecone and running a language model with unsophisticated RAG can already outperform GPT-4 [00:17:35].
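One of the measurement ideas above (faithfulness to data) can be illustrated with a deliberately crude metric: the fraction of answer tokens that appear in the retrieved context. Real evaluation uses NLI models or LLM judges; this toy function only shows the shape of the idea, and the example strings are invented.

```python
def faithfulness(answer, context):
    """Crude groundedness score: fraction of answer tokens found in the
    retrieved context. A toy stand-in for real faithfulness metrics."""
    answer_tokens = set(answer.lower().split())
    context_tokens = set(context.lower().split())
    if not answer_tokens:
        return 0.0
    return len(answer_tokens & context_tokens) / len(answer_tokens)

context = "pinecone is a vector database for semantic search"
grounded = faithfulness("pinecone is a vector database", context)     # 1.0
hallucinated = faithfulness("pinecone was founded on mars", context)  # 0.2
```

A low score flags answers that drift away from the retrieved data, which is exactly the failure mode RAG is meant to prevent.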

Competitive Landscape

“Vectors are the language of AI” [00:18:15]. This realization has led to a “land grab” where many startups and incumbents are vying to store vectors [00:18:21]. Incumbent database providers are adding support for vector data types, seeing vectors as another data type like sentences or images [00:19:01].

However, simply storing a float array (vector) doesn’t make a database a true vector database [00:19:17]. What makes vector databases unique is that the numeric array acts as the primary lookup key, organizing data on storage and enabling efficient searching [00:19:53]. Attempting to “bolt on” this data type to a traditional database often results in poor performance and serious issues [00:20:23].
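The "vector as primary lookup key" point can be made concrete with an IVF-style sketch (a common indexing technique, offered here as an assumed illustration, not a description of Pinecone's internals): records are bucketed by their nearest centroid, so the vector itself determines where data lives and which bucket a query scans.

```python
import math

def nearest_centroid(vec, centroids):
    """Return the index of the centroid closest to vec."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return min(range(len(centroids)), key=lambda i: dist(vec, centroids[i]))

# The vector itself decides where a record is stored (IVF-style buckets),
# unlike a traditional database where a scalar key drives the layout.
centroids = [[0.0, 0.0], [10.0, 10.0]]
buckets = {0: [], 1: []}

for vec_id, vec in [("a", [0.2, 0.1]), ("b", [9.8, 10.1]), ("c", [0.3, 0.4])]:
    buckets[nearest_centroid(vec, centroids)].append((vec_id, vec))

def query(vec, top_k=1):
    # Scan only the bucket the query vector maps to, not the whole table.
    candidates = buckets[nearest_centroid(vec, centroids)]
    candidates = sorted(
        candidates,
        key=lambda p: sum((x - y) ** 2 for x, y in zip(p[1], vec)))
    return [i for i, _ in candidates[:top_k]]

print(query([0.25, 0.2]))  # ['a']
```

A database that merely stores a float array in a column gets none of this: without the vector driving the storage layout, every similarity query degrades to a full scan.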

The RAG Stack

For building a RAG product, common components include:

  • Models: Smaller, cheaper, and sometimes open-source models often perform well for embedding and summarization [00:21:11].
  • Data Transformation: Tools like AnyScale’s product are used for bulk transformations and data movement [00:21:30].
  • Application Launch: Collaborations with companies like Forel and others for building the application layer [00:21:48].
  • Partnerships: Close work with Cohere, AI21, and Hugging Face for models [00:21:59].
  • Data Prep: Companies like Unstructured focus on generating embeddings and preparing data for vector databases, especially for standardized data channels (e.g., importing data from Snowflake via Fivetran) [00:23:36].

Cost Optimization and Context Windows

The market is driven by finding a stable point between cost, compute, infrastructure, and output quality [00:25:17]. Running massive models (e.g., 100 billion parameters) for every API call is not sustainable due to cost and environmental impact [00:25:53].

While model companies are expanding context windows, larger windows don't help more often than not [00:27:09]. Sending all company data to an LLM on every query is clearly not the right approach [00:28:18]. Even in smaller cases where the context window could theoretically hold everything, using a vector database for retrieval (e.g., sending 3,000-10,000 tokens instead of 100,000) can achieve similar or better results at a tenth of the cost [00:29:02].
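The cost argument is simple arithmetic. With an assumed (hypothetical) per-token price, sending 10,000 retrieved tokens instead of a full 100,000-token context cuts the per-query cost by the factor the source cites:

```python
# Hypothetical per-token price; real pricing varies by model and vendor.
PRICE_PER_1K_TOKENS = 0.01  # dollars

def query_cost(tokens):
    return tokens / 1000 * PRICE_PER_1K_TOKENS

full_context = query_cost(100_000)  # stuff everything into the window
rag_context = query_cost(10_000)    # retrieve only the relevant chunks
# roughly $1.00 vs $0.10 per query: a tenth of the cost
```

The ratio, not the absolute price, is the point: retrieval shrinks every query's token count, so the saving compounds across all traffic.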

Vendor Approach and Customer Education

Pinecone adopted a strategy of either fully handling a feature (e.g., automated infrastructure) or explicitly not doing it, clearly documenting their scope [00:33:09]. This approach fostered customer independence, leading to thousands of successful customers who don’t require direct support [00:33:43]. However, Pinecone now offers consultation for customers who need help, though not full Professional Services [00:34:40].

A common failure mode for potential users is incorrect cost estimation [00:35:02]. Companies sometimes project exorbitant costs for a workload that would actually run for something like $500/month [00:35:07]. This miscalculation can prevent them from even starting a project [00:36:13].

The transition to a serverless model, while painful for the company’s short-term revenue, is seen as the right thing for the customer and the platform’s future [00:37:29]. This shift can lead to significant cost reductions for customers (e.g., 50-90% reduction) [00:38:45]. By fitting snugly into customers’ cost structures, Pinecone aims to be a fundamental part of the future AI stack as more companies adopt vector databases [00:39:38].

Future Outlook for AI Infrastructure

The current AI infrastructure landscape is still in its very early stages [00:40:47].

Hardware Shifts

There will be a significant shift in the types of hardware used for AI [00:47:56]. The current reliance on GPUs is not sustainable long-term [00:48:03]. The future will likely involve a mix of CPUs, GPUs, and specialized servers optimized for training or serving, potentially with distributed infrastructure [00:48:36].

Data Management

Existing data pipeline and data management tools from 5-10 years ago are no longer sufficient [00:49:03]. Operational headaches, costs, and wait times are unreasonable, necessitating changes [00:49:17].

Moderation and Governance

Future AI stacks will require moderating systems to provide governance, visibility, and control, even when models are not self-owned [00:49:34].

Application Layer Opportunities

While infrastructure has a “winner take all” phenomenon, the application and solution space offers immense energy, excitement, and opportunities for new companies to solve problems in creative ways [00:41:51]. This includes startups challenging long-term players and enterprises learning to adopt these new technologies [00:42:27].

Overhyped vs. Underhyped

  • Overhyped: Foundational models [00:50:54]. The speaker believes their capabilities are well-understood, and there hasn’t been significant qualitative progress recently [00:50:58].
  • Underhyped: Code generation and coding assistance [00:51:14]. This is considered one of the most exciting and useful applications of the technology [00:51:21].

Agents

The speaker believes AI agents “already work” [00:54:02], suggesting that the bar for reliability should be comparable to human assistants [00:54:15]. While their mistakes can be “silly” (non-human-like), the probability of completing tasks is nearing human levels [00:54:41].