From: redpointai
Pinecone, a vector database company valued at over $750 million, has become a core tool for building AI applications [00:00:00]. CEO and founder Edo Liberty discusses the vector database landscape, the barriers enterprises face in building AI apps, and the future of AI infrastructure and its opportunities [00:00:12].
Evolution and Impact of Generative AI
Before the generative AI “craze” and the launch of ChatGPT, vector databases were vastly underutilized despite fitting many use cases, including semantic search, candidate generation, ranking, feed ranking, and recommendations [00:00:51]. Larger companies like Google and Amazon used them internally, but educating the broader market was challenging [00:01:16]. Investors were often confused when vector databases were pitched [00:01:33]. Edo Liberty had earlier worked on AWS SageMaker, an MLOps product, and many initially mistook Pinecone for an MLOps solution [00:01:54].
The launch of OpenAI’s ChatGPT elevated the discussion to a much broader audience [00:02:27]. While the core technology of vector databases didn’t change significantly for practitioners, the surge in capital and energy made the technology accessible to non-AI engineers [00:02:40]. Pinecone experienced an “insane” increase in demand, sometimes seeing 10,000 sign-ups daily, which exhausted available cloud machine capacity and cost millions of dollars a month on the free tier alone [00:02:53]. This necessitated a complete redesign, leading to their “serverless” architecture, which is two orders of magnitude more efficient [00:03:31].
A particular moment that spiked usage was the open-source project AutoGPT, a minimalistic precursor to what is now called an agent [00:05:15]. Its popularity meant Pinecone was suddenly being onboarded by people like “the dentist that remembers Python from college” rather than systems engineers [00:06:08]. While this has since normalized, most current users are builders [00:06:25].
Scaling and Applications of Vector Databases
Pinecone truly excels at the scale of hundreds of millions to billions of vectors [00:07:18]. Companies that are themselves SaaS providers, such as Notion or Gong, use Pinecone to build deep AI solutions (Q&A, semantic search) over their customers’ data [00:07:24]. For example, a provider with 10,000 customers, each holding a million documents, needs a vector database that can handle 10 billion embeddings across thousands of partitions cost-effectively [00:07:48]. Pinecone’s serverless architecture cuts that cost dramatically, potentially to a dollar or even 50 cents per paying user per year [00:08:31].
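This multi-tenant pattern maps naturally onto namespaces. Below is a minimal sketch using the Pinecone Python client; the index name, dimension, region, and tenant IDs are illustrative assumptions, not details from the interview:

```python
# Minimal multi-tenant sketch with the Pinecone Python client.
# Index name, dimension, region, and tenant IDs are illustrative assumptions.
from pinecone import Pinecone, ServerlessSpec

pc = Pinecone(api_key="YOUR_API_KEY")

# One serverless index; each customer gets its own isolated namespace.
if "saas-docs" not in pc.list_indexes().names():
    pc.create_index(
        name="saas-docs",
        dimension=1536,  # must match the embedding model's output size
        metric="cosine",
        spec=ServerlessSpec(cloud="aws", region="us-east-1"),
    )
index = pc.Index("saas-docs")

# Upsert one customer's document embeddings into that customer's namespace.
index.upsert(
    vectors=[{
        "id": "doc-1",
        "values": [0.1] * 1536,  # stand-in for a real embedding
        "metadata": {"text": "Q3 planning notes"},
    }],
    namespace="tenant-42",
)

# Queries are scoped to a single tenant's partition.
results = index.query(vector=[0.1] * 1536, top_k=5, namespace="tenant-42")
```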
Common applications built on Pinecone include:
- Q&A and semantic search [00:09:37]
- Chatbots and support bots [00:09:42]
- Legal discovery [00:09:50]
- Medical history discovery and analytics [00:09:54]
- Applications involving images, audio, video [00:10:05]
- Anomaly detection and security [00:10:08]
- Pharma and drug discovery [00:10:12]
While there’s interest in multimodal applications, text data still dominates for mainstream technology developers [00:11:06].
Barriers to AI Adoption and Solutions
One of the biggest barriers for companies deploying AI applications is hallucination [00:13:04]. Large language models (LLMs) are designed to generate language, so they will produce text even when they “know nothing” about a topic, much like a sixth-grader instructed to write an essay regardless of what they actually know [00:13:13].
Solutions to address hallucinations include:
- Measurement: The ability to truly measure hallucinations is only now emerging [00:15:20]. A model that never hallucinates because it always says “I don’t know” is useless [00:15:35]. The focus must be on measuring usefulness, correctness, and faithfulness to the data [00:16:05].
- Retrieval Augmented Generation (RAG): This means making data available to the model in a secure, governed, and controllable way via a vector database [00:16:28]. RAG, built on semantic search, is where the “magic sauce” happens in AI applications [00:10:26]. In certain experiments, an LLM paired with a vector database and RAG can already outperform GPT-4 on its own [00:17:40]; a minimal sketch follows this list.
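The retrieve-then-generate loop is straightforward to sketch. The example below uses the OpenAI and Pinecone Python clients; the model names, index handle, and prompt wording are illustrative assumptions, not Pinecone’s prescribed stack:

```python
# A minimal RAG sketch: ground the LLM's answer in retrieved context.
# Model names, index name, and prompt wording are illustrative assumptions.
from openai import OpenAI
from pinecone import Pinecone

client = OpenAI()
index = Pinecone(api_key="YOUR_API_KEY").Index("saas-docs")

def answer(question: str, namespace: str) -> str:
    # 1. Embed the question with the same model used to embed the documents.
    q_vec = client.embeddings.create(
        model="text-embedding-3-small", input=question
    ).data[0].embedding

    # 2. Retrieve the most semantically similar chunks from the vector DB.
    hits = index.query(vector=q_vec, top_k=5, namespace=namespace,
                       include_metadata=True)
    context = "\n\n".join(m.metadata["text"] for m in hits.matches)

    # 3. Constrain the model to the retrieved context -- this is what
    #    keeps hallucination in check.
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system",
             "content": "Answer only from the provided context. "
                        "If the context is insufficient, say you don't know."},
            {"role": "user",
             "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
    )
    return resp.choices[0].message.content
```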
Vector Database Market Landscape
Vectors are considered the “language of AI” [00:18:15]. The recognition that vectors are a fundamental data type, like sentences or images, has led to an explosion of startups and incumbents adding vector support [00:19:01]. However, simply adding a “float array” column type does not make a database a true vector database [00:19:17]. True vector databases are unique because the numeric array is the key: it organizes the data and enables search in a highly optimized way [00:19:53]. Bolting vector support onto a technology not built for AI leads to “incredibly poor performance” [00:20:20].
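To make the distinction concrete: without a vector-native index, every similarity query degenerates into a full scan over all stored arrays. A minimal NumPy sketch of that brute-force baseline (the corpus size and dimension are arbitrary assumptions):

```python
# Why a bolted-on "float array" column isn't a vector database:
# without a vector-native index, every query is a linear scan like this.
import numpy as np

rng = np.random.default_rng(0)
corpus = rng.standard_normal((100_000, 768)).astype(np.float32)  # fake embeddings
corpus /= np.linalg.norm(corpus, axis=1, keepdims=True)          # unit-normalize

def brute_force_top_k(query: np.ndarray, k: int = 5) -> np.ndarray:
    query = query / np.linalg.norm(query)
    scores = corpus @ query           # cosine similarity, O(N * d) per query
    return np.argsort(-scores)[:k]    # ids of the k most similar vectors

# A true vector database replaces this scan with an approximate
# nearest-neighbor index keyed on the vectors themselves, so query
# latency stays nearly flat as the corpus grows into the billions.
```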
Recommended RAG Stack and AI Infrastructure
For a RAG product, smaller, cheaper, and easier-to-manage open-source models can achieve good performance, eliminating the need to spend a lot of money on larger models [00:21:11].
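As a hedged illustration of that swap, the generation step of a RAG loop can be served by a small open-source model run locally via Hugging Face transformers (the checkpoint name below is an illustrative assumption, not a recommendation from the interview):

```python
# Swapping the generation step for a small open-source model run locally.
# The checkpoint name is an illustrative assumption, not a recommendation.
from transformers import pipeline

generate = pipeline("text-generation",
                    model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")

prompt = (
    "Context:\n{context}\n\n"
    "Answer the question using only the context above.\n"
    "Question: {question}\nAnswer:"
)
out = generate(prompt.format(context="<retrieved chunks>",
                             question="<user question>"),
               max_new_tokens=200, do_sample=False)
print(out[0]["generated_text"])  # includes the prompt plus the completion
```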
Key components of a RAG stack:
- Vector Database: Pinecone [00:20:58]
- Data Transformation/Movement: Anyscale’s product [00:21:37]. Unstructured is also noted as a viable option for generating embeddings and loading data into vector databases [00:23:36]. Teams already moving data with tools like Fivetran into warehouses such as Snowflake can lean on specialized companies to transform that data and fit it into a vector database [00:24:02].
- LLMs: Pinecone partners with Cohere, AI21, and Hugging Face [00:21:59].
- Evaluation Technology: A leader has yet to emerge in this space [00:22:24].
The market will settle at a stable point with a good trade-off between cost, compute, infrastructure, and output quality [00:25:17]. Running very large models for every API call is unsustainable in both cost and environmental impact [00:25:53]. While context windows are getting longer, Liberty sees this largely as a pricing model for the model companies, which charge per token [00:27:09]. Sending all company data to an LLM on every query is slow, expensive, and often doesn’t improve results [00:27:28]. Using a vector database for retrieval allows sending far fewer tokens (e.g., 3,000-10,000 instead of 100,000), reducing costs by 90% without performance loss [00:29:02].
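The arithmetic behind that claim is simple; a back-of-envelope check with a hypothetical per-token rate (the price below is an assumption, not a quoted rate):

```python
# Back-of-envelope check of the ~90% savings claim.
# The per-token rate is a hypothetical assumption, not a quoted price.
PRICE_PER_1K_INPUT_TOKENS = 0.01  # dollars

def query_cost(tokens: int) -> float:
    return tokens / 1000 * PRICE_PER_1K_INPUT_TOKENS

full_context = query_cost(100_000)  # stuffing everything into the window
rag_context = query_cost(10_000)    # only the top retrieved chunks

print(f"full context: ${full_context:.2f}/query")             # $1.00
print(f"RAG context:  ${rag_context:.2f}/query")              # $0.10
print(f"savings:      {1 - rag_context / full_context:.0%}")  # 90%
```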
Most companies are still in the early stages of getting AI applications to work [00:30:01]. Iteration over embedding models, retrieval, reranking, and filtering is less common [00:30:14]. Building a complete AI solution around a vector database takes time and involves many considerations like speed, cost, accuracy, data freshness, and input/output format [00:31:10].
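Of those under-explored iterations, reranking is easy to illustrate: over-fetch candidates from the vector database, then re-score them with a slower but more precise model. A sketch using the sentence-transformers library (the cross-encoder checkpoint is a common public model, assumed here for illustration):

```python
# Sketch of the retrieve-then-rerank iteration mentioned above.
# The cross-encoder checkpoint is a public model assumed for illustration.
from sentence_transformers import CrossEncoder

reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

def rerank(query: str, candidates: list[str], top_n: int = 5) -> list[str]:
    # Score each (query, candidate) pair jointly -- slower than the vector
    # search that produced the candidates, but considerably more precise.
    scores = reranker.predict([(query, c) for c in candidates])
    ranked = sorted(zip(scores, candidates),
                    key=lambda pair: pair[0], reverse=True)
    return [text for _, text in ranked[:top_n]]

# Typical pattern: over-fetch from the vector DB (e.g., top_k=50),
# then rerank down to the handful of chunks actually sent to the LLM.
```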
Business and Future Outlook
Pinecone’s transition to a serverless model, though painful for revenue in the short term, was seen as necessary for long-term customer benefit and market fit [00:37:29]. Some customers saw a 70-90% reduction in cost [00:38:54]. This strategic move ensures Pinecone fits into the cost structure of tens of thousands of future AI workloads [00:40:23].
Regarding startup opportunities in AI infrastructure, Edo Liberty is more excited about the application and solution space [00:41:15]. Infrastructure often follows a “winner take all” dynamic, leaving limited room for new players [00:41:31]. The solution layer, however, is “teeming with innovation,” with hundreds of new applications constantly emerging [00:42:01].
Future shifts in AI infrastructure are predicted:
- Hardware: A significant shift is expected away from the current unsustainable reliance on GPUs. This could involve more CPU workloads, models adapted for CPUs, or specialized servers optimized for training and serving [00:47:56].
- Data Pipelines: Existing tools from 5-10 years ago are insufficient, leading to operational headaches, high costs, and long wait times [00:49:03].
- Governance and Visibility: Moderating systems are needed to provide governance and control over AI stacks that currently run “open loop” for most companies [00:49:34].
Overhyped, Underhyped, and Surprises
- Overhyped: Foundational models are considered overhyped because, while powerful, they have shown little qualitative progress for some time [00:50:54].
- Underhyped: Code generation and coding assistants are “exceedingly useful” and considered one of the most exciting use cases [00:51:11].
- Biggest Surprise: A complete rewrite of Pinecone’s database in Rust, which was thought to be “borderline suicide” and expected to take six months, was completed in two to three months and yielded significantly better results, avoiding the common pitfalls of rewrites [00:51:51].
Reflections on AI in Larger Companies and Agents
Large companies like Amazon (where Edo Liberty previously worked on SageMaker), Google, and Microsoft have different innovation horizons and risk appetites [00:46:02]. Their products need to generate hundreds of millions of dollars annually to “move the needle,” whereas startups can focus on emerging parts of the stack years before they hit the mainstream market [00:46:02].
On AI agents, Edo Liberty believes they “work already,” with the probability of completing a task approaching human levels [00:54:02]. While their mistakes can look “silly” or “embarrassing” compared to human errors, their overall performance is decent [00:54:15].
A common failure mode for those trying to implement AI solutions is incorrect cost estimation [00:35:59]. Often, initial napkin math leads to inflated cost projections that deter companies from even starting a project [00:35:02]. The most significant failure, however, is “doing nothing” [00:55:49].