From: redpointai

Pinecone, a vector database company, redesigned its product around a serverless architecture in response to rapid scaling and overwhelming demand [00:32:26]. The shift was a necessity: the company “literally couldn’t handle the load” with its previous setup [00:32:46].

Cost Efficiency for Customers

The serverless architecture is “maybe two orders of magnitude more efficient” [00:32:42] than the previous solution, which has significantly reduced customer costs:

  • For users with millions of vectors, the monthly cost can be less than $100 [00:07:07].
  • Pinecone excels at managing embeddings at “hundreds of millions [or] billions scale” [00:07:18], especially for SaaS providers, who often need to handle up to 10 billion embeddings across thousands of partitions cost-effectively [00:08:02].
  • The multi-tenant serverless workload allows for costs as low as “a dollar a year” or even “50 cents a year” per paying user [00:08:47]. This focus on unit economics is crucial for companies building products [00:08:15].
  • Customers often overestimate costs, projecting tens of thousands of dollars per month when the actual figure may be in the hundreds [00:35:02]. Such miscalculations can stop a project before it even starts [00:36:10], underscoring the importance of cost transparency and accurate projections.
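The unit economics described above can be sketched with back-of-the-envelope arithmetic. All figures below are hypothetical illustrations chosen to land in the “50 cents a year” regime mentioned in the conversation; they are not actual Pinecone pricing.

```python
# Back-of-the-envelope unit economics for a multi-tenant vector workload.
# The infrastructure cost, user count, and embedding count are all assumed
# placeholder numbers, not real Pinecone figures.

def unit_economics(annual_infra_cost, paying_users, total_embeddings):
    """Spread a hypothetical annual infrastructure bill across paying
    users and across embeddings, to expose per-unit costs."""
    return {
        "per_user_per_year": annual_infra_cost / paying_users,
        "per_million_embeddings_per_year":
            annual_infra_cost / (total_embeddings / 1_000_000),
    }

# Assumed: a $500k/year serverless bill, 1M paying end users, and
# 10 billion embeddings spread across thousands of tenant partitions.
stats = unit_economics(
    annual_infra_cost=500_000,
    paying_users=1_000_000,
    total_embeddings=10_000_000_000,
)
print(stats)  # per_user_per_year comes out to 0.5, i.e. 50 cents/year
```

Framing cost per paying user (rather than per query or per gigabyte) is what lets a SaaS builder decide whether vector search fits their product’s margins.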

Business Impact and Challenges

The transition to a serverless model was “very painful” for Pinecone as a company [00:37:29]. Workloads grew faster, but revenue took a hit because the product became significantly cheaper [00:37:50].

  • This could mean leaving “more than half of our revenue on the table” [00:38:45].
  • In some extreme cases (e.g., pharma or other storage-heavy workloads), cost reductions reached 70%, 80%, or even 90% [00:38:54]. Some customers’ bills came down to $2,000 a year [00:39:09].
  • Despite the financial impact, the company believes it was the “right thing for the customer” and for those building solutions on its platform [00:39:28]. The long-term vision is for Pinecone to fit “snugly into that cost structure” for “tens of thousands or hundreds of thousands of different workloads” [00:40:28].
  • The advice to founders undergoing similar transitions is to make the move and “take the hit as soon as possible” [00:38:33]. This aligns with Lean Startup principles of iterating quickly.

Future of AI Infrastructure and Cost

The current reliance on GPUs to make models ever bigger is deemed economically unsustainable [00:25:34]. Running a 100-billion-parameter model for every API call is not feasible financially or environmentally (“you’re going to go bankrupt right and you’re going to make the planet hotter”) [00:25:58].

  • There needs to be a significant shift in hardware, possibly involving more CPU workloads, specialized servers, or distributed infrastructure optimized for training or serving [00:48:20].
  • The speaker is bullish that models can achieve similar or better results at a fraction of the cost by “doing things right” with smaller models and efficient architectures [00:26:13].
  • The push toward ever-larger context windows by model companies is seen as a pricing strategy to get users to pay for more tokens per call [00:27:16]. This approach is criticized as slow, expensive, and not always helpful [00:27:28].
  • Instead of stuffing all data into a context window, leveraging a vector database can cut token usage dramatically (e.g., sending 3,000 tokens instead of 100,000), delivering roughly 10% of the cost without performance loss [00:29:02].
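The context-window-vs-retrieval trade-off above reduces to simple per-token arithmetic. A minimal sketch, using the 3,000 vs. 100,000 token figures from the conversation and an assumed placeholder price per token (not a quoted rate from any provider):

```python
# Compare the per-call cost of stuffing a huge context window with the
# cost of sending only the chunks retrieved from a vector database.
# PRICE_PER_1K_TOKENS is a hypothetical rate for illustration only.

PRICE_PER_1K_TOKENS = 0.01  # assumed $ per 1,000 prompt tokens

def call_cost(prompt_tokens, price_per_1k=PRICE_PER_1K_TOKENS):
    """Cost of a single model call, billed per prompt token."""
    return prompt_tokens / 1_000 * price_per_1k

full_context = call_cost(100_000)  # everything stuffed into the window
retrieval    = call_cost(3_000)    # only top-k retrieved chunks

print(f"full context: ${full_context:.2f}")  # $1.00
print(f"retrieval:    ${retrieval:.2f}")     # $0.03
print(f"ratio: {retrieval / full_context:.0%}")
```

With these exact figures the retrieval call works out to 3% of the full-context cost; the “10% of the cost” quoted in the conversation is the same order-of-magnitude point.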