From: redpointai

Building an AI infrastructure company presents significant challenges, especially given the rapid pace of change in the AI landscape [00:00:21]. Key areas of complexity include managing evolving models, hardware, and diverse user needs, while also fostering enterprise adoption [00:50:00].

Evolving Model Landscape

The model layer is characterized by constant and rapid changes, posing a significant hurdle for infrastructure providers [00:00:24].

  • Non-Deterministic Models: Large language models (LLMs) are probabilistic by nature, which is undesirable when the goal is to deliver factual and truthful results to end-users [00:02:21]. Controlling this non-determinism is crucial [00:02:34].
  • Compound AI Systems: Complex business problems often require assembling multiple models across different modalities [00:02:40]. This includes processing audio and visual information for interactive experiences [00:03:00]. Even within the same modality, there are numerous expert LLMs specialized in tasks like classification, summarization, multi-turn chat, or tool calling, each with slight differences [00:03:23].
  • Knowledge Limitations: A single model’s knowledge is limited because it is constrained by finite training data [00:03:43]. Real-world information often resides behind public or proprietary enterprise APIs, which models cannot access directly [00:03:51]. The future of AI inference involves “compound AI systems,” in which multiple models across different modalities, along with various APIs, databases, storage systems, and knowledge bases, work together to deliver optimal results [00:04:09]; a minimal sketch follows this list.
  • No “One-Model-Fits-All”: Because model training is iterative and prioritizes specific problem subsets, no single model can solve all problems perfectly [00:08:13]. Models excel in certain areas and perform poorly in others [00:08:51]. This leads to the belief that the future involves hundreds of small expert models [00:08:57].
  • Quality Control and Versioning: Hundreds of new models are released weekly, making quality control, version control, and production stability challenging [00:07:31].
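
As a concrete illustration of the compound pattern above, the following is a minimal Python sketch. The helpers (transcribe, retrieve_docs, call_expert_model) are hypothetical placeholders for whatever speech, retrieval, and expert-LLM endpoints a given stack exposes; they do not refer to any specific vendor API.

```python
# Hypothetical helpers: stand-ins for speech, retrieval, and LLM endpoints.
def transcribe(audio_bytes: bytes) -> str:
    """Speech-to-text model: turns an audio clip into a transcript."""
    ...

def retrieve_docs(query: str, k: int = 5) -> list[str]:
    """Looks up enterprise knowledge behind a proprietary API or vector store."""
    ...

def call_expert_model(task: str, prompt: str, temperature: float = 0.0) -> str:
    """Routes to a task-specific expert LLM (classification, summarization, chat).
    temperature=0.0 reduces, but does not eliminate, non-determinism."""
    ...

def answer_support_call(audio_bytes: bytes) -> str:
    # 1. Audio modality -> text.
    transcript = transcribe(audio_bytes)
    # 2. A small expert model classifies the request.
    intent = call_expert_model("classification", f"Classify the intent:\n{transcript}")
    # 3. Ground the answer in enterprise data the base model never saw.
    context = "\n".join(retrieve_docs(f"{intent}: {transcript}"))
    # 4. A different expert model drafts the final, grounded reply.
    return call_expert_model(
        "multi_turn_chat",
        f"Context:\n{context}\n\nCustomer said:\n{transcript}\n\nDraft a factual reply.",
    )
```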

Hardware Optimization and Management

The rapid evolution of hardware and the scarcity of low-level hardware optimization expertise pose significant challenges [00:28:54].

  • Fast Hardware Cadence: Hardware development has accelerated from a three-year cycle to an annual release cadence from vendors [00:29:09].
  • Workload-Specific Optimization: There is no “one size fits all” hardware solution for AI workloads [00:29:27]. The best hardware depends on the specific workload pattern, as different hardware excels at removing different bottlenecks [00:29:37]. Infrastructure companies must absorb the burden of integrating hardware and determining the best fit for varied and mixed access patterns [00:29:53].
  • Alleviating Developer Burden: The goal is to take on the complexity of managing and optimizing hardware so that developers can focus on product building [00:30:08].

Enterprise Adoption and Product Development

Adoption and deployment challenges involve meeting the diverse needs of enterprises and navigating the nuances of model customization.

  • Usability Gap: There’s a significant usability gap, especially for enterprises, in leveraging single models for complex problems [00:07:56].
  • Customization vs. Prompt Engineering: While prompt engineering offers immediate results and responsiveness for initial model steering, it quickly becomes unmanageable once system prompts grow to thousands of lines [00:11:07]. This creates a need for tools to manage complex system prompts and, eventually, for fine-tuning, which absorbs the prompts into the model for faster, cheaper, and higher-quality inference [00:11:31].
  • Pre-training ROI: Pre-training models is expensive and has a less compelling return on investment (ROI) than post-training (fine-tuning) strong base models, especially for enterprises that need to justify resource allocation and maintain agility [00:13:28].
  • Evaluation (Evals): Many enterprises initially use “vibe-based” evaluations [00:17:02]. They quickly realize, however, that they need to consciously build and invest in good eval datasets to determine actual product impact and to keep up with rapidly changing state-of-the-art models [00:17:26]. A/B testing is the ultimate determinant but has a longer cycle [00:17:47].
  • Function Calling Complexity: Function calling is crucial for agents to interact with external tools and enhance answer quality [00:21:34]. However, it is complex (see the sketch after this list), often requiring:
    • Maintaining long conversation context [00:22:02].
    • Calling into multiple tools (potentially hundreds) [00:22:11].
    • Executing parallel and sequential tool calls as part of a complex coordination plan [00:22:22].
    • Ensuring precision in when and how to call tools [00:22:43].
  • User Experience and Abstraction: The industry is still in the early stages of figuring out the right user experience and abstraction for agentic workflows [00:27:10]. The choice of abstraction directly influences the complexity of the underlying infrastructure [00:27:34].
  • Accessibility Shift: The advent of generative AI (GenAI) and foundation models fundamentally changed accessibility [00:43:16]. Before GenAI, companies needed large machine learning teams to train models from scratch and curate data [00:42:27]. Now, companies can build directly on foundation models with few or no ML engineers, which has significantly accelerated adoption [00:43:16].
  • Market Adoption Differences: The traditional sequential adoption curve (startups, then digital natives, then traditional enterprises) has been disrupted [00:50:27]. All segments are now adopting AI simultaneously because of a massive appetite for the technology [00:50:47]. This means shorter sales cycles and different procurement processes [00:51:35].
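
To make the coordination requirements above more concrete, here is a hedged Python sketch of an agent loop with function calling. The chat() function and the entries in TOOLS are illustrative stand-ins, not any particular provider’s function-calling API.

```python
import json
from concurrent.futures import ThreadPoolExecutor

# Illustrative tool registry: one proprietary enterprise API, one public API.
TOOLS = {
    "search_orders": lambda args: {"orders": []},  # placeholder implementation
    "get_weather": lambda args: {"temp_c": 21},    # placeholder implementation
}

def chat(messages: list[dict], tools: dict) -> dict:
    """Stand-in for a tool-calling LLM. Returns either
    {"tool_calls": [{"name": ..., "arguments": {...}}, ...]} or {"content": "..."}."""
    raise NotImplementedError

def run_agent(user_msg: str) -> str:
    # The message list carries the long conversation context across turns.
    messages = [{"role": "user", "content": user_msg}]
    while True:
        reply = chat(messages, TOOLS)
        if "tool_calls" not in reply:
            return reply["content"]  # the model answered directly
        # Independent calls run in parallel; dependent calls happen sequentially
        # across loop iterations, because results feed the next model turn.
        with ThreadPoolExecutor() as pool:
            results = list(pool.map(
                lambda call: TOOLS[call["name"]](call["arguments"]),
                reply["tool_calls"]))
        messages.append({"role": "assistant", "tool_calls": reply["tool_calls"]})
        for call, result in zip(reply["tool_calls"], results):
            messages.append({"role": "tool", "name": call["name"],
                             "content": json.dumps(result)})
```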

Strategic Approach to Building AI Infrastructure

To navigate these challenges and opportunities in AI model development and infrastructure, companies like Fireworks adopt a declarative approach and focus on specialization and customization [00:06:11].

  • Declarative vs. Imperative Design: An imperative design gives full control over workflow, inputs, and outputs for deterministic results [00:04:49]. A declarative design defines what problem the system should solve and lets the system figure out how to solve it [00:05:10]. SQL is declarative, whereas hand-written ETL pipelines are imperative [00:05:35]. A small sketch at the end of this section illustrates the contrast.
  • Simplicity and Abstraction: The design principle is to deliver the simplest user experience by hiding the nitty-gritty details and complexity in the backend, without sacrificing iteration speed [00:06:11]. This leads to a more declarative system with full debuggability and maintainability [00:06:33].
  • Building Blocks and Usability: Starting with low-level abstractions, such as a single model offered as a service, helps in understanding how the industry evolves [00:07:49]. However, for enterprise usability there is a recognized need to abstract away the complexity of assembling hundreds of models and managing quality control [00:07:56].
  • Internal Product Development as Learning: Building their own complex systems, like the F1 logical reasoning inference system, lets infrastructure companies understand the system abstraction and the complexities involved [00:27:55]. This internal exercise helps them define and expose developer-facing plugins and tools that enable others to build their own custom systems [00:28:13].
  • Specialization and Customization: The underlying belief is in specialization and customization; a “one size fits all” approach is not optimal for diverse workloads with proprietary data [00:48:18]. Providing control and steerability through customization is seen as the path to better solutions [00:48:39]. This involves offering an “Optimizer” that takes an inference workload and customization objectives as input and generates an optimized deployment configuration and, potentially, an adjusted model [00:48:56].
  • Partnerships and Ecosystem Integration: Instead of building every component, infrastructure companies partner with others, such as LangChain, to integrate into existing imperative agentic tools [00:38:17]. The focus remains on simplifying the layer above single models by composing multiple models for better problem-solving [00:38:47].
  • Vertical vs. Horizontal Scaling: Hyperscalers aim for vertically integrated stacks, akin to the iPhone, because they have massive resources for data centers, power, and machines [00:31:01]. AI infrastructure companies instead specialize in problems requiring engineering craftsmanship and deep research, deploying solutions that scale horizontally [00:32:03]. The complexity of building compound logical reasoning systems cannot be solved simply by throwing more people and money at it [00:33:03].
  • Local vs. Cloud Inference: While cost savings and privacy are often cited as reasons to run models locally on desktops, practical deployment on mobile is limited by power constraints and application metrics [00:33:25]. The privacy argument for local models is nuanced, since most personal data already resides in the cloud [00:35:03].
  • Model-System Co-design: Research into model-system co-design is crucial for finding the optimal balance between quality, latency, and cost [00:46:06]. This approach, often seen at companies like Meta, involves close collaboration between research and infrastructure teams to discuss trade-offs [00:45:57].
  • Disruptive Research: Attention is also paid to fundamentally disruptive research, such as the next generation of Transformer architectures that could change model training and inference, and how different agents communicate, especially in latent space [00:46:19].
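
To illustrate the declarative-versus-imperative contrast referenced above, here is a small sketch. Both call_expert_model (the same role it plays in the earlier compound-system example) and run_declarative are hypothetical stubs, not a real Fireworks API.

```python
# Hypothetical stubs: an expert-LLM call and a declarative entry point whose
# planner decides the "how" itself.
def call_expert_model(task: str, prompt: str) -> str: ...
def run_declarative(task: str, inputs: dict, constraints: dict) -> str: ...

# Imperative (ETL-style): the caller spells out every step, input, and output.
def summarize_ticket_imperative(ticket_text: str) -> str:
    cleaned = ticket_text.strip()
    intent = call_expert_model("classification", f"Classify: {cleaned}")
    return call_expert_model("summarization", f"Summarize ({intent}): {cleaned}")

# Declarative (SQL-style): the caller states what result is needed; the system
# chooses which models, tools, and execution plan to use behind the scenes.
def summarize_ticket_declarative(ticket_text: str) -> str:
    return run_declarative(
        task="Produce a two-sentence summary of this support ticket, "
             "labeled with the customer's intent.",
        inputs={"ticket": ticket_text},
        constraints={"max_latency_ms": 500},
    )
```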