From: redpointai

Enterprises face significant challenges when adopting and deploying AI solutions, particularly generative AI (GenAI). While initial excitement around GenAI’s potential is high, many companies find it is not yet ready for prime time in enterprise environments [04:01:00].

Key Challenges in Enterprise AI Deployment

System vs. Model Focus

A primary challenge is that enterprises often acquire models, not complete systems. A model, such as a large language model, might constitute only 10-20% of the entire system needed to solve a business problem [04:54:00]. Enterprises need a fully integrated system; a bare model leaves them to build the extensive surrounding infrastructure themselves, which is complicated [05:01:00].
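
To make that proportion concrete, here is a minimal, runnable sketch of a toy retrieval-augmented pipeline. All component names and behaviors are illustrative stand-ins, not Contextual AI’s actual architecture; the point is how little of the code is the model call itself.

```python
"""Toy pipeline: the LLM call is one line; retrieval, contextualization,
guardrails, and audit logging make up the rest of the system."""

from dataclasses import dataclass


@dataclass
class ToyStore:
    docs: list[str]

    def retrieve(self, query: str, top_k: int = 2) -> list[str]:
        # Stand-in for retrieval + reranking: rank documents by word overlap.
        def score(d: str) -> int:
            return len(set(query.lower().split()) & set(d.lower().split()))
        return sorted(self.docs, key=score, reverse=True)[:top_k]


def toy_llm(prompt: str) -> str:
    # Stand-in for the model itself -- the 10-20% of the system.
    return f"[answer grounded in: {prompt[:60]}...]"


def guardrail(text: str) -> str:
    # Stand-in for compliance/post-processing checks.
    return "[blocked by policy]" if "salary" in text.lower() else text


def answer(question: str, store: ToyStore, audit_log: list) -> str:
    chunks = store.retrieve(question)                              # retrieval
    prompt = f"Context: {' '.join(chunks)}\nQuestion: {question}"  # contextualization
    result = guardrail(toy_llm(prompt))                            # generation + compliance
    audit_log.append((question, chunks, result))                   # risk/audit trail
    return result


log: list = []
store = ToyStore(docs=["Refunds are accepted within 30 days.", "Offices close at 6pm."])
print(answer("What is the refund policy?", store, log))
```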

Specialization vs. Generalization

While Artificial General Intelligence (AGI) aims for general capabilities, enterprises often require specialized AI tailored to specific use cases [05:17:00]. A generalist system may generate responses outside a task’s desired parameters, which creates real risk: using a generalist AI for performance reviews in the European Union, for example, could lead to heavy sanctions, so the model must be heavily constrained [05:44:00]. The ideal approach for enterprise AI is therefore specialization [06:02:00].

From Demos to Production

Many compelling GenAI demos, often built with layered components, fail when they reach real user testing and production deployment [07:51:00]. Demos are frequently based on small, curated datasets (e.g., 20 PDFs), and the system breaks down when scaled to real-world volumes like 10,000 PDFs [08:35:00].
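
A toy illustration of that scaling cliff, assuming the brute-force similarity search that demos commonly rely on: an exhaustive scan is instant on 20 documents but degrades at 10,000, which is when indexing, chunking, and extraction quality begin to dominate. The embedding function below is a random stand-in, not a real model.

```python
import random
import time

def embed(text: str) -> list[float]:
    # Stand-in for a real embedding model: deterministic pseudo-random vector.
    rng = random.Random(text)
    return [rng.random() for _ in range(128)]

def brute_force_search(query_vec: list[float], corpus: list[list[float]]) -> int:
    # Exhaustive dot-product scan over every document -- the demo shortcut.
    def dot(a: list[float], b: list[float]) -> float:
        return sum(x * y for x, y in zip(a, b))
    return max(range(len(corpus)), key=lambda i: dot(query_vec, corpus[i]))

for n_docs in (20, 10_000):
    corpus = [embed(f"doc {i}") for i in range(n_docs)]
    start = time.perf_counter()
    brute_force_search(embed("user question"), corpus)
    print(f"{n_docs:>6} docs: {time.perf_counter() - start:.4f}s per query")
```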

Non-ML Hurdles

Beyond the machine learning itself, enterprise deployment faces significant hurdles related to risk, compliance, and security [08:24:00].

Integrating AI with Human Workflows

High-value use cases carry higher risks when exposed directly to customers [14:26:00]. Enterprises must carefully determine the right ratio of AI to human involvement, typically keeping humans in the loop, letting the AI handle only the problems that are currently within its capabilities, and gradually expanding its role over time [14:36:00]. AI systems are not yet ready to replace people outright in complex roles, such as making investment decisions [15:01:00], [15:21:00]. One common routing pattern is sketched below.
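
This is a minimal sketch of that routing decision, assuming a hypothetical calibrated confidence score is available (e.g., from a verifier model); the threshold and the signal are illustrative, not a prescribed design.

```python
from dataclasses import dataclass

@dataclass
class Draft:
    text: str
    confidence: float  # hypothetical calibrated score in [0, 1]

def route(draft: Draft, threshold: float = 0.9) -> str:
    # Above the threshold the AI acts autonomously; below it, a human
    # reviews. Lowering the threshold over time expands the AI's role.
    if draft.confidence >= threshold:
        return f"AUTO: {draft.text}"
    return f"HUMAN REVIEW: {draft.text}"

print(route(Draft("Refund approved per policy 4.2.", confidence=0.97)))
print(route(Draft("Recommend reallocating the portfolio.", confidence=0.55)))
```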

Data and Alignment Challenges

Traditional reinforcement learning from human feedback (RLHF) methods have two main issues for enterprises:

  1. Reward Model Training: It requires training an expensive reward model that is then discarded [16:43:00].
  2. Preference Data: It relies on preference data (e.g., thumbs up/down feedback) that still needs further annotation (by internal staff or external vendors) to establish what a “good” response looks like. This process is slow and expensive, and more so for specialized use cases [17:20:00]; the mismatch between the two data shapes is sketched after this list.
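
The following sketch shows the data-shape mismatch behind both issues. The records are invented examples: deployments naturally collect per-response binary signals, while classic RLHF reward-model training consumes annotated pairs, and bridging the two is the slow, expensive annotation step.

```python
# What a deployment collects for free: per-response binary feedback.
thumbs_log = [
    {"prompt": "Summarize contract X.", "response": "...", "thumbs_up": True},
    {"prompt": "Summarize contract X.", "response": "...", "thumbs_up": False},
]

# What classic RLHF needs: pairs annotated as chosen vs. rejected, used to
# train a separate reward model that is discarded after policy optimization.
preference_pairs = [
    {"prompt": "Summarize contract X.", "chosen": "...", "rejected": "..."},
]

# Turning the first shape into the second requires a human (or vendor) to
# judge which response is better for every prompt -- the bottleneck above.
```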

Input Data and Extraction

A significant bottleneck is the difficulty of properly contextualizing language models, which requires correct data extraction. Extracting specific information from diverse documents, such as PDFs, is very challenging, and high-quality off-the-shelf extraction systems are not readily available [25:50:00].
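
As a concrete illustration of why naive extraction falls short, here is a minimal sketch using the open-source pypdf library (the file name is hypothetical). Plain-text dumps flatten layout, so tables, multi-column pages, and headers/footers come out interleaved: adequate for a demo, not for contextualizing a model on real enterprise documents.

```python
from pypdf import PdfReader  # pip install pypdf

def naive_extract(path: str) -> str:
    reader = PdfReader(path)
    # extract_text() returns a flat string per page; layout, tables, and
    # reading order are lost, which is where off-the-shelf tooling breaks.
    return "\n".join(page.extract_text() or "" for page in reader.pages)

print(naive_extract("quarterly_report.pdf")[:500])  # hypothetical file
```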

Evaluation and Measuring Success

There is currently no standardized, reliable way to evaluate AI systems for enterprises, making it hard to understand deployment risk and real accuracy [26:40:00]. Many companies lack clarity on what they truly want from an AI system or what success looks like [27:27:00]. Current evaluation methods, such as small spreadsheets with limited examples, are unprincipled and have high variance [27:53:00].
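
A small simulation (numbers invented, not from the source) of why a 20-row spreadsheet is high-variance: the bootstrap confidence interval around its accuracy estimate is far too wide to distinguish two systems.

```python
import random

def bootstrap_ci(grades: list[int], n_boot: int = 10_000) -> tuple[float, float]:
    # Resample the graded examples with replacement to estimate how much
    # the measured accuracy would move on a different sample of this size.
    means = sorted(
        sum(random.choices(grades, k=len(grades))) / len(grades)
        for _ in range(n_boot)
    )
    return means[int(0.025 * n_boot)], means[int(0.975 * n_boot)]

random.seed(0)
grades_20 = [1] * 15 + [0] * 5       # 75% accuracy measured on 20 examples
grades_500 = [1] * 375 + [0] * 125   # 75% accuracy measured on 500 examples
print("n=20 :", bootstrap_ci(grades_20))   # roughly (0.55, 0.95)
print("n=500:", bootstrap_ci(grades_500))  # roughly (0.71, 0.79)
```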

Bridging the Gap for AI Developers

The role of an “AI developer” has shifted from machine learning expert to someone skilled at calling APIs [29:31:00]. This shift calls for evaluation frameworks accessible to these new developers, moving away from traditional machine-learning and data-science concepts like test-set creation and statistical testing [30:12:00]. A sketch of such an interface follows.
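
Here is what such a developer-friendly interface could look like: declare cases and checks, get a pass rate back, with the statistics (like the bootstrap interval above) handled internally. The `evaluate` helper and its fields are hypothetical, not an existing framework.

```python
cases = [
    {"input": "What is our refund window?", "must_contain": "30 days"},
    {"input": "Who approves invoices?", "must_contain": "accounts payable"},
]

def evaluate(cases: list[dict], generate) -> float:
    # Grade each case with a simple substring check; real checks could be
    # richer, but the developer never builds a test set or runs statistics.
    grades = [
        int(c["must_contain"].lower() in generate(c["input"]).lower())
        for c in cases
    ]
    return sum(grades) / len(grades)

# `generate` would wrap the model API call; a canned stub stands in here.
rate = evaluate(cases, generate=lambda q: "Refunds are accepted within 30 days.")
print(f"pass rate: {rate:.0%}")
```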

Strategies to Overcome Challenges

To overcome these challenges, companies like Contextual AI adopt an end-to-end integrated systems approach with custom alignment and specialization [06:17:00].

This approach focuses on:

  • Vertical Slicing: Instead of building a “Frankenstein’s RAG” with layered, disparate components, they slice vertically through the AI stack, controlling retrieval, reranking, generation, post-training, and alignment [07:11:00].
  • High-Value Use Cases: Focusing on knowledge-intensive use cases where deep integration and specialization provide significant benefits [06:27:00].
  • Direct Feedback Learning: Utilizing thumbs up/down feedback from customer deployments, together with proprietary algorithms (such as KTO and APO), to directly optimize models without expensive preference annotation [20:26:00]; a simplified sketch follows this list.
  • Post-Training and Alignment: Recognizing that post-training is where the “magic happens” in AI, making models specifically good at what they are needed for, rather than being generalists [20:57:00], [21:30:00].
  • Customer Engagement: Spending significant time with customers to define success metrics and then incrementally building and productionizing the solution [27:34:00].
  • Integrated Components: Despite bespoke needs, the underlying system components (data extraction, retrieval mechanisms, language model contextualization, post-processing) have commonalities that can be specialized and fine-tuned [28:20:00].
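
For the direct-feedback item above, here is a heavily simplified, illustrative loss in the spirit of KTO, which learns from per-example desirable/undesirable labels rather than preference pairs. It omits essential pieces of the published algorithm (notably KTO’s reference-point/KL term), and APO is not reproduced; this is a sketch of the idea, not Contextual AI’s implementation.

```python
import math

def sigmoid(x: float) -> float:
    return 1 / (1 + math.exp(-x))

def kto_style_loss(logp_policy: float, logp_ref: float, desirable: bool,
                   beta: float = 0.1) -> float:
    # Implicit reward: how much more likely the policy makes this output
    # than the frozen reference model does.
    reward = beta * (logp_policy - logp_ref)
    # Push the reward up for thumbs-up outputs, down for thumbs-down ones.
    return 1 - sigmoid(reward) if desirable else 1 - sigmoid(-reward)

# Loss is low when a thumbs-up output got likelier under the policy...
print(kto_style_loss(-12.0, -14.0, desirable=True))   # ~0.45
# ...and high when a thumbs-down output got likelier.
print(kto_style_loss(-12.0, -14.0, desirable=False))  # ~0.55
```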

Overall, the focus is on delivering a complete, specialized, and reliable system that can demonstrate a clear return on investment (ROI) for the enterprise [05:01:00], [21:46:00].