Innovations in post-training and model reasoning

From: redpointai

The field of AI is evolving rapidly, with significant advances in post-training methods and increasingly capable reasoning models. This article summarizes insights from a discussion with Douwe Kiela, CEO and co-founder of Contextual AI, who previously served as Head of Research at Hugging Face and spent five years at Facebook AI Research (FAIR), where he co-authored the first paper on Retrieval-Augmented Generation (RAG). [00:00:00]

OpenAI’s o1 Model and the Shift to Systems Thinking

He views OpenAI’s o1 model as “very exciting,” noting that it emphasizes thinking about “systems rather than models.” [00:01:01] In his description, o1 compresses Chain of Thought ideas into the model itself using Reinforcement Learning from Human Feedback (RLHF), turning the model into a more complex system. [00:01:05] This approach is particularly encouraging for reasoning tasks. [00:01:25]

However, whether the approach is widely adopted depends on latency constraints, since more “thinking” at test time means higher latency. [00:01:51] While o1 is stronger in areas like math and law, older models can still perform better on other tasks and are often faster. [00:02:07]
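To make the latency trade-off concrete, here is a rough first-order model (a simplification introduced for illustration, not a formula from the discussion). Decoding is sequential, so every hidden reasoning token adds to response time before the answer appears. T_prefill is the time to process the prompt, n_reason and n_answer are the numbers of reasoning and answer tokens, and t_token is the per-token decode time; all symbols are assumptions.

```latex
T_{\text{response}} \approx T_{\text{prefill}} + \left(n_{\text{reason}} + n_{\text{answer}}\right) \cdot t_{\text{token}}
```

With a hypothetical t_token of 20 ms, 2,000 reasoning tokens add roughly 40 s on their own, which is acceptable for some tasks and prohibitive for others.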

Contextual AI’s Approach to Enterprise AI

Contextual AI was founded in response to enterprises’ frustration that generative AI was not ready for prime time. [00:03:53] Its approach differs from that of other foundation model players such as OpenAI and Anthropic in two core principles:

Systems Over Models: Contextual AI believes that a model is only “10-20% of this much bigger system that has to solve the problem.” [00:04:54] Enterprises need to buy the entire system, not just the model, to avoid the complexity of building the surrounding infrastructure themselves. [00:05:01]
Specialization Over AGI: AGI is framed as a consumer product that needs general intelligence, whereas enterprises often know exactly what they need and prefer specialized AI. [00:05:17] Generality can even be a liability, as in the example of a general-purpose system being used for performance reviews in the European Union, which could lead to heavy sanctions. [00:05:44] Therefore, the right approach for enterprise AI is specialization. [00:06:02]

Contextual AI focuses on “end-to-end specialize[ing] all the parts together” into a very integrated system, targeting high-value, knowledge-intensive use cases. [00:06:20] This vertical slicing of the stack, controlling retrieval, reranking, generation, post-training, alignment, and fine-tuning, provides a compounding effect in problem-solving. [00:07:11]

Challenges in Enterprise Deployment

While many “compelling demos” are built with a layered, “Frankenstein’s RAG” approach, they often fail during real user testing due to issues with deployment, risk, compliance, and security. [00:08:05] Demos often rely on small datasets (e.g., 20 PDFs) and “hill climbing directly on the test set.” [00:08:35] When scaled to 10,000 PDFs, “everything breaks down completely.” [00:08:50]

Enterprises are cautioned about directly exposing AI systems to customers, especially for high-value or high-risk use cases. [00:14:18] The focus should be on “keeping humans in the loop,” solving problems within current reach, and gradually increasing complexity. [00:14:39]

Retrieval-Augmented Generation (RAG)

He co-authored the first paper on RAG. [00:08:53] The original vision for RAG was more ambitious than what was published. [00:04:22] The work grew out of his career-long focus on “grounding” language in perceptual information, starting with early multimodal AI systems. [00:09:29]

The first RAG prototype, which grounded language in Wikipedia, relied on FAISS (Facebook AI Similarity Search), an early library for vector similarity search. [00:10:07] The key technical challenge was figuring out “how to backpropagate into the retrieval mechanism” so the whole system could be trained effectively. [00:10:34]
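The trick for backpropagating into retrieval, as published in the RAG paper, is to treat the retrieved document as a latent variable and marginalize over the top-k results: p(y|x) = Σ_z p(z|x) · p(y|x,z), where p(z|x) is a softmax over query-document inner products. The pure-Python sketch below shows that computation on toy data; the embeddings and the generator likelihoods in `gen_likelihood` are hypothetical numbers, and `top_k` is the exact brute-force search that a flat inner-product index (such as FAISS’s `IndexFlatIP`) performs, without any approximate-search speedups.

```python
import math
from typing import List, Sequence

def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Inner product: the similarity score used for maximum inner product search."""
    return sum(a * b for a, b in zip(u, v))

def softmax(scores: Sequence[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def top_k(query, doc_embs, k):
    """Exact brute-force top-k document indices by inner product."""
    ranked = sorted(range(len(doc_embs)), key=lambda i: dot(query, doc_embs[i]), reverse=True)
    return ranked[:k]

def rag_marginal_nll(query, doc_embs, gen_likelihood, k=2):
    """-log p(y|x), with p(y|x) = sum over top-k docs z of p(z|x) * p(y|x,z).

    gen_likelihood[i] is a hypothetical stand-in for the generator's p(y | x, z_i).
    Because p(z|x) is a softmax over query-document scores, an autodiff framework
    would send the gradient of this loss back into the query embedding (the retriever).
    """
    idx = top_k(query, doc_embs, k)
    p_z = softmax([dot(query, doc_embs[i]) for i in idx])
    p_y = sum(pz * gen_likelihood[i] for pz, i in zip(p_z, idx))
    return -math.log(p_y)
```

Raising the generator’s likelihood for a highly ranked document lowers the loss; in a differentiable implementation, the same loss also shifts which documents get ranked highly. To our understanding, the published RAG model keeps the document index fixed and updates only the query encoder, so the index does not have to be rebuilt during training.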

He notes that it was not immediately apparent that RAG would become a standard paradigm, comparing its initial reception to that of the Transformer paper, which at first left researchers at FAIR “underwhelmed.” [00:11:39] He suggests that the Transformer’s success owes more to how well it maps onto GPUs than to any inherent “amazingness.” [00:13:17] The real credit, he believes, should go to the inventors of the attention mechanism. [00:13:34]

Alignment and Reinforcement Learning in AI

Alignment is a crucial problem area focused on making systems maximally useful for end-users. [00:16:01] Reinforcement Learning from Human Feedback (RLHF) was “the secret sauce” behind ChatGPT’s success, allowing models to capture human preferences at the full sequence level. [00:16:16]

Challenges with RLHF

RLHF presents two significant problems:

Reward Model Training: It requires training a separate, often expensive, reward model that is then discarded after training. [00:16:43]
Preference Data Acquisition: It relies on “preference data,” which means human annotation (e.g., thumbs up/down feedback) is needed to correct model outputs. This process is “very slow, very expensive,” and becomes even more so for specialized use cases. [00:17:03]
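Both costs are visible in the standard way a reward model is trained on preference data: a pairwise (Bradley-Terry) loss that asks the reward of the human-preferred response to exceed the reward of the rejected one. Every training example needs a human comparison, and the reward model is a separate network. In this minimal sketch the rewards are plain numbers standing in for that network’s outputs.

```python
import math

def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))

def reward_model_loss(r_chosen: float, r_rejected: float) -> float:
    """Bradley-Terry pairwise loss: -log sigmoid(r(x, y_chosen) - r(x, y_rejected)).

    r_chosen / r_rejected are the reward model's scores for the response a human
    preferred and the one they rejected. The loss shrinks as the gap grows.
    """
    return -log_sigmoid(r_chosen - r_rejected)
```

When the two rewards are equal the loss is log 2; it falls toward zero as the preferred response is scored higher and grows when the ranking is reversed.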

Contextual AI’s Innovations in Alignment

Contextual AI aims to break these dependencies, building on and extending the following methods:

Direct Preference Optimization (DPO): Optimizes the model directly on preference pairs without training a separate reward model, making alignment more efficient. [00:17:32]
Kahneman-Tversky Optimization (KTO): Developed with Stanford student Kawin Ethayarajh, this method optimizes directly on feedback about individual outputs (e.g., thumbs up/down) instead of requiring explicit preference pairs, which greatly reduces the annotation burden. [00:17:42]
Contrastive Learning from AI Revisions (CLAIR): Addresses the “underspecification problems” in preference datasets. Instead of a loose signal such as “this one is better than that one,” CLAIR contrasts an output with a revision of it (the specific fix for a problem), providing a much tighter and less underspecified training signal. [00:18:18]
Anchored Preference Optimization (APO): Accounts for how good the model already is relative to the preference data. If the model is already better than the data, APO lets it learn the ranking without being pulled toward a lower-quality “right answer.” [00:19:51]
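The difference between DPO and APO can be sketched at the level of the loss on a single preference pair. Each term uses the log-ratio between the trained policy and a frozen reference model. DPO only cares about the gap between the chosen and rejected log-ratios. The two APO variants below (named `apo_zero` and `apo_down` after the loss types in Hugging Face TRL) are written to match our understanding of the APO paper and should be checked against it: APO-zero pushes the chosen answer up and the rejected one down, which suits data better than the model; APO-down pushes both down while still widening the gap, which suits a model already better than the “chosen” answers. The log-probabilities passed in are hypothetical numbers.

```python
import math

def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))

def dpo_loss(pol_w, ref_w, pol_l, ref_l, beta=0.1):
    """DPO: -log sigmoid(beta * (chosen log-ratio - rejected log-ratio)).

    pol_* / ref_* are sequence log-probs of the chosen (w) and rejected (l)
    responses under the policy and the frozen reference model.
    """
    chosen = pol_w - ref_w
    rejected = pol_l - ref_l
    return -log_sigmoid(beta * (chosen - rejected))

def apo_zero_loss(pol_w, ref_w, pol_l, ref_l, beta=0.1):
    """Raise the chosen log-ratio and lower the rejected one, each on its own."""
    chosen = pol_w - ref_w
    rejected = pol_l - ref_l
    return (1.0 - sigmoid(beta * chosen)) + sigmoid(beta * rejected)

def apo_down_loss(pol_w, ref_w, pol_l, ref_l, beta=0.1):
    """Lower the chosen log-ratio while still widening the chosen-rejected gap."""
    chosen = pol_w - ref_w
    rejected = pol_l - ref_l
    return sigmoid(beta * chosen) + (1.0 - sigmoid(beta * (chosen - rejected)))
```

Note that APO-down is rewarded for moving probability away from the “chosen” answer as long as the rejected answer moves further, which is exactly the situation where the dataset’s “right answer” is worse than what the model would produce.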

Contextual AI’s internal data annotation team and direct customer feedback mechanisms (like thumbs up/down) allow it to learn effectively with these algorithms. [00:20:24] The speaker emphasizes that “post-training really is where a lot of the magic happens in AI,” since it turns a pre-trained model into one capable of specific tasks. [00:20:57] This alignment targets specific business use cases rather than general capabilities like writing Shakespearean sonnets. [00:21:10]
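Thumbs up/down feedback fits KTO because KTO scores single responses rather than pairs. A minimal sketch of the KTO loss under simplifying assumptions: each response’s implied reward is its policy-vs-reference log-probability ratio; up-voted responses are pushed above a reference point `z_ref` and down-voted ones below it. Here `z_ref` is simply passed in (the KTO paper estimates it from a KL term over the batch), the weights `lambda_d` and `lambda_u` default to 1, and the example log-probabilities are hypothetical.

```python
import math
from typing import List, Tuple

def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def kto_loss(examples: List[Tuple[float, float, bool]], z_ref: float,
             beta: float = 0.1, lambda_d: float = 1.0, lambda_u: float = 1.0) -> float:
    """Mean KTO loss over unpaired feedback.

    examples: (policy_logp, ref_logp, thumbs_up) for individual responses.
    z_ref: reference point, assumed precomputed here.
    """
    total = 0.0
    for pol, ref, thumbs_up in examples:
        r = pol - ref  # implied reward: log pi(y|x) - log pi_ref(y|x)
        if thumbs_up:
            total += lambda_d * (1.0 - sigmoid(beta * (r - z_ref)))
        else:
            total += lambda_u * (1.0 - sigmoid(beta * (z_ref - r)))
    return total / len(examples)
```

No example needs a partner: an up-voted answer lowers the loss by becoming more likely than the reference model would make it, and a down-voted one by becoming less likely.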

Small Open-Source MoE Model (OLMoE)

Contextual AI, in collaboration with the Allen Institute for AI, released OLMoE, a high-quality, fully open-source Mixture of Experts (MoE) model. [00:22:20] This work fits the broader trend toward smaller models that can be deployed on edge devices. [00:22:47]
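What makes an MoE model cheap to run is that a learned router sends each token to only a few experts, so per-token compute scales with the number of active experts rather than the total. The pure-Python sketch below shows one common top-k routing variant (softmax renormalized over the selected experts). The router weights and the toy experts are hypothetical, and real MoE layers, including the model described here, may differ in details such as gate normalization and load-balancing losses.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]

def softmax(xs: Sequence[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    e = [math.exp(x - m) for x in xs]
    s = sum(e)
    return [v / s for v in e]

def moe_forward(x: Vector, router_weights: List[Vector],
                experts: List[Callable[[Vector], Vector]], k: int = 2) -> Tuple[Vector, List[int]]:
    """Route x to its top-k experts and mix their outputs by gate weight.

    router_weights[i] scores expert i; only the k best-scoring experts are called.
    """
    logits = [sum(w_i * x_i for w_i, x_i in zip(w, x)) for w in router_weights]
    top = sorted(range(len(experts)), key=lambda i: logits[i], reverse=True)[:k]
    gates = softmax([logits[i] for i in top])  # renormalize over selected experts
    out = [0.0] * len(x)
    for g, i in zip(gates, top):
        y = experts[i](x)
        out = [o + g * yi for o, yi in zip(out, y)]
    return out, top
```

With, say, eight experts and `k=2`, each token touches only a quarter of the expert parameters, which is what makes a model with many total parameters plausible on constrained hardware.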

Another related innovation is GRIT (Generative Representational Instruction Tuning), which demonstrated that a single model can serve as both the retriever and the generator. [00:22:58] This enables significant compute caching, because the query encoding computed for retrieval can be reused during generation. [00:23:22] The future goal is to combine this small MoE work with GRIT to build powerful RAG systems that can run on mobile phones. [00:23:42]
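The caching idea can be illustrated with a toy in which one hypothetical `forward` function stands in for the shared model: retrieval mode mean-pools its hidden states into an embedding, and generation mode reuses the same cached states instead of re-encoding the query. Everything here (`forward`, `embed`, `generate_context`, the 2-d “hidden states”) is invented for illustration. In GRIT itself the embedding and generation modes use different attention masks, so reuse in a real model is more subtle than this toy suggests.

```python
from functools import lru_cache
from typing import Tuple

CALLS = {"forward": 0}  # counts how often the (expensive) model pass actually runs

@lru_cache(maxsize=None)
def forward(text: str) -> Tuple[Tuple[float, float], ...]:
    """Hypothetical stand-in for one transformer pass; returns toy per-token states."""
    CALLS["forward"] += 1
    return tuple((float(len(tok)), float(i)) for i, tok in enumerate(text.split()))

def embed(text: str) -> Tuple[float, float]:
    """Retrieval mode: mean-pool the hidden states into a single vector."""
    states = forward(text)
    n = len(states)
    return tuple(sum(s[d] for s in states) / n for d in range(2))

def generate_context(query: str) -> Tuple[Tuple[float, float], ...]:
    """Generation mode: reuse the cached states (a real model would reuse its KV cache)."""
    return forward(query)
```

Calling `embed(q)` and then `generate_context(q)` runs `forward` once; the second call is served from the cache, which is the saving that sharing one model between retriever and generator makes possible.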

Academia vs. Industry in AI Research

Academia remains “super important for the progress of AI.” [00:24:21] While the scale of pre-training has shifted beyond the reach of most academic institutions compared to five years ago, there’s still significant scope for research in pre-training with smaller models. [00:24:44] The importance of post-training is a “real blessing,” as it allows academics to take pre-trained models (like Meta’s Llama) and conduct “amazing research” in post-training and alignment methods. [00:24:58]

Overhyped vs. Underhyped in AI

Agents (both overhyped and underhyped): Overhyped because they don’t fully work yet, but underhyped because they are “showing signs of life.” [00:50:57]
Synthetic Data: Initially skeptical, he changed his mind and now finds synthetic data “super powerful” when done correctly and combined with smarter algorithms like KTO and APO. [00:30:45] He rejects the idea that AI is “running out of tokens” for training, noting that huge amounts of data are produced every day; the real problem is often data quality, not quantity. [00:33:01]
Multimodality: He believes the field has “not even scratched the surface” with multimodal AI, especially video data, which offers a much richer understanding of the world than text alone. [00:33:52]
Chain of Thought: Initially dismissed as a “cute gimmick,” Chain of Thought approaches have proven to work “really, really well” when combined with other techniques like RLHF for model improvement. [00:31:17]

The Future of AI and Enterprise Adoption

He sees AI model training and deployment as moving toward “scaling in many directions” beyond just model size. [00:50:36] This includes sophisticated post-training, “systems thinking” distilled back into models, and practical improvements such as mixtures of retrievers. [00:50:14]

He highlights the shift in what an “AI developer” means, moving from machine learning experts to those skilled at calling APIs. [00:29:31] This necessitates new evaluation frameworks that are accessible to developers, rather than relying on traditional machine learning metrics. [00:30:12]
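One shape such a developer-facing evaluation could take is a unit-test-style harness: each case states what a correct answer must (and must not) mention, and the score is simply the pass rate. The `Case` and `run_eval` names are hypothetical, and substring matching is a deliberately crude stand-in for the grounding and risk checks an enterprise evaluation would actually need.

```python
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class Case:
    """One hypothetical test case for a question-answering system."""
    question: str
    must_include: List[str]
    must_not_include: List[str] = field(default_factory=list)

def run_eval(system: Callable[[str], str], cases: List[Case]) -> float:
    """Return the fraction of cases that pass, printing each failure like a test runner."""
    passed = 0
    for case in cases:
        answer = system(case.question).lower()
        ok = all(s.lower() in answer for s in case.must_include) and not any(
            s.lower() in answer for s in case.must_not_include
        )
        passed += ok
        if not ok:
            print(f"FAIL: {case.question!r} -> {answer!r}")
    return passed / len(cases)
```

The appeal for API-oriented developers is that failures read like failing tests, with the offending question and answer printed, rather than as a shift in an aggregate metric.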

Challenges remain in AI model development and deployment. For instance, getting “off-the-shelf extraction systems” for documents (like PDFs) to work correctly is “very, very hard,” requiring custom solutions. [00:25:48] Similarly, robust evaluation methods for enterprise AI, especially regarding risk and accuracy, are still largely “underexplored.” [00:26:40]

The long-term vision involves AI systems evolving into “multi-agent systems” where humans will also act as agents. [00:45:05] This is already conceptually present in areas like synthetic data generation, where one agent trains another. [00:43:33]

While it’s currently akin to selling “Tesla Roadsters” (high-end, custom-tuned products with dedicated support), the goal is to build the “assembly line for the Model S” – a more accessible, turnkey solution in the future. [00:49:05] The focus remains on delivering “amazing experiences to our customers” and proving tangible ROI from deployments. [00:48:48]
