From: redpointai
Jonathan Frankle, Chief AI Scientist at Databricks (joining via the Mosaic acquisition), shares insights on the integration of AI and data platforms in enterprises. His discussions cover how businesses can strategically adopt AI, from model selection to deployment and evaluation, drawing lessons from Databricks’ extensive customer base [00:00:00].
Databricks and Mosaic: A Synergistic Acquisition
Databricks acquired Mosaic due to a natural synergy: Mosaic had a strong AI platform, and Databricks had a robust data platform [00:05:41]. AI fundamentally requires data, making the combination logical [00:05:43]. The leadership teams, composed largely of academics and scientists, shared a common understanding, facilitating the merger despite initial reluctance on both sides [00:06:26]. The initial discussions that led to the acquisition reportedly took place at the Cerebral Valley conference [00:06:46].
Enterprise AI Adoption Strategies
Enterprises often struggle to determine the best approach for integrating AI: training custom models, fine-tuning, or simply using prompt engineering [00:00:08]. Frankle advises keeping options open and starting small, gradually scaling up based on rigorous ROI justification [00:07:56].
Phased Approach to AI Adoption
- Start with Prompting: Begin by simply prompting a model (e.g., OpenAI or Llama on Databricks) to litmus-test whether AI is suitable for a given use case [00:08:21]. (A minimal sketch of this and the next phase follows the list.)
- Implement RAG (Retrieval-Augmented Generation): If initial prompting shows promise, leverage RAG to incorporate proprietary enterprise data, as generic models cannot inherently know about internal data [00:09:27]. Databricks refers to this as “data intelligence” [00:09:41].
- Fine-tuning: If RAG yields value, fine-tuning can bake data into the model, improving quality and reducing inference costs [00:09:50].
- Continued Pre-training / Full Pre-training: For extensive usage and specific needs, further pre-training can lead to highly specialized and cost-effective models, though this is a significant undertaking [00:10:02].
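As a rough illustration of the first two phases, here is a minimal Python sketch. It assumes an OpenAI-compatible chat endpoint (Databricks model serving exposes one, though the URL, token, and model name below are placeholders), and `search_internal_docs` is a hypothetical retriever standing in for whatever vector store an enterprise actually uses:

```python
# Minimal sketch of phases 1-2: bare prompting first, then RAG.
# The endpoint URL, token, and model name are placeholders.
from openai import OpenAI

client = OpenAI(
    base_url="https://<workspace>/serving-endpoints",  # placeholder
    api_key="<token>",                                 # placeholder
)

def ask(question: str, context: str = "") -> str:
    """Phase 1 when context is empty; phase 2 (RAG) when retrieved docs are passed."""
    user = f"Context:\n{context}\n\nQuestion: {question}" if context else question
    resp = client.chat.completions.create(
        model="<serving-endpoint-name>",  # placeholder
        messages=[
            {"role": "system", "content": "Use the provided context when given."},
            {"role": "user", "content": user},
        ],
    )
    return resp.choices[0].message.content

def search_internal_docs(question: str, k: int = 3) -> str:
    """Hypothetical retriever; swap in your vector store of choice."""
    raise NotImplementedError

question = "What does our Q3 refund policy say?"
print(ask(question))                                   # phase 1: litmus test
# print(ask(question, search_internal_docs(question))) # phase 2: add your data via RAG
```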
Frankle stresses an agile approach: do not wait for “perfect” data or evaluations [00:10:17]. The utility of data and evaluations is measured by the success of the AI system in real-world scenarios [00:10:24].
Evaluation and Benchmarking
Evaluating AI models is crucial but challenging [00:11:11]. Frankle advises starting with simple, even imperfect, benchmarks and iteratively improving them [00:11:22]. Human testing, even from a single external person, provides invaluable real-world feedback compared to synthetic benchmarks [00:11:31].
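In that spirit, a starter benchmark can be as small as a handful of question-answer pairs with an exact-match check; the sketch below is deliberately imperfect, meant only as something to iterate on (the Q/A pairs are invented, and `ask` is the hypothetical client from the earlier sketch):

```python
# A deliberately simple starter benchmark: a few Q/A pairs, exact-match scoring.
# Crude by design; the point is to have *something* measurable to iterate on.
eval_set = [
    {"question": "What year was the company founded?", "expected": "2013"},
    {"question": "Which region had the highest Q2 revenue?", "expected": "EMEA"},
]

def exact_match(predicted: str, expected: str) -> bool:
    return expected.strip().lower() in predicted.strip().lower()

def run_eval(model_fn) -> float:
    """model_fn: any callable mapping question -> answer, e.g. `ask` above."""
    hits = sum(exact_match(model_fn(ex["question"]), ex["expected"]) for ex in eval_set)
    return hits / len(eval_set)

# print(f"accuracy: {run_eval(ask):.0%}")  # then replace exact match with human review
```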
Databricks has released a new Agent Evaluation product, aimed at helping users create a meaningful evaluation set quickly, ideally in an afternoon, so they can measure their models effectively [00:13:30]. The product is now in public preview [00:14:18].
Databricks’ Internal AI Journey with DBRX
Databricks utilized its own comprehensive platform to build its DBRX language model, showcasing its full enterprise AI integration capabilities [00:14:39]. They used the following components (a short sketch of the same pipeline shape follows the list):
- Spark: For data ingestion and ETL, significantly reducing processing time from weeks to minutes [00:14:46].
- Delta tables: For data storage [00:14:50].
- Unity Catalog: For tracking and managing datasets, proving “incredibly useful” [00:14:52].
- MLflow: As their experiment tracker, which Frankle described as phenomenal, and free [00:15:31].
- Mosaic inference service: For model serving [00:15:44].
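To make that pipeline shape concrete, here is a minimal PySpark/MLflow sketch; the table names, paths, and metric values are invented for illustration, and this is not Databricks' actual DBRX training code:

```python
# Sketch of the pipeline shape: Spark for ETL, Delta for storage, MLflow for tracking.
# Table names, paths, and metric values below are invented for illustration.
import mlflow
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("pretraining-etl").getOrCreate()

# Ingest raw text with Spark and write it out as a Delta table.
raw = spark.read.text("/data/raw_corpus/")          # placeholder path
cleaned = raw.filter(raw.value != "")               # stand-in for real ETL
cleaned.write.format("delta").mode("overwrite").saveAsTable("corpus.cleaned_docs")

# Track the training run with MLflow (the training loop itself is elided).
with mlflow.start_run(run_name="pretraining-smoke-test"):
    mlflow.log_param("dataset", "corpus.cleaned_docs")
    mlflow.log_metric("train_loss", 2.31)           # placeholder value
```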
Use Cases for Domain-Specific Models
While large general models are advancing, there are specific scenarios where domain or company-specific models offer significant advantages:
- Language/Cultural Specificity: General models are often poorly tuned for non-English languages (e.g., Japanese, Korean) because of tokenizer issues or data scarcity [00:16:26]. Companies like NTT and Ola have built impressive models for their respective regions (Japan, India) [00:17:04].
- Fundamentally Different Tasks: For tasks vastly different from general language understanding, like protein modeling, specialized models are necessary [00:17:13].
- Speed and Specificity: Applications requiring extremely fast and highly specific responses, such as code completion tools (e.g., Replit), benefit from models built for speed and tailored to the task [00:17:27].
- Cost Optimization: For models with high usage, the upfront investment in pre-training can lead to substantial long-term cost savings, or significantly improved quality at the same cost, by shifting the cost-quality trade-off [00:17:58] (see the break-even sketch after this list).
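As a back-of-the-envelope illustration of that last trade-off (every number below is invented), the break-even point is roughly the one-time training cost divided by the per-token inference savings:

```python
# Back-of-the-envelope break-even for pre-training a specialized model.
# All numbers are invented for illustration.
train_cost = 2_000_000.0       # one-time pre-training cost, dollars
general_per_mtok = 10.0        # inference on a large general model, $/1M tokens
special_per_mtok = 1.0         # inference on the smaller specialized model

savings_per_mtok = general_per_mtok - special_per_mtok
breakeven_mtok = train_cost / savings_per_mtok
print(f"break-even after ~{breakeven_mtok:,.0f}M tokens of inference")
# At 10,000M tokens/month of traffic, that pays back in roughly 22 months.
```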
AI Product-Market Fit Patterns
AI has found strong product-market fit in two main patterns:
- Brainstorming and Creative Applications: Scenarios where “being right” is not strictly required, and multiple correct answers exist, such as creative tasks, marketing, or general information surfacing (e.g., Glean) [00:19:46].
- Human-in-the-Loop Validation: Cases where generating a proposed answer is costly for a human but checking it is quick, analogous to the P vs. NP asymmetry in complexity theory, where verifying a solution is easier than producing one [00:20:12]. Examples include coding co-pilots like GitHub Copilot and potentially customer support, where AI can draft responses for human review [00:21:02]. (A minimal sketch of this pattern follows the list.)
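Here is a minimal sketch of that second pattern, generate expensively and verify cheaply, using the hypothetical `ask` helper from the earlier sketch:

```python
# Generate-then-verify: the model does the costly drafting, a human does the
# cheap checking. `ask` is the hypothetical client from the earlier sketch.
def draft_support_reply(ticket: str) -> str:
    return ask(f"Draft a polite, accurate reply to this support ticket:\n{ticket}")

def human_review(draft: str) -> bool:
    """Stand-in for a real review UI; approving is far cheaper than writing."""
    print(draft)
    return input("Send this reply? [y/n] ").strip().lower() == "y"

ticket = "My export job has been stuck at 99% for two hours."
if human_review(draft_support_reply(ticket)):
    print("sent")                        # placeholder for the real send action
else:
    print("escalated to a human agent")
```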
Future of AI Development: Quality vs. Fuzziness
While chaining models and agents can improve quality by running more tokens through the model in creative ways, this only moves a system along the cost-quality curve; it does not eliminate the fundamental “fuzziness” of AI systems [00:23:41]. Achieving “perfection” or “five nines of quality” remains challenging with current generation technology [00:24:12]. This means continuous learning is required to leverage AI’s strengths and mitigate its weaknesses, similar to the decades-long journey of developing software engineering practices [00:24:56]. Even without further technological breakthroughs, learning to use existing tools better will lead to enormous creativity [00:25:28].
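One concrete version of running more tokens through the model in creative ways is best-of-N sampling with a vote; the sketch below (again using the hypothetical `ask`) trades extra tokens for quality but still cannot guarantee a correct answer:

```python
# Best-of-N self-consistency: sample several answers, keep the most common one.
# Assumes the endpoint samples with nonzero temperature so repeated calls differ.
# Spending more tokens moves you along the cost-quality curve, but the majority
# answer can still be wrong: the residual "fuzziness" Frankle describes.
from collections import Counter

def best_of_n(question: str, n: int = 5) -> str:
    answers = [ask(question) for _ in range(n)]  # `ask` from the earlier sketch
    answer, votes = Counter(answers).most_common(1)[0]
    print(f"{votes}/{n} samples agreed")
    return answer
```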
AI and Societal Comfort: Policy and Philosophy
The discussion touches on the broader societal implications of AI, particularly in high-stakes fields like healthcare and autonomous vehicles [00:25:25].
- Trust and Explainability: Human intuition about human errors differs from AI errors, making AI mistakes harder to accept, especially when the reasoning is unpredictable [00:26:08]. Building trust requires transparency and understanding where AI systems fail [00:26:50].
- Setting Standards: AI’s capabilities challenge us to reconsider the standards we hold humans to in certain tasks. For example, automated facial recognition systems highlighted human fallibility, prompting a re-evaluation of human performance standards [00:27:31]. Similarly, autonomous vehicles push us to rethink driver’s tests [00:28:36].
- Responsible Policy: AI practitioners have a responsibility to participate in policy conversations, not just for self-interest, but to ensure AI is used responsibly [00:53:00]. This requires scientific honesty, clear communication of biases, and earning trust [00:54:51].
- Contextual Regulation: Blanket regulation of AI is not ideal; instead, careful, application-by-application consideration is needed [00:56:24]. In high-stakes areas like law enforcement, medicine, and autonomous vehicles, extraordinary caution is necessary due to the potential for severe consequences [00:57:05].
AI Infrastructure Landscape and Databricks’ Role
Databricks aims to provide an end-to-end platform for AI, integrating tools for data, model training, and evaluation [00:30:09]. This involves partnering with numerous startups and even acquiring companies like Lilac, whose tools were highly valued internally [00:33:31]. The goal is to provide customers with a complete, well-integrated set of tools without needing to cobble together disparate systems [00:30:35].
There is significant investment in large open-source models by major companies like Meta (Llama models) [00:35:54]. Databricks focuses its efforts on other gaps in the ecosystem, such as eval creation, navigating the space of fine-tuning versus RAG, and helping customers connect their diverse data (even imperfect data) to build AI systems [00:37:09]. This includes exploring “compound AI systems” and agents to connect different pieces [00:38:05].
Reflections on AI Progress
Frankle expresses humility about predicting the future of AI, admitting past incorrect predictions, such as underestimating GPT-3 [00:39:51]. He emphasizes that the field works in “big leaps” followed by consolidation [00:42:04].
Recent Developments in AI Research
- o1 Model (OpenAI): Frankle finds it exciting and acknowledges OpenAI’s impressive engineering achievement in scaling up existing ideas [00:43:08]. Its impact will be determined by its ability to generalize beyond constrained, mathematical problems [00:42:50].
- Anthropic’s Computer Use Work: This work is considered an important application, indicative of a new phase of experimentation in AI products. Frankle admires Anthropic’s creativity and willingness to take risks [00:46:01].
Academia’s Role in AI Research
Academia plays a vital role by “zagging” when industry “zigs” [00:49:20]. Academics can ask difficult questions that companies might avoid, build critical benchmarks and leaderboards, and take risks on new technologies and models [00:49:27]. Key areas for academic focus include human-AI interaction (HCI), data research, and product development [00:50:00].
Data Labeling Market
Data annotation is “really important” for AI [00:51:18]. The challenge lies in efficiently turning human labor into high-quality data annotations, requiring significant expertise and trust in the service providers and their supply chains [00:51:29].
Future of AI Applications and Jonathan Frankle’s Role
Frankle finds his current role at Databricks fascinating because he is at the “nexus” of science, policy, and society, witnessing how AI is being applied to “most economically useful tasks” by Databricks’ 12,000 customers [01:00:49]. He is excited about the potential of embodied systems and robotics, though acknowledging the inherent risks and long development cycles [00:59:19].
Databricks’ commitment as “the data and AI company” means AI is integrated into every job function [01:02:26]. Frankle invites interested individuals to explore Databricks products like the AI Model Gateway, Agent Platform, Agent Evaluation product, and new fine-tuning techniques (e.g., “soft” fine-tuning for fragmented data) [01:02:49].