From: redpointai
Snowflake’s Arctic LLM is a key component of the company’s wide-ranging AI efforts, designed specifically for enterprise needs [00:00:05] [02:01:00].
Development of Arctic LLM
The development of Arctic LLM began around December [00:59:00]. The primary motivation for its creation was to address the needs of Snowflake’s customers, who wanted to build business intelligence (BI) experiences using AI [01:06:00] [01:09:00].
Key objectives for Arctic included:
- Excelling at text-to-SQL generation for natural language interaction with data [01:12:00] [01:17:00].
- Following instructions exceptionally well [01:23:00].
The development team was relatively small [01:46:00], benefiting from the expertise of researchers who founded DeepSpeed and vLLM [01:37:00]–[01:41:00]. The model was built and released within a three- to four-month timeframe [01:49:00] [01:52:00].
Arctic LLM features an innovative architecture that makes it highly efficient for both training and inference [02:27:00]–[02:33:00]. This efficiency allowed the model to be built at roughly one-eighth the cost of comparable models [02:36:00] [02:40:00]. Significant effort was also invested in refining the “data recipe” to ensure optimal performance [02:47:00] [02:51:00].
Unlike general-purpose models, Arctic was designed with specific enterprise use cases in mind, prioritizing capabilities like SQL co-piloting and high-quality chatbot functionality over tasks such as composing poetry [02:01:00]–[02:07:00].
Deployment and Use Cases
Arctic LLM is deployed and runs within Snowflake’s Cortex, a managed service for large language models [03:04:00]–[03:11:00]. This service also supports other models from providers like Mistral and Meta [03:19:00].
Snowflake customers primarily use Arctic for three main purposes:
- AI for BI (SQL Generation): This involves natural language to SQL conversion to interact with structured data [03:27:00] [03:30:00]. While demos are easy to build, real-world deployments with tens of thousands of tables and complex joins are far more challenging [04:27:00]–[04:47:00]. Snowflake’s Cortex Analyst product addresses this by stitching together multiple LLMs with self-healing systems that check SQL validity and know when to ask for clarification or abstain from answering [05:05:00]–[05:26:00]. The focus is on precision, even at the cost of recall [07:55:00] [07:59:00]. Quality can reach 90–95% for some use cases, with human-in-the-loop review and verified queries implemented for critical applications [05:37:00]–[06:15:00].
- Chatbots for Unstructured Data: Building chatbots to interact with unstructured data such as documents and PDFs [03:32:00] [03:35:00]. RAG applications are common here, with Snowflake’s Cortex Search providing hybrid search (vector plus lexical keyword) to reduce hallucinations [10:44:00]–[10:51:00] [11:53:00]–[12:07:00]. Internal productivity use cases are common and carry low hallucination risk [12:23:00]–[12:30:00].
- Data Extraction & Text Analytics: Using natural language to extract data from text and process it in batch, especially semi-structured text such as sales call logs, customer support tickets, and employee surveys [03:42:00]–[04:11:00]. Snowflake aims to make these analyses simple and easy to use through task-specific functions [09:39:00]–[10:08:00].
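The “generate, validate, repair, abstain” loop described for Cortex Analyst can be sketched in a few lines. This is a minimal illustration, not Snowflake’s implementation: the toy schema, the stubbed `generate_sql` function, and the use of SQLite’s `EXPLAIN` as a validity check are all assumptions made for the sake of a runnable example.

```python
import sqlite3

# Toy schema standing in for a customer's warehouse (hypothetical).
SCHEMA = "CREATE TABLE orders (id INTEGER, region TEXT, amount REAL);"

def is_valid_sql(sql: str, schema: str = SCHEMA) -> tuple[bool, str]:
    """Check a generated query against the schema without running it,
    by asking the database engine to compile a plan for it (EXPLAIN)."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(schema)
        conn.execute(f"EXPLAIN {sql}")
        return True, ""
    except sqlite3.Error as exc:
        return False, str(exc)
    finally:
        conn.close()

def generate_sql(question: str, feedback: str = "") -> str:
    """Stand-in for an LLM call; a real system would prompt a model with
    the question, the schema, and any validator feedback."""
    if feedback:  # "self-heal": repair the bad column name after an error
        return "SELECT region, SUM(amount) FROM orders GROUP BY region"
    return "SELECT region, SUM(amt) FROM orders GROUP BY region"  # buggy draft

def answer(question: str, max_attempts: int = 2):
    """Generate -> validate -> repair loop; abstain if nothing validates."""
    feedback = ""
    for _ in range(max_attempts):
        sql = generate_sql(question, feedback)
        ok, feedback = is_valid_sql(sql)
        if ok:
            return sql
    return None  # abstain rather than return unverified SQL

print(answer("total sales by region"))
```

Returning `None` instead of a best guess is the precision-over-recall trade-off described above: the system prefers to abstain rather than hand the user SQL it could not verify.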
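The hybrid-search idea behind Cortex Search (blending a lexical keyword signal with a vector-similarity signal) can also be sketched with toy scoring functions. Everything here is illustrative: a real system would use a BM25-style index and a learned embedding model such as Arctic Embed, not bag-of-words counts, and the documents and `alpha` weight are made up.

```python
from collections import Counter
from math import sqrt

# Hypothetical document store.
DOCS = {
    "d1": "quarterly revenue report for the emea region",
    "d2": "employee onboarding handbook and hr policies",
    "d3": "revenue forecast model for next quarter",
}

def lexical_score(query: str, doc: str) -> float:
    """Keyword signal: fraction of query terms that appear in the doc."""
    q, d = set(query.split()), set(doc.split())
    return len(q & d) / len(q) if q else 0.0

def vector_score(query: str, doc: str) -> float:
    """Cosine similarity over bag-of-words counts, standing in for
    embeddings produced by a model like Arctic Embed."""
    qv, dv = Counter(query.split()), Counter(doc.split())
    dot = sum(qv[t] * dv[t] for t in qv)
    norm = sqrt(sum(v * v for v in qv.values())) * sqrt(sum(v * v for v in dv.values()))
    return dot / norm if norm else 0.0

def hybrid_search(query: str, alpha: float = 0.5):
    """Blend both signals; alpha weights vector vs. lexical evidence."""
    scored = {
        doc_id: alpha * vector_score(query, text)
        + (1 - alpha) * lexical_score(query, text)
        for doc_id, text in DOCS.items()
    }
    return sorted(scored, key=scored.get, reverse=True)

print(hybrid_search("revenue report"))
```

The lexical term catches exact keyword matches that pure vector search can miss (part numbers, names), while the vector term catches paraphrases; combining them is what reduces retrieval misses that lead to hallucinated answers.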
Snowflake’s Broader AI Product Portfolio
Snowflake’s broader AI offerings, beyond just Arctic, include:
- Cortex: The core inference engine running various large language models, including Arctic [10:29:00]–[10:32:00].
- Cortex Search: Focuses on high-quality search, which is crucial for RAG. It uses Snowflake’s own embedding model, Arctic Embed, which is state-of-the-art and efficient: a quarter the size of OpenAI’s model while achieving higher benchmark scores [10:39:00]–[11:18:00]. Smaller models are a focus because of their latency, speed, and cost benefits [11:19:00]–[11:44:00].
- Cortex Analyst: The BI product for interacting with structured data using natural language [10:49:00]–[10:54:00].
Data Governance and Deployment Considerations
Snowflake’s built-in data governance capabilities provide a significant advantage for deploying AI models. Running models like Arctic directly next to the data within Snowflake ensures data security and adherence to existing governance policies [13:10:00]–[13:38:00] [16:16:00]–[16:35:00]. Snowflake provides solutions for customers who prefer to use external models, but its default approach keeps inference within the platform [16:41:00]–[16:53:00].
Key aspects of governance integration include:
- Granular Access Controls: Snowflake has always had granular access controls for database objects, and these extend to its AI functionality [17:02:00]–[17:11:00] [18:01:00] [18:04:00]. In Cortex Search, for instance, access controls are deeply integrated so that users only see documents they have permission to access [18:21:00]–[18:37:00].
- LLM Evaluation and Observability: Snowflake acquired TruEra, whose open-source product TruLens is an observability and LLM evaluation platform [14:06:00]–[14:23:00]. It helps customers evaluate LLM systems at scale using LLMs as judges, easing concerns about production deployment [14:42:00]–[14:57:00].
- Confidence in Production: While internal use cases are transitioning from proofs of concept (POCs) to production, external-facing use cases still require more confidence because of hallucination concerns, especially in regulated industries [15:00:00]–[15:25:00] [16:06:00]–[16:12:00].
- Challenges in Enterprise AI Deployment: Key concerns include hallucinations, immature measurement tools, and quality for complex scenarios like text-to-SQL [28:05:00]–[28:23:00].
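The document-level access control described for Cortex Search comes down to filtering the candidate set by the caller’s permissions before ranking, so unauthorized text can never appear in results or leak into an LLM prompt. A minimal sketch, with a hypothetical in-memory store and role model (Cortex Search’s actual enforcement is internal to Snowflake):

```python
# Hypothetical documents tagged with the roles allowed to read them.
DOCS = [
    {"id": "hr-1", "text": "salary bands", "allowed_roles": {"hr"}},
    {"id": "kb-1", "text": "vpn setup guide", "allowed_roles": {"hr", "eng"}},
]

def search(query: str, user_roles: set[str]):
    """Apply access control BEFORE matching, so a user's results can
    only ever be drawn from documents they are entitled to see."""
    visible = [d for d in DOCS if d["allowed_roles"] & user_roles]
    return [d["id"] for d in visible if query in d["text"]]

print(search("vpn", {"eng"}))
```

Filtering before retrieval (rather than redacting afterwards) matters in RAG: a document that is merely hidden from the final answer could still have influenced the LLM if it reached the prompt.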
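Evaluating LLM systems at scale with LLMs as judges, in the spirit of TruLens, amounts to scoring every (context, answer) pair with a grading model and aggregating. The sketch below is not the TruLens API: the judge is stubbed with token overlap so the example runs standalone, and the record fields and pass threshold are assumptions; a real judge would be an LLM call with a grading prompt.

```python
from statistics import mean

def judge_groundedness(context: str, answer: str) -> float:
    """Stub for an LLM judge. A real system would prompt a strong model:
    'Score 0-1: is every claim in the answer supported by the context?'
    Here we approximate with token overlap so the sketch is runnable."""
    ctx, ans = set(context.lower().split()), set(answer.lower().split())
    return len(ans & ctx) / len(ans) if ans else 0.0

def evaluate(records: list[dict], threshold: float = 0.8) -> dict:
    """Score every pair and summarize: the kind of batch evaluation used
    to decide whether a system is ready to move from POC to production."""
    scores = [judge_groundedness(r["context"], r["answer"]) for r in records]
    return {
        "mean_groundedness": mean(scores),
        "pass_rate": sum(s >= threshold for s in scores) / len(scores),
    }

records = [
    {"context": "arctic was released in 2024 by snowflake",
     "answer": "arctic was released by snowflake"},
    {"context": "cortex runs llms inside snowflake",
     "answer": "cortex trains robots"},
]
print(evaluate(records))
```

Aggregate metrics like these address the “immature measurement tools” concern: instead of spot-checking outputs by hand, teams can track groundedness and pass rates across thousands of interactions.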
Model Selection and Future Direction
Snowflake recommends starting with large models and RAG solutions for POCs [19:23:00]–[19:31:00]. Once a system is in place, fine-tuning smaller models is suggested to optimize latency and cost in production [19:36:00]–[19:45:00]. For companies with large, unique datasets, especially in regulated industries such as healthcare with specialized language, custom model training can be worthwhile because it gives full control over the data that goes into the model [19:49:00]–[20:39:00].
The future of Arctic LLM is not aimed at becoming a general-purpose model competing with the likes of GPT-5, but at continuing to serve Snowflake customers’ specific needs, particularly SQL generation and RAG quality [26:17:00]–[26:32:00].
AI Infrastructure and Innovation
Snowflake continuously updates its inference stack to support new models such as those from Mistral and Meta [22:20:00]–[22:30:00]. This is facilitated by the vLLM founders having joined the Snowflake team, enabling optimizations for large models and multi-node inference [22:32:00]–[23:24:00].
Innovations in the inference stack translate directly into cost reductions [35:52:00]–[36:05:00]. While costs have been decreasing, many use cases are still internal, so volumes are not yet high enough to make cost a significant blocker [25:40:00]–[25:58:00].