From: redpointai

Notion, a company known for quickly shipping AI features into its product, has extensively leveraged partnerships with leading AI model providers like OpenAI and Anthropic. This approach is central to how Notion develops and integrates artificial intelligence into its platform [00:24:01].

Strategy for AI Model Development

Notion’s strategy for AI model development is primarily focused on understanding and defining the specific tasks its models need to perform, and then partnering with external providers for the foundational model infrastructure [00:31:42].

Core Focus: Task Understanding and Data Generation

Notion’s team concentrates on:

  • Understanding tasks: What makes a “good summary,” for example, varies significantly between a meeting note, a long technical document, and a bug report [00:32:45].
  • Data collection and generation: This involves generating synthetic data for models and building datasets for evaluating specific tasks, particularly given Notion’s commitment not to train on customer data [00:31:46]. This work also helps the team understand how Notion workspaces are organized and what kinds of documents are prototypical targets for AI features [00:32:23].

Partnering for Infrastructure

Notion recognizes the difficulty of competing at the infrastructure level with companies like OpenAI, Anthropic, or Google [00:31:22].

  • Division of Labor: Partners are responsible for building effective models that follow instructions and hosting them scalably and reliably [00:31:31].
  • Beneficial for Ecosystem: The continuous reduction in model costs by large foundation model companies like OpenAI benefits the startup ecosystem, allowing companies like Notion to build on top of their services without significant infrastructure investment [00:56:06].

Proprietary vs. Third-Party AI Models

Notion’s approach involves a clear preference for leveraging external, state-of-the-art models from its partners rather than building its own foundation models from scratch [00:31:00].

Benefits of Partnerships

  • Access to State-of-the-Art: Notion has strong partnerships with Anthropic and OpenAI, benefiting from their expertise in infrastructure and initial model building [00:31:08].
  • Scalability and Reliability: Partners manage the complex task of hosting models in a reliable and scalable manner [00:31:35].
  • Focus on Application Layer: This allows Notion to focus on understanding specific tasks, evaluating model performance, and designing user experiences [00:33:04].

Considerations for Model Selection

The best model for a particular feature depends on several factors:

  • Performance: The model must meet the quality expectations for the feature [00:33:53].
  • Throughput: Some features, like AI autofill, which runs in the background when a page changes, require models that support higher throughput [00:33:59].
  • Cost: Notion evaluates the cost of different models to find the most suitable option [00:34:18]. This often means using different model scales or even different providers for different features [00:34:25]; a rough routing sketch follows this list.
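
To make these trade-offs concrete, the selection logic can be pictured as a routing table that maps each feature to a model choice. The sketch below is purely illustrative, assuming hypothetical feature names, providers, and model identifiers rather than Notion’s actual configuration.

```python
# Illustrative only: hypothetical feature-to-model routing based on the
# performance / throughput / cost trade-offs described above.
from dataclasses import dataclass


@dataclass
class ModelChoice:
    provider: str  # e.g. "openai" or "anthropic"
    model: str     # hypothetical model identifier
    reason: str    # why this trade-off suits the feature


MODEL_ROUTING = {
    # Interactive Q&A: answer quality matters most, some latency is acceptable.
    "qa_chat": ModelChoice("anthropic", "large-flagship-model",
                           "best answer quality for interactive Q&A"),
    # Background autofill runs on every page change, so throughput and cost
    # dominate and a smaller, faster model is acceptable.
    "ai_autofill": ModelChoice("openai", "small-fast-model",
                               "high throughput and low cost for background jobs"),
    # Summaries: a mid-sized model balances quality and cost.
    "summarize": ModelChoice("openai", "mid-tier-model",
                             "good summaries at moderate cost"),
}


def pick_model(feature: str) -> ModelChoice:
    """Return the configured model for a feature, defaulting to the Q&A model."""
    return MODEL_ROUTING.get(feature, MODEL_ROUTING["qa_chat"])
```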

Open-Source vs. Proprietary AI Models

While Notion is exploring open-source models, especially for embedding tasks, it has not yet shipped any production features using them [00:33:10]. Its current approach primarily relies on proprietary models from its partners.

Managing Model Interactions and Outputs

Notion employs extensive pre-processing and in-house tooling to manage how its product interacts with third-party AI models and to evaluate their outputs.

Prompt Engineering and Pre-processing

  • User prompts are often wrapped in Notion’s own processing, which may include prompt templates that incorporate dialogue history or context from the page [00:34:54].
  • For features like Q&A, a query-rewrite phase may come first, in which the model rephrases the user’s query based on conversation context before the search is run [00:35:37].
  • Prompt engineering involves understanding specific task criteria (e.g., what makes a good summary) and then instructing the model to meet those criteria, including format requirements and what to avoid [00:37:27]. These core instructions tend to carry over between different models [00:38:29]. A rough sketch of this wrapping and rewriting follows this list.
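
As a rough illustration of the wrapping and query-rewrite steps described above, the sketch below assumes a generic call_model function standing in for whichever provider API is used; the template text, criteria, and function names are hypothetical and not Notion’s actual prompts.

```python
# Illustrative sketch of prompt wrapping and query rewriting.
# `call_model` is a stand-in for a provider API call (OpenAI, Anthropic, etc.).
def call_model(prompt: str) -> str:
    raise NotImplementedError("replace with the provider SDK of your choice")


SUMMARY_TEMPLATE = """You are summarizing a Notion page.
Criteria: capture decisions and action items, keep it under 5 bullet points,
and do not invent facts that are not in the page.

Page content:
{page_content}

Conversation so far:
{dialogue_history}

Write the summary now."""


def build_summary_prompt(page_content: str, dialogue_history: str) -> str:
    """Wrap the raw page content and dialogue history in the task template."""
    return SUMMARY_TEMPLATE.format(page_content=page_content,
                                   dialogue_history=dialogue_history)


def rewrite_query(user_query: str, dialogue_history: str) -> str:
    """Ask the model to restate the latest question as a standalone search query."""
    prompt = (
        "Given the conversation below, rewrite the user's latest question as a "
        "self-contained search query.\n\n"
        f"Conversation:\n{dialogue_history}\n\n"
        f"Latest question: {user_query}\n\n"
        "Standalone query:"
    )
    return call_model(prompt)
```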

Evaluation Methodologies

Notion has built most of its evaluation tools in-house [00:26:28]. This is due to the lack of suitable off-the-shelf tools when the team started, and to the complexity of Notion’s rich, structured documents [00:26:31].

  • Spectrum of Evaluation:
    • Programmatic/Deterministic: Automated, deterministic checks of model outputs (a minimal example follows this list) [00:27:58].
    • Human Annotators: A team of human annotators helps speed up the evaluation process [00:28:38].
    • ML Engineer Review: ML engineers manually review model outputs to understand why failures occur (e.g., model misunderstanding instructions, difficulty with relative dates) [00:28:46]. This deep dive into errors is crucial for identifying where to intervene in the pipeline (e.g., embedding problem vs. answering problem) [00:30:37].
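
A deterministic check in this spirit might look like the sketch below. The specific rules (bullet format, bullet count, unresolved relative dates) are hypothetical examples of programmatic checks rather than Notion’s actual test suite.

```python
# Illustrative programmatic checks on a model-generated summary; rules are hypothetical.
import re


def check_summary_output(output: str, max_bullets: int = 5) -> list[str]:
    """Return a list of failure reasons; an empty list means the output passes."""
    failures = []

    bullets = [line for line in output.splitlines()
               if line.strip().startswith(("-", "*", "•"))]
    if not bullets:
        failures.append("output is not formatted as a bulleted list")
    elif len(bullets) > max_bullets:
        failures.append(f"too many bullets: {len(bullets)} > {max_bullets}")

    # Relative dates are a known failure mode; require absolute dates instead.
    if re.search(r"\b(yesterday|today|tomorrow|next week|last week)\b",
                 output, re.IGNORECASE):
        failures.append("output contains unresolved relative dates")

    return failures


# Example: this output fails the relative-date check.
print(check_summary_output("- Ship the feature tomorrow\n- Review next week"))
```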

Multilingual Performance

Large models generally transfer well across languages, though performance is typically strongest in English and can fall off in other languages [00:38:36]. Notion prototypes in English and then uses dedicated evaluation datasets to measure multilingual performance, sometimes adding few-shot examples or training to bolster it [00:39:06].

  • Example: Notion Q&A can read documents written in Japanese and answer in the user’s language without an intermediate translation layer [00:39:21].

Conclusion

Notion’s approach to AI development highlights the significance of strategic partnerships with foundational model providers. By outsourcing the complex, resource-intensive task of model building and hosting, Notion can focus its internal efforts on understanding specific user needs, refining interaction patterns, and building robust evaluation frameworks. This allows them to iterate quickly and adapt to the evolving landscape of AI capabilities, delivering tailored and high-quality AI features to their users.