Notion AI development and team structure

From: redpointai

Notion has quickly integrated advanced AI features into its product, leveraging large language models (LLMs) and generative AI to enhance user experience [00:21:13]. The company’s approach involves a blend of user-centric problem-solving and proactive exploration of new AI capabilities [00:06:51].

Origin and Product Evolution

Notion’s journey into AI began around late 2022. During a company offsite, CEO Ivan Zhao and co-founder Simon Last took a few days to experiment with GPT-3, recognizing its potential as a writing tool [00:03:00]. This hackathon-style effort led to the first version of Notion AI [00:03:30].

Key Notion AI features include:

Notion AI Writer: Launched in beta in November 2022 and released in February 2023, this feature helps users write, summarize pages, extract key ideas or action items, fix spelling and grammar, and improve writing style. It supports iterative refinement, allowing users to conversationally adjust generated content (e.g., “make it shorter” or “more punchy”) [00:03:32].
AI Autofill: Released in May 2023, this feature integrates AI into Notion databases, which are often used for project management. It can automatically fill entire columns or properties, such as extracting key topics from meeting notes or core user needs from interview transcripts [00:04:39].
Notion Q&A: A newer feature, Notion Q&A understands the entire workspace, allowing users to ask questions and receive answers based on information across multiple pages [00:05:23]. This addresses the common problem of finding information in increasingly complex Notion workspaces [00:06:01].

AI Development Process

Notion’s AI team operates on an “expand out and then contract” cycle [00:07:30].

Exploration: They start with broad problem statements, such as helping users find information or organize content. Motivated by these, they prototype quickly using promising technologies like retrieval-augmented generation (RAG) [00:07:39].
Dogfooding: Notion heavily “dogfoods” its products internally. Prototypes are used by the entire company, providing immediate feedback and forcing rapid iteration on output quality. This continuous exposure helps validate usefulness and adds pressure to improve [00:10:02].
Refinement: Based on internal usage and external early testers (Notion ambassadors, partners), they identify promising approaches and recalibrate, leading to a more defined solution [00:08:11].
Productization: Once a clear direction is established, the process becomes more like traditional product building, involving user research and designers to ensure a confident user experience and model quality [00:11:41].
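
To make the retrieval-augmented generation (RAG) idea mentioned under Exploration more concrete, here is a minimal Python sketch. It is not Notion's implementation: the bag-of-words scoring stands in for a real embedding model, and `tokenize`, `cosine`, `retrieve`, `build_prompt`, and the sample `PAGES` are hypothetical names invented for illustration. The language model call itself is left out.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(question, pages, k=2):
    """Return the titles of up to k pages most similar to the question."""
    q = Counter(tokenize(question))
    scored = [(cosine(q, Counter(tokenize(body))), title) for title, body in pages.items()]
    scored.sort(reverse=True)
    return [title for score, title in scored[:k] if score > 0]

def build_prompt(question, pages, k=2):
    """Assemble the prompt a language model would receive (model call omitted)."""
    titles = retrieve(question, pages, k)
    context = "\n\n".join(f"[{t}]\n{pages[t]}" for t in titles)
    return f"Answer using only the context below.\n\n{context}\n\nQuestion: {question}"

# Hypothetical workspace pages, keyed by title.
PAGES = {
    "Launch plan": "The marketing team is preparing the spring launch campaign and press kit.",
    "Onboarding": "New hires should request laptop access and read the engineering handbook.",
    "Retro notes": "The retro covered deployment delays and flaky integration tests.",
}
```

Calling `build_prompt("What is the marketing team preparing?", PAGES, k=1)` would place the "Launch plan" page in the context. In a real system, each stage (embedding, ranking, answering) would be a separate component that can be tuned on its own.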

User-driven adaptation

Initial features are often extended by users in unanticipated ways. For instance, Autofill was heavily used for translation outside the US, leading Notion to build translation as a native, pre-built prompt [00:18:30]. This iterative process of discovering common use cases via custom prompts and then baking them into templates is a core Notion pattern [00:19:05].

AI Team Structure

The Notion AI team consists of about a dozen people [00:12:05]. It is roughly split into two halves, supported by a small group of designers:

Data Model & Quality: Focuses on the correctness and coherence of model outputs [00:12:09].
Product Concerns: Works on the user interface and integration into Notion [00:12:20].
Design Resources: The team also collaborates closely with a couple of designers who help present model outputs in user-friendly ways [00:12:41].

The ownership of AI features is evolving. Currently, the core AI team owns features with a specific AI surface (like AI Writer and Q&A). However, features deeply integrated with other Notion products, like Autofill with databases, are owned by those respective product teams, though close collaboration with the AI team continues for complex AI problems [00:13:37]. This fluid structure reflects the ongoing process of determining the foundational pieces of Notion AI and how they intersect with existing product areas [00:14:10]. It’s an open question whether AI engineers will eventually be embedded in every team, or if a centralized infrastructure team will remain essential for quality, monitoring, and data management [00:15:56].

Challenges in AI Development

Developing AI features like Q&A presents unique challenges, particularly regarding evaluation [00:23:14].

Correctness and Quality: For features like Q&A, precise answers are critical, making evaluation much more black and white compared to creative writing tasks where there’s more “wiggle room” [00:23:48].
Anticipating User Questions: It’s challenging to anticipate the full range of user questions, especially “meta” questions about Notion itself (e.g., “How can I share this page with Jack?”) or time-sensitive queries (e.g., “What is the marketing team working on this week?”) that require sophisticated understanding beyond a simple document lookup [00:24:19].
Operational Concerns: Addressing customer needs around privacy and security, as well as managing the necessary scale for AI features, required starting from first principles due to a lack of clear industry answers [00:25:33].
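
One reason time-sensitive questions such as “this week” are harder than a plain document lookup is that the relative phrase has to be turned into an absolute date range before any filtering can happen. The sketch below illustrates that step under stated assumptions: weeks run Monday through Sunday, only a handful of phrases are recognized, and `resolve_relative_range`, `filter_pages_by_edit_date`, and the `edited` field are hypothetical names, not anything described in the source.

```python
from datetime import date, timedelta

def resolve_relative_range(phrase, today):
    """Map a few relative phrases to an inclusive (start, end) date range.

    Assumes weeks run Monday through Sunday.
    Returns None for phrases this sketch does not recognize.
    """
    monday = today - timedelta(days=today.weekday())
    phrase = phrase.lower()
    if "this week" in phrase:
        return monday, monday + timedelta(days=6)
    if "last week" in phrase:
        return monday - timedelta(days=7), monday - timedelta(days=1)
    if "yesterday" in phrase:
        day = today - timedelta(days=1)
        return day, day
    if "today" in phrase:
        return today, today
    return None

def filter_pages_by_edit_date(pages, date_range):
    """Keep pages whose (hypothetical) 'edited' date falls inside the range."""
    start, end = date_range
    return [p for p in pages if start <= p["edited"] <= end]
```

For example, with `today = date(2024, 3, 13)` (a Wednesday), the question “What is the marketing team working on this week?” resolves to the range 2024-03-11 through 2024-03-17, which can then narrow the set of pages passed to retrieval.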

Tooling and Evaluation

Notion has primarily built its own internal tools for working with language models [00:26:21]. This decision was driven by the early stage of the industry when few suitable tools existed, and by the unique complexity of Notion documents (rich text, tables, images, metadata) which don’t map well to simpler formats like PDFs or Wikipedia pages [00:26:40]. In-house tools also allow for faster iteration and customization (e.g., comparing more than two models side-by-side) [00:27:24].

Evaluation occurs across a spectrum:

Deterministic Programmatic Evaluations: Automated checks of model outputs [00:27:58].
Human Annotators: A team reviews outputs to speed up certain processes [00:28:38].
ML Engineers’ Qualitative Review: ML engineers deeply examine model outputs to understand why a model is failing (e.g., misinterpreting instructions, struggling with relative dates vs. absolute dates). This costly but high-payoff approach helps identify where in the pipeline to intervene (e.g., embeddings, ranking, or the answering component) [00:28:44].
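
As an illustration of the deterministic end of this spectrum, the following Python sketch runs a few automated checks over model outputs. The specific checks (citing a retrieved page, avoiding relative date words, staying within a word limit) are assumptions chosen for the example, and every function name here is hypothetical; the source does not say which checks Notion actually runs.

```python
import re

# Relative date words that an answer should ideally replace with absolute dates.
RELATIVE_DATE = re.compile(
    r"\b(today|yesterday|tomorrow|(?:this|last|next) (?:week|month|year))\b",
    re.IGNORECASE,
)

def cites_retrieved_page(output, retrieved_titles):
    """Pass if the answer mentions at least one retrieved page title."""
    return any(title in output for title in retrieved_titles)

def avoids_relative_dates(output):
    """Pass if the answer contains no relative date words."""
    return RELATIVE_DATE.search(output) is None

def within_word_limit(output, limit=80):
    """Pass if the answer has at most `limit` whitespace-separated words."""
    return len(output.split()) <= limit

def run_checks(case):
    """Run every check on one test case and return a name -> bool mapping."""
    output = case["output"]
    return {
        "cites_source": cites_retrieved_page(output, case["retrieved_titles"]),
        "absolute_dates": avoids_relative_dates(output),
        "within_length": within_word_limit(output),
    }

def pass_rate(cases):
    """Fraction of cases for which all checks pass."""
    results = [run_checks(c) for c in cases]
    passed = sum(all(r.values()) for r in results)
    return passed / len(results) if results else 0.0
```

Checks like these are cheap to run on every change, but they only catch failures that can be stated as rules. Understanding why a model fails still needs the human review described above.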

Model Strategy

Notion has strong partnerships with Anthropic and OpenAI, relying on them for foundational model building and scalable hosting [00:31:08]. Notion’s role in this partnership is to:

Understand Tasks: Deeply understand the specific tasks its models need to perform [00:31:40].
Data Set Curation: Collect or generate synthetic data for those models. Notion has committed to not training on customer data [00:31:46]. This limitation has pushed them to systematically understand Notion workspace archetypes and document structures to create high-quality synthetic data [00:32:06].
Evaluation: Define clear criteria for evaluating task performance (e.g., what makes a good summary for a meeting note versus a technical document) [00:32:40].

Notion iterates on the full stack, experimenting with different pipeline stages (e.g., rephrasing retrieved passages) [00:29:30]. They decide which model to use based on capabilities, cost, and throughput requirements, potentially using different models or providers for different features [00:33:49]. While open-source models are explored for areas like embeddings, Notion hasn’t shipped production features with them yet [00:33:16].
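
The trade-off between capabilities, cost, and throughput can be sketched as a simple selection rule: pick the cheapest model that clears a feature's quality and throughput floor. The Python below is an illustration only. The `ModelOption` fields, the 1 to 5 quality score, and all numbers are made-up assumptions, and nothing in the source suggests Notion selects models this way.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ModelOption:
    name: str
    quality: int                # hypothetical score, 1 (weak) to 5 (strong)
    cost_per_1k_tokens: float   # hypothetical price
    tokens_per_second: int      # hypothetical throughput

def choose_model(options, min_quality, min_tokens_per_second):
    """Return the cheapest option meeting both floors, or None if none does."""
    eligible = [
        o for o in options
        if o.quality >= min_quality and o.tokens_per_second >= min_tokens_per_second
    ]
    return min(eligible, key=lambda o: o.cost_per_1k_tokens, default=None)

# Placeholder options; names and figures are invented for the example.
OPTIONS = [
    ModelOption("small", quality=2, cost_per_1k_tokens=0.2, tokens_per_second=400),
    ModelOption("medium", quality=4, cost_per_1k_tokens=1.0, tokens_per_second=150),
    ModelOption("large", quality=5, cost_per_1k_tokens=5.0, tokens_per_second=60),
]
```

Under these placeholder numbers, a high-volume feature with modest quality needs would resolve to `small`, while a feature that needs stronger reasoning but tolerates lower throughput would resolve to `medium` or `large`.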

User Interaction and Interface Design

Notion emphasizes providing users with the right “building blocks” and abstractions for AI and user interface customization. For every AI feature, users can access a direct prompting interface [00:40:00]. However, Notion also provides pre-built prompts to guide users and overcome the “blank canvas problem” [00:40:08]. The most popular use cases are often driven by these pre-built options (e.g., summarization, grammar fixes), but users frequently iterate on the outputs to fit specific needs [00:19:23]. Power users often hand-write and reuse custom prompts [00:20:20].
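
One way to picture the relationship between pre-built prompts and the direct prompting interface is a small registry of templates with a free-form fallback. This is a hedged sketch, not Notion's design: `PREBUILT_PROMPTS`, `build_request`, `refine`, and the template wording are all hypothetical.

```python
# Hypothetical pre-built prompt templates, keyed by action name.
PREBUILT_PROMPTS = {
    "summarize": "Summarize the following text in a few sentences:\n\n{text}",
    "fix_grammar": "Fix spelling and grammar without changing the meaning:\n\n{text}",
    "translate": "Translate the following text into {language}:\n\n{text}",
}

def build_request(text, action=None, custom_prompt=None, **params):
    """Build the model input from a pre-built action or a user's custom prompt.

    A custom prompt takes priority, mirroring the direct prompting interface.
    """
    if custom_prompt:
        return f"{custom_prompt.strip()}\n\n{text}"
    if action not in PREBUILT_PROMPTS:
        raise ValueError(f"unknown action: {action!r}")
    return PREBUILT_PROMPTS[action].format(text=text, **params)

def refine(previous_output, instruction):
    """Build a follow-up request that iterates on an earlier output."""
    return f"{instruction.strip()}\n\n{previous_output}"
```

In this sketch, a frequently reused custom prompt could later be promoted into the registry as a new template, which is how a use case like translation might move from ad hoc prompting to a pre-built option.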

Notion is exploring whether AI can output interactive UI elements (generative interfaces) or perform actions, which would require defining new “blocks” or a domain-specific language for AI outputs [00:41:18].

AI Team Philosophy and Learnings

Betting on Generality: Notion has been surprised by how well general approaches work, suggesting that investing in a single model that can perform multiple tasks (even by combining data sets for different tasks) may yield better long-term outcomes and a deeper understanding of the product’s domain [00:46:14].
Iterate Quickly: In the early phases of development, rapid iteration is key, especially when dealing with AI outputs where quality and usefulness need constant refinement [00:11:11].
“Garden Approach” to Innovation: Similar to places like Midjourney or GitHub Next, Notion’s environment allows individuals to champion specific ideas and push prototypes forward rapidly. This “garden” of different people trying different things can yield productive outcomes [00:48:29].

The development of compound AI systems is a key focus, as getting foundational components like retrieval right can augment many capabilities beyond just question answering [00:14:52]. In the interview, long context windows are described as currently “overhyped,” on the grounds that robust retrieval and effective data filtering will remain essential however large the context becomes [00:42:24]. The exploration of alternative architectures to Transformers is described as “underhyped,” since more efficient and effective sequence modeling approaches may emerge [00:44:36].
