From: aidotengineer
Building web research agents like Gemini Deep Research presents unique challenges, particularly when integrating long-running, asynchronous tasks into an inherently synchronous chatbot product [00:02:22].
Core Challenges
Synchronous Nature of Chatbots
Traditional chatbots are designed for quick, synchronous interactions, where a response is expected almost immediately after a query [00:02:22]. Deep research, however, can take several minutes to complete [00:02:01], requiring a fundamental shift in how the user experience is managed [00:02:28].
Setting User Expectations
It’s crucial to differentiate research queries, which benefit from longer processing times, from simple queries (e.g., weather, jokes) where a five-minute wait would be unacceptable [00:02:34]. Users need to understand that the product is performing a different kind of task [00:02:45].
Handling Long Outputs
Research reports can be thousands of words long [00:02:47]. Designing an interface that allows users to easily engage with and navigate such extensive content within a chat experience is vital [00:02:51].
Solutions Implemented by Gemini Deep Research
The Research Plan Card
To manage user expectations and introduce the asynchronous nature of Deep Research, Gemini first presents a “research plan” in a card format [00:03:20]. This step communicates that the experience is different from a standard chatbot interaction [00:03:28]. It also allows users to review, edit, and steer the direction of the research, much like collaborating with an analyst [00:03:37].
Real-time Progress Transparency
Once the research begins, Gemini provides real-time transparency by showing the websites it is browsing [00:03:56]. This feature, developed before “thinking models” and “thoughts” became common, helps users understand what the model is doing under the hood [00:04:03]. Users can click through the websites and explore the content while waiting [00:04:12].
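This kind of live progress feed can be modeled as a stream of events the UI consumes as the agent works. The sketch below is a hypothetical illustration (not Gemini's actual implementation): a generator yields one event per site browsed so the front end can render "currently reading ..." in real time.

```python
from typing import Iterator


def browse_with_progress(urls: list[str]) -> Iterator[dict]:
    """Yield a progress event for each site as the agent browses it.

    Hypothetical sketch: the actual fetch/parse work would happen
    between events; the UI just consumes the stream.
    """
    total = len(urls)
    for i, url in enumerate(urls, start=1):
        yield {"event": "browsing", "url": url, "done": i, "total": total}
        # ... fetch and parse the page here ...
    yield {"event": "complete", "total": total}
```

A chat UI would iterate over this stream and update a "sources being read" panel on each event, letting users click through while they wait.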
Engaging with Long Reports via Artifacts
Inspired by Anthropic’s “artifacts,” Gemini allows users to “pin” the generated long report [00:04:37]. This enables users to ask follow-up questions about the research directly while reading the material without needing to scroll back and forth [00:04:43]. This design makes it easier to change the report’s style, add/remove sections, or ask further questions [00:04:52].
Source Attribution and Trust
To build user trust and respect publishers, Deep Research always displays all sources read and used in the report [00:05:03]. Even sources read but not directly used in the final report remain in context for potential follow-up questions [00:05:14]. This information can also be exported to Google Docs with citations [00:05:19].
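The distinction between sources *read* and sources *used* suggests a simple bookkeeping structure. The sketch below is an assumption-laden illustration (class and method names are invented, not from the talk): every fetched source is recorded, citations are drawn only from the used set, and everything read stays available as context for follow-ups.

```python
class SourceTracker:
    """Track every source the agent reads; mark which ones the report cites."""

    def __init__(self) -> None:
        self.read: dict[str, str] = {}  # url -> fetched content
        self.used: set[str] = set()     # urls actually cited in the report

    def record_read(self, url: str, content: str) -> None:
        self.read[url] = content

    def mark_used(self, url: str) -> None:
        self.used.add(url)

    def report_citations(self) -> list[str]:
        """Sources to display (and export to Google Docs) as citations."""
        return sorted(self.used)

    def followup_context(self) -> list[str]:
        """All read content, including unused sources, kept for follow-ups."""
        return list(self.read.values())
```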
Technical Considerations for Asynchronous Agents
Robustness to Failures
Long-running tasks, which involve many LLM calls and interactions with various services, are prone to failures [00:06:21]. For a research agent that can take minutes or even hours [00:06:27], it’s critical to be robust to intermediate failures [00:06:36]. This requires:
- State Management: Building a strong state management solution [00:06:44].
- Error Recovery: Effectively recovering from errors to avoid dropping the entire research task due to a single failure [00:06:46].
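One common way to meet both requirements is to checkpoint state after every step and retry transient failures with backoff, so a single error never drops the whole research task. The sketch below is a minimal, hypothetical illustration (file name, task names, and retry policy are assumptions, not Gemini's design):

```python
import json
import time
from pathlib import Path

CHECKPOINT = Path("research_state.json")  # hypothetical checkpoint location


def load_state() -> dict:
    """Resume from the last checkpoint, or start a fresh task list."""
    if CHECKPOINT.exists():
        return json.loads(CHECKPOINT.read_text())
    return {"completed": {}, "pending": ["task_a", "task_b", "task_c"]}


def save_state(state: dict) -> None:
    """Persist after every step so a crash loses at most one step's work."""
    CHECKPOINT.write_text(json.dumps(state))


def run_step(task: str) -> str:
    # Stand-in for an LLM call or web fetch that may fail transiently.
    return f"result for {task}"


def run_research(max_retries: int = 3) -> dict:
    state = load_state()
    while state["pending"]:
        task = state["pending"][0]
        for attempt in range(max_retries):
            try:
                state["completed"][task] = run_step(task)
                break
            except Exception:
                time.sleep(2 ** attempt)  # exponential backoff, then retry
        else:
            # Exhausted retries: record the failure but keep researching.
            state["completed"][task] = None
        state["pending"].pop(0)
        save_state(state)
    return state
```

On restart after a crash, `load_state` picks up from the last checkpoint instead of re-running completed sub-tasks.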
Cross-Platform Continuity
The asynchronous nature allows users to initiate a research task, walk away, and then be notified when it’s complete [00:07:03]. This capability enables cross-platform access, allowing users to pick up and read the report on different devices [00:07:11].
Iterative Planning with Partial Information
The model must iteratively plan its actions, deciding which sub-problems to tackle in parallel versus sequentially [00:07:40]. It frequently operates with partial information and must constantly re-evaluate its next steps based on new findings [00:07:58]. This means grounding each next step in the information it has found so far [00:08:24]. For example, once it finds Division 1 (D1) scholarship standards, it then needs to plan to find the D2 and D3 standards [00:08:09].
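This plan-act-replan loop can be sketched in a few lines. The example below is a toy illustration of the idea (the planner and the stand-in search are invented, not the actual system): after each finding, the agent re-derives its remaining sub-tasks from what it now knows, mirroring the D1 → D2/D3 example.

```python
def plan_next_steps(findings: dict, goal_topics: list[str]) -> list[str]:
    """Ground the plan in partial information: whatever is already in
    `findings` is done; everything else is still a sub-task."""
    return [t for t in goal_topics if t not in findings]


def research(goal_topics: list[str]) -> dict:
    findings: dict[str, str] = {}
    while True:
        next_steps = plan_next_steps(findings, goal_topics)
        if not next_steps:
            break
        # Independent sub-problems could run in parallel; here we take one
        # step and then re-plan, since new findings may change the plan.
        topic = next_steps[0]
        findings[topic] = f"standards for {topic}"  # stand-in for a web search
    return findings
```

A real planner would also let new findings *add* sub-tasks (e.g. discovering D2 and D3 only after reading about D1), not just tick off a fixed list.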
Managing Context Growth
As the research progresses and streams of information are gathered, the context size can grow very quickly [00:10:49]. This is further compounded by follow-up queries or requests to research related topics [00:11:04]. Even with long context models like Gemini, effective context management strategies are essential [00:11:20]. One approach involves a recency bias, where more information is kept about current and previous tasks, while older tasks are selectively summarized into “research notes” that the model can still access through a retrieval-augmented generation (RAG) system [00:11:37].
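The recency-biased scheme described above can be sketched as follows. This is a hypothetical illustration (class name, summarizer, and keyword retrieval are stand-ins; a real system would use an LLM to summarize and embeddings for RAG retrieval): recent task outputs are kept verbatim, older ones are compressed into "research notes" that remain retrievable.

```python
def summarize(text: str) -> str:
    # Stand-in for an LLM summarization call.
    return text[:60] + "..."


class ResearchContext:
    """Keep recent task outputs in full; compress older ones into notes."""

    def __init__(self, keep_recent: int = 2) -> None:
        self.keep_recent = keep_recent
        self.recent: list[tuple[str, str]] = []  # (task, full output)
        self.notes: dict[str, str] = {}          # task -> summary

    def add(self, task: str, output: str) -> None:
        self.recent.append((task, output))
        # Recency bias: once over budget, demote the oldest task to a note.
        while len(self.recent) > self.keep_recent:
            old_task, old_output = self.recent.pop(0)
            self.notes[old_task] = summarize(old_output)

    def retrieve_notes(self, query: str) -> list[str]:
        # Toy retrieval: substring match stands in for embedding-based RAG.
        return [s for t, s in self.notes.items() if query.lower() in t.lower()]

    def build_prompt_context(self, query: str) -> str:
        parts = [out for _, out in self.recent]  # current/previous tasks, full
        parts += self.retrieve_notes(query)      # older tasks, via notes
        return "\n\n".join(parts)
```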
Future Directions for Research Agents
The success of systems like Deep Research opens doors for future advancements in AI agents:
- Deeper Expertise: Moving beyond aggregating and synthesizing information to provide expert-level insights, implications, and novel patterns, akin to a McKinsey or Goldman Sachs partner [00:12:42]. This could involve complex tasks like forming hypotheses in scientific research [00:13:08].
- Personalized Experience and Presentation: Tailoring information delivery based on the user’s role and needs [00:13:22]. For instance, a due diligence report for a general user would differ significantly from one for a Goldman Sachs banker, influencing how the web is browsed and the answer is framed [00:13:31]. This speaks to the broader concept of building user experiences with AI.
- Multimodal Capabilities: Combining web research with other AI capabilities like coding, data science, and even video generation [00:14:11]. This could involve an agent performing statistical analysis or building financial models to inform its research output [00:14:21].