From: aidotengineer
The AI SDK enables developers to create sophisticated applications, including those capable of deep research. A deep research clone can process an initial query, generate sub-queries, search the web for relevant information, analyze findings, and recursively follow up on new questions to build a comprehensive report [00:25:00]. This process leverages the AI SDK’s capabilities for text generation, tool use (function calling), and structured data output [00:25:22].
AI SDK Fundamentals for Building Agents
Building agents with the AI SDK involves several core primitives that provide flexibility and power:
- Generate Text Function
The `generateText` function calls a large language model to produce text [00:01:06]. It accepts either a simple `prompt` or an array of `messages` with a specified `role` and `content` [00:02:16].
- Unified Interface
A key feature of the AI SDK is its unified interface, which allows developers to switch between different language models by changing a single line of code [00:02:41]. This flexibility is useful for optimizing cost, speed, or quality for a specific use case [00:02:50]. For example, a developer can seamlessly transition from OpenAI's GPT-4o mini to Perplexity's Sonar Pro or Google's Gemini 1.5 Flash when web search capabilities are needed [00:03:53]. The SDK also provides access to the sources used by search-backed models like Perplexity [00:04:48].
- Using Tools and Function Calling
Tools, or function calling, enable language models to interact with the outside world and perform actions [00:06:22].
- Mechanism: The model is given a prompt and a list of available tools, each with a name, description, and required parameters [00:06:29]. Instead of generating text, the model might generate a tool call, including the tool’s name and arguments parsed from the conversation context [00:06:58]. The developer then executes this tool call [00:07:13].
- Implementation: Tools are passed to `generateText` or `streamText` via a `tools` object. The `tool` utility function provides type safety between the defined parameters and the arguments passed to the `execute` function [00:08:06]. The AI SDK automatically parses tool calls, invokes the `execute` function, and returns the result in a `toolResults` array [00:09:23].
- Autonomous Multi-step Agents (`maxSteps`): To allow the model to incorporate tool results into a text answer, the `maxSteps` property can be set [00:11:35]. If a tool call is generated, the tool result and previous context are sent back to the model, triggering another generation. This process continues until plain text is generated or the maximum number of steps is reached, enabling autonomous, multi-step agent behavior [00:12:03]. The model can even perform parallel tool calls [00:16:42].
- Generating Structured Data (Structured Outputs)
The AI SDK offers two ways to generate structured outputs:
- `generateText` with `experimental_output`: This option allows defining a schema (e.g., using Zod) for the expected output structure [00:18:49]. Zod, a TypeScript validation library, pairs well with the AI SDK for defining schemas and makes structured outputs easy to work with [00:19:48].
- `generateObject` Function: This function is specifically designed for structured outputs [00:18:55]. It takes a prompt and a schema, returning a type-safe object [00:22:09]. A useful feature is Zod's `.describe()`, which allows providing detailed context for specific values in the schema, influencing the model's output without altering the main prompt [00:23:14]. `generateObject` can also operate in enum mode for scenarios with a limited set of discrete values (e.g., "relevant" or "irrelevant") [00:40:15].
Building a Deep Research Clone
The deep research workflow involves several sequential and recursive steps to gather and synthesize information:
Workflow Overview
- Generate Sub-queries: Convert a broad initial query into specific search queries [00:26:58].
- Search the Web: Find relevant results for each sub-query [00:27:17].
- Analyze Results: Extract learnings and identify follow-up questions [00:27:20].
- Recursive Follow-up: Take follow-up questions, generate new queries, and repeat the process while accumulating research [00:27:31].
- Synthesize Report: Aggregate all gathered information into a comprehensive report [00:27:59].
The process uses `depth` to control how many levels deep the research goes and `breadth` to determine how many different inquiries are pursued at each step [00:29:06].
Step 1: Generating Search Queries
The `generateSearchQueries` function uses `generateObject` to create a list of search-engine-optimized queries from a given prompt [00:29:50]. The function specifies a schema for an array of strings to ensure the output is structured as expected [00:30:38].
Step 2: Searching the Web
The `searchWeb` function utilizes the Exa API to search the web for relevant results [00:32:53]. Key considerations include:
- Result Count: Limiting the number of results to optimize token usage [00:34:04].
- Live Crawl: Using `liveCrawl` to ensure results are current, accepting a potential hit on performance [00:34:25].
- Trimming Information: Only returning necessary data (e.g., title, URL, content) from search results to reduce token count and improve model effectiveness by removing irrelevant information like favicons [00:34:50].
Step 3: Analyzing Results and Iterating (Agentic Loop)
The `searchAndProcess` function implements an agentic loop using `generateText` with two tools:
- `searchWeb`: Executes the web search for a given query [00:38:57]. Search results are pushed to a `pendingSearchResults` array [00:39:24].
- `evaluate`: Determines the relevance of the most recent search result [00:39:41]. It uses `generateObject` in enum mode to classify results as "relevant" or "irrelevant" [00:40:15]. If a result is irrelevant, it provides feedback to the model to search again with a more specific query, perpetuating the `maxSteps` loop [00:40:56]. This tool also prevents re-using sources that have already been processed [00:52:54].
Step 4: Generating Learnings and Follow-up Questions
The `generateLearnings` function uses `generateObject` to analyze a relevant search result [00:43:41]. It extracts a `learning` (insight) and identifies `followUpQuestions`, both defined by a Zod schema [00:44:15]. The search result is passed as stringified XML within the prompt to provide context [00:44:08].
Step 5: Implementing Recursion and State Management
The deepResearch
function orchestrates the recursive nature of the research process:
- It takes an initial prompt and recursively calls itself with new queries derived from follow-up questions [00:46:13].
- `depth` and `breadth` parameters control the extent of the recursive search [00:49:09].
- A global or external `accumulatedResearch` state object stores all gathered information (original query, active queries, search results, learnings, completed queries) across recursive calls [00:48:10].
- The function decrements `depth` with each recursive call to ensure termination and prevent infinite loops, conserving API credits [00:50:01].
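The recursion and state management might look roughly like this; the three helpers are the sketches from Steps 1, 3, and 4 (declared here, not implemented), and the default `depth`/`breadth` values are illustrative:

```typescript
type Learning = { learning: string; followUpQuestions: string[] };
type Result = { url: string; content: string };

// Assumed helpers from earlier steps (declared only for the sketch).
declare function generateSearchQueries(query: string, n: number): Promise<string[]>;
declare function searchAndProcess(query: string, used: Set<string>): Promise<Result[]>;
declare function generateLearnings(query: string, result: Result): Promise<Learning>;

// External state accumulated across all recursive calls.
export const accumulatedResearch = {
  query: '',
  completedQueries: [] as string[],
  searchResults: [] as Result[],
  learnings: [] as Learning[],
};

const usedUrls = new Set<string>();

export async function deepResearch(prompt: string, depth = 2, breadth = 3): Promise<void> {
  if (depth === 0) return; // depth shrinks each level: guarantees termination
  if (!accumulatedResearch.query) accumulatedResearch.query = prompt;

  const queries = await generateSearchQueries(prompt, breadth);
  for (const query of queries) {
    const results = await searchAndProcess(query, usedUrls);
    accumulatedResearch.completedQueries.push(query);
    for (const result of results) {
      accumulatedResearch.searchResults.push(result);
      const learning = await generateLearnings(query, result);
      accumulatedResearch.learnings.push(learning);
      // Recurse on follow-up questions with one less level of depth.
      for (const q of learning.followUpQuestions.slice(0, breadth)) {
        await deepResearch(q, depth - 1, breadth);
      }
    }
  }
}
```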
Step 6: Generating the Final Report
Once the recursive research is complete, the `generateReport` function takes the `accumulatedResearch` and uses `generateText` to synthesize all the information into a comprehensive report [00:54:39].
- Model Selection: A reasoning model such as OpenAI's o3-mini can be used for synthesis [00:54:57].
- System Prompt: A detailed system prompt is crucial for guiding the model on the desired report structure and persona (e.g., “expert researcher”), including markdown formatting, today’s date, and general research analyst-oriented guidelines [00:57:22]. This ensures a high-quality, structured output, minimizing the need for the model to infer formatting or tone [00:56:51].
This comprehensive workflow, built with the AI SDK, demonstrates how complex AI systems can be constructed by combining fundamental building blocks and leveraging agentic behavior to tackle challenging tasks like deep research [00:59:08].