From: aidotengineer
This section focuses on building a deep research clone using the AI SDK in Node.js, demonstrating how to construct complex AI systems by combining various AI SDK functions and integrating with external tools [00:25:24]. The project aims to take a user query, conduct deep research by searching the web, aggregate findings, and compile them into a markdown report [00:24:52].
Project Workflow Overview
Deep research products, like those offered by OpenAI or Google’s Gemini, typically take a topic, search the web, aggregate resources, and return a comprehensive report [00:26:12]. This project’s workflow is broken down into several autonomous agentic steps:
- Input Query: Start with a user-provided prompt or rough query [00:26:53].
- Generate Subqueries: Based on the initial prompt, generate a list of specific search queries [00:26:58].
- Search the Web: For each subquery, search the web for relevant results [00:27:15].
- Analyze Results: Analyze the search results to extract key learnings and identify follow-up questions [00:27:19].
- Recursive Research: If necessary, take the follow-up questions and existing research to generate new queries, recursively repeating the process to explore topics in depth [00:27:26]. This allows the system to go down “webs of thought” and accumulate a comprehensive set of information [00:27:47].
  - Depth and Breadth: The `depth` setting controls how many levels deep the research goes, while `breadth` dictates how many different lines of inquiry are pursued at each step [00:28:06].
- Generate Report: Synthesize all the accumulated research into a final markdown report [00:26:26].
Implementation Details
The project uses the AI SDK for both prototyping and production work, especially its `generateObject` and `generateText` functions, along with Zod for schema definition [01:52:05].
Project Setup
To follow along with the implementation:
- Clone the repository [00:35:00].
- Install dependencies [00:38:00].
- Copy environment variables [00:40:00].
- Run the `index.ts` file using `pnpm run dev` (or `pd` as an alias) [00:48:00].
1. Generating Search Queries
The initial step involves taking a broad user query and generating more specific search queries for a search engine [02:50:00].
- `generateSearchQueries` function:
  - Takes a `query` (string) and `numberOfSearchQueries` (defaulting to 3) [02:56:00].
  - Uses `generateObject` with a `mainModel` (e.g., GPT-4o mini) [03:08:00].
  - The prompt instructs the model to generate `n` search queries for the given input query [03:30:00].
  - The output schema is an array of strings (`z.array(z.string())`) with a minimum of 1 and a maximum of 5 items, though the default is 3 [03:38:00].
  - Example for “what do you need to be a D1 shotput athlete?”: “requirements to become a D1 shotput athlete”, “training regimen for D1 shotput athlete”, “qualifications for NCAA division one shot put” [03:27:00].
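The function described above can be sketched roughly as follows. This is a minimal sketch, not the project's actual code: to keep it runnable without API keys, the model call is passed in as a hypothetical `generateList` callback, which in the real project would be a `generateObject` call with the array-of-strings schema. The `clamp` helper and the prompt wording are also assumptions.

```typescript
// Hypothetical stand-in for a structured-output model call. In the project
// described here, this role is played by the AI SDK's generateObject.
type GenerateList = (
  prompt: string,
  limits: { min: number; max: number },
) => Promise<string[]>;

// Keep the requested count inside the schema's 1..5 bounds.
function clamp(n: number, min: number, max: number): number {
  return Math.min(max, Math.max(min, Math.floor(n)));
}

async function generateSearchQueries(
  generateList: GenerateList,
  query: string,
  numberOfSearchQueries = 3,
): Promise<string[]> {
  const n = clamp(numberOfSearchQueries, 1, 5);
  const prompt = `Generate ${n} search queries for the following query: ${query}`;
  const queries = await generateList(prompt, { min: 1, max: 5 });
  // Never hand back more queries than the caller asked for.
  return queries.slice(0, n);
}
```

Passing the model call in as a parameter is only a convenience for experimenting offline; calling `generateObject` directly inside the function would work just as well.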
2. Web Search with Exa
For searching the web, the Exa service is used, known for its speed and cost-effectiveness [03:30:00].
- `searchWeb` function:
  - Takes a `query` (string) [03:49:00].
  - Uses `exa.searchAndContents` [03:53:00].
  - Configurable options: `resultsLimit` (defaulting to 1 for simplicity) and `liveCrawl` (ensures up-to-date results, potentially impacting performance) [03:59:00].
  - Crucially, results are mapped to return only relevant information (e.g., `url`, `title`, `text`) to reduce token usage and improve model effectiveness by trimming irrelevant data [03:49:00]. Trimming tool output this way is a common prompt engineering and cost optimization strategy.
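The trimming step is the part most worth seeing in code. The sketch below assumes a search client with a `searchAndContents`-style method. The `SearchClient` interface is hypothetical, and the option names `numResults` and `livecrawl` reflect my understanding of the Exa client, so check them against its documentation.

```typescript
// Minimal shape of the fields this sketch reads from a search hit. A real
// client returns more (IDs, scores, dates), which is what gets trimmed away.
type RawResult = {
  url: string;
  title: string | null;
  text: string;
  [extra: string]: unknown;
};

// Hypothetical interface modeled on a searchAndContents-style client.
interface SearchClient {
  searchAndContents(
    query: string,
    options: { numResults: number; livecrawl: "always" | "never" },
  ): Promise<{ results: RawResult[] }>;
}

type SearchResult = { url: string; title: string; text: string };

async function searchWeb(
  client: SearchClient,
  query: string,
  resultsLimit = 1,
): Promise<SearchResult[]> {
  const { results } = await client.searchAndContents(query, {
    numResults: resultsLimit,
    livecrawl: "always", // fresher pages at the cost of latency
  });
  // Keep only what the model needs to read.
  return results.map((r) => ({ url: r.url, title: r.title ?? "", text: r.text }));
}
```

Everything dropped in the `map` call is text the model would otherwise pay for in tokens without gaining anything from it.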
3. Analyzing Results for Learnings and Follow-up Questions
This is an agentic part of the workflow, where the model decides how to proceed based on the relevance of search results [03:14:00].
- `searchAndProcess` function:
  - Uses `generateText` with `maxSteps` (e.g., 5) to create an autonomous loop [03:27:00].
  - Tools defined:
    - `searchWeb`: Searches the web for a query. The result is added to `pendingSearchResults` [03:57:00].
    - `evaluate`: Evaluates the latest `pendingSearchResult` [03:41:00].
      - Uses `generateObject` in enum mode (`relevant` or `irrelevant`) to determine relevance [04:15:00].
      - If irrelevant, the tool returns a string like “Search results are irrelevant, please search again with a more specific query,” which guides the language model to refine its next search [04:01:00].
      - If relevant, the result is moved to `finalSearchResults` [04:42:00].
      - Crucially, this tool also checks `accumulatedSources` to avoid reusing previously processed URLs, preventing redundant searches and saving tokens [05:22:00]. This addresses a common design challenge in building web research agents.
  - The `maxSteps` parameter allows the model to autonomously continue searching and evaluating until a relevant result is found or the step limit is reached [03:27:00].
- `generateLearnings` function:
  - Takes the original `query` and the `searchResult` (scraped web page content) [04:39:00].
  - Uses `generateObject` to extract a `learning` (insight) and `followUpQuestions` (an array of strings) from the content [04:41:00].
  - The prompt emphasizes the user’s research goal and the relevant search result [04:43:00].
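The search-and-evaluate cycle can be approximated without any model in the loop. In the sketch below, a plain `for` loop stands in for the tool-calling loop that `generateText` with `maxSteps` provides, so one iteration here covers both a search and an evaluation. The `Deps` interface and its `search`, `judge`, and `refine` members are hypothetical placeholders for the web search, the enum-mode relevance check, and the model's query refinement.

```typescript
type SearchResult = { url: string; title: string; text: string };
type Verdict = "relevant" | "irrelevant";

// Hypothetical dependencies, injected so the sketch runs without API keys.
interface Deps {
  search(query: string): Promise<SearchResult[]>;
  judge(query: string, result: SearchResult): Promise<Verdict>;
  refine(query: string, feedback: string): Promise<string>;
}

async function searchAndProcess(
  deps: Deps,
  query: string,
  accumulatedSources: string[],
  maxSteps = 5,
): Promise<SearchResult[]> {
  const pendingSearchResults: SearchResult[] = [];
  const finalSearchResults: SearchResult[] = [];
  let currentQuery = query;

  // Stop at the first relevant result or when the step budget runs out.
  for (let step = 0; step < maxSteps && finalSearchResults.length === 0; step++) {
    pendingSearchResults.push(...(await deps.search(currentQuery)));

    // Evaluate the most recent pending result, if there is one.
    const candidate = pendingSearchResults.pop();
    let feedback = "No results found, please search again with a different query.";
    if (candidate) {
      // Already-used URLs are rejected without spending a model call.
      const seen = accumulatedSources.includes(candidate.url);
      const verdict = seen ? "irrelevant" : await deps.judge(query, candidate);
      if (verdict === "relevant") {
        finalSearchResults.push(candidate);
        continue;
      }
      feedback =
        "Search results are irrelevant, please search again with a more specific query.";
    }
    currentQuery = await deps.refine(currentQuery, feedback);
  }
  return finalSearchResults;
}
```

In the tool-calling version, the feedback string is simply the tool's return value, and the model itself decides what to search for next.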
4. Introducing Recursion for Deeper Research
To handle complex queries and go deeper into specific topics, a recursive `deepResearch` function is implemented.
- `deepResearch` function:
  - Manages the entire research process recursively, tracking accumulated research state (original query, active queries, search results, learnings, completed queries) [04:47:00].
  - Accepts `prompt`, `depth`, and `breadth` parameters to control the scope [04:49:00].
  - Generates search queries, calls `searchAndProcess` for each, and then `generateLearnings` [04:49:00].
  - Updates the global `accumulatedResearch` store with new findings [04:46:00].
  - Recursively calls itself with new queries derived from `followUpQuestions`, decrementing `depth` and `breadth` to ensure termination [04:56:00].
  - A base case handles `depth` reaching zero, at which point the recursion stops [05:26:00].
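The recursion itself is easier to follow in code. The sketch below is an illustration under assumptions, not the project's implementation. The three steps are injected through a hypothetical `Steps` interface, the research state is passed as an argument rather than held in a global store, and `breadth` is decremented but floored at 1 so that `depth` alone decides when the recursion ends.

```typescript
type Learning = { learning: string; followUpQuestions: string[] };
type Research = {
  query: string;
  queries: string[];
  learnings: Learning[];
  completedQueries: string[];
};

// Hypothetical wrappers around the functions described in earlier sections.
interface Steps {
  generateSearchQueries(prompt: string, n: number): Promise<string[]>;
  searchAndProcess(query: string): Promise<string[]>; // returns page texts
  generateLearnings(query: string, pageText: string): Promise<Learning>;
}

async function deepResearch(
  steps: Steps,
  research: Research,
  prompt: string,
  depth = 2,
  breadth = 2,
): Promise<Research> {
  // Base case: the depth budget is spent.
  if (depth <= 0) return research;

  const queries = await steps.generateSearchQueries(prompt, breadth);
  research.queries.push(...queries);

  for (const query of queries) {
    const pages = await steps.searchAndProcess(query);
    for (const page of pages) {
      const learning = await steps.generateLearnings(query, page);
      research.learnings.push(learning);
      research.completedQueries.push(query);

      // Fold what is known so far into the next, narrower round.
      const nextPrompt = [
        `Overall research goal: ${research.query}`,
        `Previous queries: ${research.completedQueries.join(", ")}`,
        `Follow-up questions: ${learning.followUpQuestions.join(", ")}`,
      ].join("\n");
      await deepResearch(steps, research, nextPrompt, depth - 1, Math.max(1, breadth - 1));
    }
  }
  return research;
}
```

With `depth = 2` and `breadth = 2`, and one page per query, this visits two queries at the first level and one follow-up query under each, for four learnings in total.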
5. Generating the Final Report
Once all research is accumulated, a final model synthesizes the information into a coherent report.
- `generateReport` function:
  - Takes the `accumulatedResearch` object [05:47:00].
  - Uses `generateText` with a reasoning model (e.g., `o3-mini` was found effective) [05:57:00].
  - A detailed system prompt guides the model on formatting (e.g., Markdown), persona (expert researcher), and specific instructions (e.g., using today’s date, allowing speculation but flagging it) [05:22:00]. This ensures a structured and high-quality output report.
  - The final report is then written to a markdown file [05:49:00].
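How the system prompt and research data might be assembled is sketched below. The wording of the prompt and the `buildReportRequest` helper are assumptions made for illustration. Only the ingredients (persona, date, Markdown, flagged speculation) come from the description above.

```typescript
type Research = {
  query: string;
  learnings: { learning: string; followUpQuestions: string[] }[];
};

// Hypothetical helper: builds the two strings a text-generation call needs.
function buildReportRequest(research: Research, today: Date = new Date()) {
  const system = [
    "You are an expert researcher.",
    `Today is ${today.toISOString().slice(0, 10)}.`,
    "Write the report in Markdown.",
    "You may speculate, but flag speculation clearly.",
  ].join("\n");

  // The accumulated research is serialized as JSON so nothing is lost.
  const prompt =
    "Generate a report based on the following research data:\n\n" +
    JSON.stringify(research, null, 2);

  return { system, prompt };
}
```

The returned `system` and `prompt` would then be passed to `generateText`, and the resulting text written to a markdown file.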
Key Takeaways
- This project demonstrates how to break down complex problems like deep research into a structured, multi-step AI workflow [02:50:00].
- The AI SDK’s `generateObject` and `generateText` functions, combined with tool calling and recursion, allow for the creation of sophisticated, autonomous agents [02:50:00].
- Effective prompt engineering, including system prompts and the use of Zod for structured outputs, is crucial for guiding language models and ensuring desired results [02:28:00].
- Optimizing token usage by filtering irrelevant information from tool results is essential for cost-efficiency and model performance [03:04:00].
- The project provides a practical example of how deep research features, like those in Google’s Gemini and OpenAI’s products, can be built.
Tools and Technologies
- AI SDK: Core library for interacting with Large Language Models (LLMs) and building agents [00:18:00].
  - `generateText`: Generates text from an LLM, supports tools and `maxSteps` for agentic behavior [01:06:00].
  - `generateObject`: Dedicated function for generating structured JSON objects based on a defined schema, preferred for its type safety and control [01:55:00].
  - `streamText`, `streamObject`: Streaming versions of the generation functions [02:09:00].
  - Unified Interface: Allows switching between different LLM providers (OpenAI, Perplexity, Google Gemini) by changing a single line of code [02:41:00].
- Zod: A TypeScript-first schema declaration and validation library, used for defining structured output schemas [01:47:00]. Its `describe` function can add specific instructions to the model for individual schema fields [02:14:00].
- Exa: A search service used for web crawling and searching, providing live and cached content [03:00:00].
- Node.js: The runtime environment for the project [02:49:00].