From: aidotengineer

This section focuses on building a deep research clone using the AI SDK in Node.js, demonstrating how to construct complex AI systems by combining various AI SDK functions and integrating with external tools [00:25:24]. The project aims to take a user query, conduct deep research by searching the web, aggregate findings, and compile them into a markdown report [00:24:52].

Project Workflow Overview

Deep research products, like those offered by OpenAI or Google’s Gemini, typically take a topic, search the web, aggregate resources, and return a comprehensive report [00:26:12]. This project’s workflow is broken down into several autonomous agentic steps:

  1. Input Query: Start with a user-provided prompt or rough query [00:26:53].
  2. Generate Subqueries: Based on the initial prompt, generate a list of specific search queries [00:26:58].
  3. Search the Web: For each subquery, search the web for relevant results [00:27:15].
  4. Analyze Results: Analyze the search results to extract key learnings and identify follow-up questions [00:27:19].
  5. Recursive Research: If necessary, take the follow-up questions and existing research to generate new queries, recursively repeating the process to explore topics in depth [00:27:26]. This allows the system to go down “webs of thought” and accumulate a comprehensive set of information [00:27:47].
    • Depth and Breadth: The depth setting controls how many levels deep the research goes, while breadth dictates how many different lines of inquiry are pursued at each step [00:28:06].
  6. Generate Report: Synthesize all the accumulated research into a final markdown report [00:26:26].
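As a back-of-the-envelope illustration (not code from the project), assuming every branch spawns a constant breadth of new queries at each level, the total number of searches grows geometrically with depth:

```typescript
// Rough cost model for the depth/breadth settings: with a constant breadth b,
// level 1 runs b queries, level 2 runs b^2, and so on, up to `depth` levels.
// The real implementation typically shrinks breadth as it recurses.
export function estimateQueryCount(depth: number, breadth: number): number {
  let total = 0;
  let perLevel = breadth;
  for (let level = 0; level < depth; level++) {
    total += perLevel;
    perLevel *= breadth;
  }
  return total;
}
```

For depth 2 and breadth 2 that is already 2 + 4 = 6 searches, which is why both settings are kept small by default.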

Implementation Details

The project leans on the AI SDK for both prototyping and production, especially its generateObject and generateText functions, along with Zod for schema definition [01:52:05].

Project Setup

To follow along with the implementation:

  1. Clone the repository [00:35:00].
  2. Install dependencies [00:38:00].
  3. Copy environment variables [00:40:00].
  4. Run the index.ts file with pnpm run dev (aliased to pd in the talk) [00:48:00].

1. Generating Search Queries

The initial step involves taking a broad user query and generating more specific search queries for a search engine [02:50:00].

  • generateSearchQueries function:
    • Takes a query (string) and numberOfSearchQueries (defaulting to 3) [02:56:00].
    • Uses generateObject with a mainModel (e.g., GPT-4o mini) [03:08:00].
    • The prompt instructs the model to generate n search queries for the given input query [03:30:00].
    • The output schema is an array of strings (z.array(z.string())) with a minimum of 1 and a maximum of 5 items, though the default is 3 [03:38:00].
    • Example for “what do you need to be a D1 shotput athlete?”: “requirements to become a D1 shotput athlete”, “training regimen for D1 shotput athlete”, “qualifications for NCAA division one shot put” [03:27:00].

2. Web Search with Exa

For searching the web, the Exa service is used, chosen for its speed and cost-effectiveness [03:30:00].

  • searchWeb function:
    • Takes a query (string) [03:49:00].
    • Uses exa.searchAndContents [03:53:00].
    • Configurable options: numResults (defaulting to 1 for simplicity) and livecrawl (ensures up-to-date results, potentially at a latency cost) [03:59:00].
    • Crucially, results are mapped to return only the relevant fields (e.g., url, title, text) to reduce token usage and improve model effectiveness by trimming irrelevant data [03:49:00]. This is a common prompt-engineering and cost-optimization strategy in generative AI projects.

3. Analyzing Results for Learnings and Follow-up Questions

This is an agentic part of the workflow, where the model decides how to proceed based on the relevance of search results [03:14:00].

  • searchAndProcess function:

    • Uses generateText with maxSteps (e.g., 5) to create an autonomous loop [03:27:00].
    • Tools Defined:
      • searchWeb: Searches the web for a query. The result is added to pendingSearchResults [03:57:00].
      • evaluate: Evaluates the latest pendingSearchResult [03:41:00].
        • Uses generateObject in enum mode (relevant or irrelevant) to determine relevance [04:15:00].
        • If irrelevant, the tool returns a string like “Search results are irrelevant, please search again with a more specific query,” which guides the language model to refine its next search [04:01:00].
        • If relevant, the result is moved to finalSearchResults [04:42:00].
        • Crucially, this tool also checks accumulatedSources to avoid reusing previously processed URLs, preventing redundant searches and saving tokens [05:22:00]. This addresses a common design challenge in building web research agents.
    • The maxSteps parameter allows the model to autonomously continue searching and evaluating until a relevant result is found or the step limit is reached [03:27:00].
  • generateLearnings function:

    • Takes the original query and the searchResult (scraped web page content) [04:39:00].
    • Uses generateObject to extract a learning (insight) and followUpQuestions (an array of strings) from the content [04:41:00].
    • The prompt emphasizes the user’s research goal and the relevant search result [04:43:00].

4. Introducing Recursion for Deeper Research

To handle complex queries and drill deeper into specific topics, a recursive deepResearch function is implemented.

  • deepResearch function:
    • Manages the entire research process recursively, tracking accumulated research state (original query, active queries, search results, learnings, completed queries) [04:47:00].
    • Accepts prompt, depth, and breadth parameters to control the scope [04:49:00].
    • Generates search queries, calls searchAndProcess for each, and then generateLearnings [04:49:00].
    • Updates the global accumulatedResearch store with new findings [04:46:00].
    • Recursively calls itself with new queries derived from followUpQuestions, decrementing depth and breadth to ensure termination [04:56:00].
    • A base case handles depth reaching zero, at which point the recursion stops [05:26:00].
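The recursion can be sketched end-to-end; the three helpers are stubbed here with canned offline data (hypothetical stand-ins for the real generateSearchQueries, searchAndProcess, and generateLearnings) so the focus stays on control flow, state accumulation, and termination:

```typescript
type SearchResult = { title: string | null; url: string; content: string };
type Learning = { learning: string; followUpQuestions: string[] };
type Research = {
  query: string | undefined;
  queries: string[];
  searchResults: SearchResult[];
  learnings: Learning[];
  completedQueries: string[];
};

// Hypothetical offline stubs; swap in the model-backed versions from earlier steps.
async function generateSearchQueries(query: string, n = 3): Promise<string[]> {
  return Array.from({ length: n }, (_, i) => `${query} [subquery ${i + 1}]`);
}
async function searchAndProcess(query: string): Promise<SearchResult[]> {
  return [{ title: query, url: `https://example.com/${encodeURIComponent(query)}`, content: `Content about ${query}` }];
}
async function generateLearnings(query: string, result: SearchResult): Promise<Learning> {
  return { learning: `Learning from ${result.url}`, followUpQuestions: [`More about ${query}?`] };
}

const accumulatedResearch: Research = {
  query: undefined, queries: [], searchResults: [], learnings: [], completedQueries: [],
};

export async function deepResearch(prompt: string, depth = 2, breadth = 3): Promise<Research> {
  if (depth === 0) return accumulatedResearch; // base case: stop recursing
  if (accumulatedResearch.query === undefined) accumulatedResearch.query = prompt;
  const queries = await generateSearchQueries(prompt, breadth);
  accumulatedResearch.queries.push(...queries);
  for (const query of queries) {
    const results = await searchAndProcess(query);
    accumulatedResearch.searchResults.push(...results);
    for (const result of results) {
      const learning = await generateLearnings(query, result);
      accumulatedResearch.learnings.push(learning);
      accumulatedResearch.completedQueries.push(query);
      // Recurse on follow-up questions with reduced depth and breadth so the
      // process is guaranteed to terminate.
      const nextPrompt = [
        `Overall research goal: ${prompt}`,
        `Previous queries: ${accumulatedResearch.completedQueries.join(', ')}`,
        `Follow-up questions: ${learning.followUpQuestions.join(', ')}`,
      ].join('\n');
      await deepResearch(nextPrompt, depth - 1, Math.ceil(breadth / 2));
    }
  }
  return accumulatedResearch;
}
```

Halving breadth on each recursive call mirrors the idea that follow-up lines of inquiry should narrow as the research goes deeper; combined with the depth decrement, it bounds the total work.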

5. Generating the Final Report

Once all research is accumulated, a final model synthesizes the information into a coherent report.

  • generateReport function:
    • Takes the accumulatedResearch object [05:47:00].
    • Uses generateText with a reasoning model (e.g., o3-mini was found effective) [05:57:00].
    • A detailed system prompt is used to guide the model on formatting (e.g., Markdown), persona (expert researcher), and specific instructions (e.g., using today’s date, allowing speculation but flagging it) [05:22:00]. This ensures a structured and high-quality output report.
    • The final report is then written to a markdown file [05:49:00].

Key Takeaways

  • This project demonstrates how to break down complex problems like deep research into a structured, multi-step AI workflow [02:50:00].
  • The AI SDK’s generateObject and generateText functions, combined with tool calling and recursion, allow for the creation of sophisticated, autonomous agents [02:50:00].
  • Effective prompt engineering, including system prompts and the use of Zod for structured outputs, is crucial for guiding language models and ensuring desired results [02:28:00].
  • Optimizing token usage by filtering irrelevant information from tool results is essential for cost-efficiency and model performance [03:04:00].
  • The project serves as a practical, from-scratch counterpart to the deep research features offered by Google’s Gemini and OpenAI.

Tools and Technologies

  • AI SDK: Core library for interacting with Large Language Models (LLMs) and building agents [00:18:00].
    • generateText: Generates text from an LLM, supports tools and maxSteps for agentic behavior [01:06:00].
    • generateObject: Dedicated function for generating structured JSON objects based on a defined schema, preferred for its type safety and control [01:55:00].
    • streamText, streamObject: Streaming versions of the generation functions [02:09:00].
    • Unified Interface: Allows switching between different LLM providers (OpenAI, Perplexity, Google Gemini) by changing a single line of code [02:41:00].
  • Zod: A TypeScript-first schema declaration and validation library, used for defining structured output schemas [01:47:00]. Its .describe() method attaches model-facing instructions to individual schema fields [02:14:00].
  • Exa: A search service used for web crawling and searching, providing live and cached content [03:00:00].
  • Node.js: The runtime environment for the project [02:49:00].