From: aidotengineer

This section focuses on building a deep research clone using the AI SDK in Node.js, demonstrating how to construct complex AI systems by combining various AI SDK functions and integrating with external tools [00:25:24]. The project aims to take a user query, conduct deep research by searching the web, aggregate findings, and compile them into a markdown report [00:24:52].

Project Workflow Overview

Deep research products, like those offered by OpenAI or Google’s Gemini, typically take a topic, search the web, aggregate resources, and return a comprehensive report [00:26:12]. This project’s workflow is broken down into several autonomous agentic steps:

  1. Input Query: Start with a user-provided prompt or rough query [00:26:53].
  2. Generate Subqueries: Based on the initial prompt, generate a list of specific search queries [00:26:58].
  3. Search the Web: For each subquery, search the web for relevant results [00:27:15].
  4. Analyze Results: Analyze the search results to extract key learnings and identify follow-up questions [00:27:19].
  5. Recursive Research: If necessary, take the follow-up questions and existing research to generate new queries, recursively repeating the process to explore topics in depth [00:27:26]. This allows the system to go down “webs of thought” and accumulate a comprehensive set of information [00:27:47].
    • Depth and Breadth: The depth setting controls how many levels deep the research goes, while breadth dictates how many different lines of inquiry are pursued at each step [00:28:06].
  6. Generate Report: Synthesize all the accumulated research into a final markdown report [00:26:26].
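As a back-of-the-envelope illustration (not code from the project), assuming every branch spawns a constant breadth of new queries at each level, the total number of searches grows geometrically with depth:

```typescript
// Rough cost model for the depth/breadth settings: with a constant breadth b,
// level 1 runs b queries, level 2 runs b^2, and so on, up to `depth` levels.
// The real implementation typically shrinks breadth as it recurses.
export function estimateQueryCount(depth: number, breadth: number): number {
  let total = 0;
  let perLevel = breadth;
  for (let level = 0; level < depth; level++) {
    total += perLevel;
    perLevel *= breadth;
  }
  return total;
}
```

For depth 2 and breadth 2 that is already 2 + 4 = 6 searches, which is why both settings are kept small by default.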

Implementation Details

The project leans on the AI SDK for both prototyping and production, especially its generateObject and generateText functions, along with Zod for schema definition [01:52:05].

Project Setup

To follow along with the implementation:

  1. Clone the repository [00:35:00].
  2. Install dependencies [00:38:00].
  3. Copy environment variables [00:40:00].
  4. Run the index.ts file with pnpm run dev (aliased to pd in the talk) [00:48:00].

1. Generating Search Queries

The initial step involves taking a broad user query and generating more specific search queries for a search engine [02:50:00].

  • generateSearchQueries function:
    • Takes a query (string) and numberOfSearchQueries (defaulting to 3) [02:56:00].
    • Uses generateObject with a mainModel (e.g., GPT-4o mini) [03:08:00].
    • The prompt instructs the model to generate n search queries for the given input query [03:30:00].
    • The output schema is an array of strings (z.array(z.string())) with a minimum of 1 and a maximum of 5 items, though the default is 3 [03:38:00].
    • Example for “what do you need to be a D1 shotput athlete?”: “requirements to become a D1 shotput athlete”, “training regimen for D1 shotput athlete”, “qualifications for NCAA division one shot put” [03:27:00].

2. Web Search with Exa

For searching the web, the Exa service is used, chosen for its speed and cost-effectiveness [03:30:00].

  • searchWeb function:
    • Takes a query (string) [03:49:00].
    • Uses exa.searchAndContents [03:53:00].
    • Configurable options: numResults (defaulting to 1 for simplicity) and livecrawl (ensures up-to-date results, potentially at a latency cost) [03:59:00].
    • Crucially, results are mapped to return only the relevant fields (e.g., url, title, text) to reduce token usage and improve model effectiveness by trimming irrelevant data [03:49:00]. This is a common prompt-engineering and cost-optimization strategy in generative AI projects.

3. Analyzing Results for Learnings and Follow-up Questions

This is an agentic part of the workflow, where the model decides how to proceed based on the relevance of search results [03:14:00].

  • searchAndProcess function:

    • Uses generateText with maxSteps (e.g., 5) to create an autonomous loop [03:27:00].
    • Tools Defined:
      • searchWeb: Searches the web for a query. The result is added to pendingSearchResults [03:57:00].
      • evaluate: Evaluates the latest pendingSearchResult [03:41:00].
        • Uses generateObject in enum mode (relevant or irrelevant) to determine relevance [04:15:00].
        • If irrelevant, the tool returns a string like “Search results are irrelevant, please search again with a more specific query,” which guides the language model to refine its next search [04:01:00].
        • If relevant, the result is moved to finalSearchResults [04:42:00].
        • Crucially, this tool also checks accumulatedSources to avoid reusing previously processed URLs, preventing redundant searches and saving tokens [05:22:00]. This addresses a common design challenge in building web research agents.
    • The maxSteps parameter allows the model to autonomously continue searching and evaluating until a relevant result is found or the step limit is reached [03:27:00].
  • generateLearnings function:

    • Takes the original query and the searchResult (scraped web page content) [04:39:00].
    • Uses generateObject to extract a learning (insight) and followUpQuestions (an array of strings) from the content [04:41:00].
    • The prompt emphasizes the user’s research goal and the relevant search result [04:43:00].

4. Introducing Recursion for Deeper Research

To handle complex queries and drill deeper into specific topics, a recursive deepResearch function is implemented.

  • deepResearch function:
    • Manages the entire research process recursively, tracking accumulated research state (original query, active queries, search results, learnings, completed queries) [04:47:00].
    • Accepts prompt, depth, and breadth parameters to control the scope [04:49:00].
    • Generates search queries, calls searchAndProcess for each, and then generateLearnings [04:49:00].
    • Updates the global accumulatedResearch store with new findings [04:46:00].
    • Recursively calls itself with new queries derived from followUpQuestions, decrementing depth and breadth to ensure termination [04:56:00].
    • A base case handles depth reaching zero, at which point the recursion stops [05:26:00].
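The recursion can be sketched end-to-end; the three helpers are stubbed here with canned offline data (hypothetical stand-ins for the real generateSearchQueries, searchAndProcess, and generateLearnings) so the focus stays on control flow, state accumulation, and termination:

```typescript
type SearchResult = { title: string | null; url: string; content: string };
type Learning = { learning: string; followUpQuestions: string[] };
type Research = {
  query: string | undefined;
  queries: string[];
  searchResults: SearchResult[];
  learnings: Learning[];
  completedQueries: string[];
};

// Hypothetical offline stubs; swap in the model-backed versions from earlier steps.
async function generateSearchQueries(query: string, n = 3): Promise<string[]> {
  return Array.from({ length: n }, (_, i) => `${query} [subquery ${i + 1}]`);
}
async function searchAndProcess(query: string): Promise<SearchResult[]> {
  return [{ title: query, url: `https://example.com/${encodeURIComponent(query)}`, content: `Content about ${query}` }];
}
async function generateLearnings(query: string, result: SearchResult): Promise<Learning> {
  return { learning: `Learning from ${result.url}`, followUpQuestions: [`More about ${query}?`] };
}

const accumulatedResearch: Research = {
  query: undefined, queries: [], searchResults: [], learnings: [], completedQueries: [],
};

export async function deepResearch(prompt: string, depth = 2, breadth = 3): Promise<Research> {
  if (depth === 0) return accumulatedResearch; // base case: stop recursing
  if (accumulatedResearch.query === undefined) accumulatedResearch.query = prompt;
  const queries = await generateSearchQueries(prompt, breadth);
  accumulatedResearch.queries.push(...queries);
  for (const query of queries) {
    const results = await searchAndProcess(query);
    accumulatedResearch.searchResults.push(...results);
    for (const result of results) {
      const learning = await generateLearnings(query, result);
      accumulatedResearch.learnings.push(learning);
      accumulatedResearch.completedQueries.push(query);
      // Recurse on follow-up questions with reduced depth and breadth so the
      // process is guaranteed to terminate.
      const nextPrompt = [
        `Overall research goal: ${prompt}`,
        `Previous queries: ${accumulatedResearch.completedQueries.join(', ')}`,
        `Follow-up questions: ${learning.followUpQuestions.join(', ')}`,
      ].join('\n');
      await deepResearch(nextPrompt, depth - 1, Math.ceil(breadth / 2));
    }
  }
  return accumulatedResearch;
}
```

Halving breadth on each recursive call mirrors the idea that follow-up lines of inquiry should narrow as the research goes deeper; combined with the depth decrement, it bounds the total work.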

5. Generating the Final Report

Once all research is accumulated, a final model synthesizes the information into a coherent report.

  • generateReport function:
    • Takes the accumulatedResearch object [05:47:00].
    • Uses generateText with a reasoning model (e.g., o3-mini was found effective) [05:57:00].
    • A detailed system prompt is used to guide the model on formatting (e.g., Markdown), persona (expert researcher), and specific instructions (e.g., using today’s date, allowing speculation but flagging it) [05:22:00]. This ensures a structured and high-quality output report.
    • The final report is then written to a markdown file [05:49:00].

Key Takeaways

  • This project demonstrates how to break down complex problems like deep research into a structured, multi-step AI workflow [02:50:00].
  • The AI SDK’s generateObject and generateText functions, combined with tool calling and recursion, allow for the creation of sophisticated, autonomous agents [02:50:00].
  • Effective prompt engineering, including system prompts and the use of Zod for structured outputs, is crucial for guiding language models and ensuring desired results [02:28:00].
  • Optimizing token usage by filtering irrelevant information from tool results is essential for cost-efficiency and model performance [03:04:00].
  • The project serves as a practical, from-scratch counterpart to the deep research features offered by Google’s Gemini and OpenAI.

Tools and Technologies

  • AI SDK: Core library for interacting with Large Language Models (LLMs) and building agents [00:18:00].
    • generateText: Generates text from an LLM, supports tools and maxSteps for agentic behavior [01:06:00].
    • generateObject: Dedicated function for generating structured JSON objects based on a defined schema, preferred for its type safety and control [01:55:00].
    • streamText, streamObject: Streaming versions of the generation functions [02:09:00].
    • Unified Interface: Allows switching between different LLM providers (OpenAI, Perplexity, Google Gemini) by changing a single line of code [02:41:00].
  • Zod: A TypeScript-first schema declaration and validation library, used for defining structured output schemas [01:47:00]. Its .describe() method attaches model-facing instructions to individual schema fields [02:14:00].
  • Exa: A search service used for web crawling and searching, providing live and cached content [03:00:00].
  • Node.js: The runtime environment for the project [02:49:00].