From: aidotengineer

This session explores building agents using the Vercel AI SDK, covering fundamental building blocks and a practical project to create a deep research clone [00:00:07] [00:00:25]. The examples are demonstrated in Node.js [00:00:28].

To follow along, clone the repository, install dependencies, and set up environment variables [00:00:34]. The project typically involves a single index.ts file, runnable via pnpm run dev [00:00:46].

Fundamental Building Blocks

generateText Function

The generateText function is a core primitive for calling a large language model and generating text [00:01:06].

Basic Usage

import { generateText } from 'ai';
import { openai } from '@ai-sdk/openai';
 
async function main() {
  const result = await generateText({
    model: openai('gpt-4o-mini'),
    prompt: 'Hello world.',
  });
  console.log(result.text);
}
main();

This example prompts GPT-4o Mini to respond to “Hello world,” logging “Hello, how can I assist you today?” [00:01:33] [00:02:02].

The generateText function (along with streamText, generateObject, and streamObject) can accept either a prompt string or an array of messages as input, where each message has a role and content [00:02:07].
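For reference, here is the same call expressed with a messages array instead of a prompt string (a minimal sketch; the system message content is illustrative):

Using a Messages Array

import { generateText } from 'ai';
import { openai } from '@ai-sdk/openai';
 
async function main() {
  const result = await generateText({
    model: openai('gpt-4o-mini'),
    messages: [
      { role: 'system', content: 'You are a friendly assistant.' },
      { role: 'user', content: 'Hello world.' },
    ],
  });
  console.log(result.text);
}
main();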

Unified Interface and Model Switching

A core feature of the AI SDK is its unified interface, allowing developers to switch between different language models by changing a single line of code [00:02:37] [00:02:41]. This is useful for optimizing for cost, speed, or specific use cases [00:02:50].

Models like GPT-4o Mini might struggle with recent information as their training data has a cutoff [00:03:08]. To address this, one can either add a tool for web access or select a model with built-in web search capabilities [00:03:37].

Switching to Perplexity with Web Search

To switch to Perplexity’s sonar-pro model, which includes web search, simply change the model import and invocation [00:03:59]:

import { generateText } from 'ai';
import { perplexity } from '@ai-sdk/perplexity'; // Changed provider import
 
async function main() {
  const result = await generateText({
    model: perplexity('sonar-pro'), // Changed model invocation
    prompt: 'When was the AI engineer summit in 2025?',
  });
  console.log(result.text);
}
main();

Perplexity’s response will correctly state the summit dates, often including sources via the sources property [00:04:28] [00:04:51]. This flexibility extends to many providers, including Google’s Gemini with search grounding [00:05:10] [00:05:33].
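Those sources can be read directly off the result object (a small sketch continuing the Perplexity example above):

Reading the sources Property

const result = await generateText({
  model: perplexity('sonar-pro'),
  prompt: 'When was the AI engineer summit in 2025?',
});
console.log(result.sources); // Provider-returned source metadata (e.g., URLs)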

Tools (Function Calling)

Tools allow language models to interact with the outside world and perform actions [00:06:15]. The core idea is to provide the model with a prompt and a list of available tools, each with a name, description, and required parameters [00:06:26]. Instead of generating text, the model might generate a tool call, which the developer then parses and executes [00:06:53].

Adding Two Numbers with a Tool

import { generateText, tool } from 'ai';
import { openai } from '@ai-sdk/openai';
import { z } from 'zod';
 
async function main() {
  const result = await generateText({
    model: openai('gpt-4o-mini'),
    prompt: "What's 10 + 5?",
    tools: {
      addNumbers: tool({
        description: 'Adds two numbers together.',
        parameters: z.object({
          num1: z.number(),
          num2: z.number(),
        }),
        execute: async ({ num1, num2 }) => num1 + num2,
      }),
    },
  });
  console.log(result.toolResults); // Logs the tool execution result
}
main();

The tool utility function provides type safety between defined parameters and the execute function arguments [00:08:06]. The AI SDK automatically parses tool calls, invokes the execute function, and returns the result in a toolResults array [00:09:23].

maxSteps Property for Multi-step Agents

When a tool call is generated, the language model does not immediately provide a text response [00:10:05]. The maxSteps property enables the model to continue autonomously [00:11:35]. If a tool call occurs, the toolResult is sent back to the model with the previous conversation context, triggering another generation [00:11:50]. This loop continues until plain text is generated or the maximum step threshold is reached [00:12:05].

Multi-step Agent with maxSteps

import { generateText, tool } from 'ai';
import { openai } from '@ai-sdk/openai';
import { z } from 'zod';
 
async function main() {
  const { text, toolResults, steps } = await generateText({
    model: openai('gpt-4o-mini'),
    maxSteps: 3, // Set max steps to allow follow-up generations
    prompt: 'Get the weather in San Francisco and New York and then add them together.',
    tools: {
      addNumbers: tool({ /* ... */ }),
      getWeather: tool({
        description: 'Get the current weather at a location.',
        parameters: z.object({
          latitude: z.number(),
          longitude: z.number(),
          city: z.string(),
        }),
        execute: async ({ latitude, longitude, city }) => {
          // Simulate weather API call
          if (city === 'San Francisco') return { temperature: 12.3, unit: 'C' };
          if (city === 'New York') return { temperature: 15.2, unit: 'C' };
          return { temperature: 0, unit: 'C' };
        },
      }),
    },
  });
  console.log('Resulting text:', text);
  console.log('Steps:', JSON.stringify(steps, null, 2));
}
main();

In this example, the model might first call getWeather for both cities (potentially as parallel tool calls), then addNumbers with the extracted temperatures, and finally generate a text summary [00:16:17] [00:17:09]. The maxSteps setting is what permits this sequence of actions followed by textual synthesis [00:12:05]. Developers can also tap into the raw request and response bodies for debugging [00:13:24].
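One way to inspect those bodies (a sketch; the exact fields exposed have shifted across SDK versions, so treat request.body as an assumption):

Inspecting Raw Request Bodies

const result = await generateText({
  model: openai('gpt-4o-mini'),
  prompt: "What's 10 + 5?",
});
console.log(result.request.body); // Raw JSON body sent to the provider (if available)
console.log(result.steps.length); // Each step also carries its own metadata for debugging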

Structured Data and generateObject

The AI SDK provides two ways to generate structured outputs:

  1. Using generateText with its experimental_output option [00:18:46].
  2. Using the dedicated generateObject function, often considered a powerful “workhorse” [00:18:55].

Using generateText with experimental_output: This option allows defining a Zod schema for the desired output structure [00:19:30]. Zod is a TypeScript-first schema declaration and validation library, ideal for structured outputs [00:19:47].

Structured Output with generateText

import { generateText, Output, tool } from 'ai';
import { openai } from '@ai-sdk/openai';
import { z } from 'zod';
 
async function main() {
  const { experimental_output } = await generateText({
    model: openai('gpt-4o-mini'),
    maxSteps: 3,
    prompt: 'Get the weather in San Francisco and New York and then add them together.',
    tools: { /* ... getWeather and addNumbers tools ... */ },
    experimental_output: Output.object({
      schema: z.object({
        sum: z.number(), // Define the desired output schema
      }),
    }),
  });
  console.log(experimental_output?.sum); // Access type-safe output
}
main();

This results in a simple, type-safe object rather than verbose text [00:20:38].

Using generateObject Function: generateObject is specifically designed for structured outputs [00:18:57].

generateObject for AI Agent Definitions

import { generateObject } from 'ai';
import { openai } from '@ai-sdk/openai';
import { z } from 'zod';
 
async function main() {
  const { object } = await generateObject({
    model: openai('gpt-4o-mini'),
    prompt: 'Please come up with 10 definitions for AI agents.',
    schema: z.object({
      definitions: z.array(z.string().describe('Each definition should use as much jargon as possible and be completely incoherent.')),
    }),
  });
  console.log(object.definitions);
}
main();

This example generates an array of 10 AI agent definitions [00:22:04]. The .describe() function in Zod allows adding specific instructions for each part of the schema, guiding the model’s generation without cluttering the main prompt [00:23:12].

Practical Project: Deep Research Clone

The practical project involves building a deep research clone in Node.js [00:24:39]. This demonstrates how to break down complex tasks into a structured workflow, combine different AI SDK functions, and create autonomous agentic elements [00:25:05].

The concept of deep research involves giving a model a topic, letting it search the web, aggregate resources, go down “webs of thought,” and finally produce a report [00:26:12].

Workflow Breakdown

The typical workflow for a deep research agent is:

  1. Input Query: Start with a rough query/prompt [00:26:52].
  2. Generate Subqueries: For the input prompt, generate multiple search queries (e.g., “What is an electric car?”, “Biggest electric car producers?”) [00:26:57].
  3. Search Web: For each subquery, search the web for relevant results [00:27:15].
  4. Analyze Results: Analyze the search results for key learnings and follow-up questions [00:27:19].
  5. Recursive Inquiry: If more depth is desired, take the follow-up questions and existing research, generate new queries, and repeat the process recursively, accumulating information [00:27:26]. This allows for exploring “webs of thought” [00:27:47].

The depth and breadth settings control the level and scope of information gathered [00:29:06].

generateSearchQueries Function

This function takes the main query and the desired number of subqueries to generate [00:29:48]. It uses generateObject with a specific schema to ensure the output is an array of strings suitable for search engines [00:30:08].

generateSearchQueries

// Uses generateObject to produce a list of search queries
async function generateSearchQueries(query: string, numQueries: number) { /* ... */ }
 
async function main() {
  const prompt = 'What do you need to be a D1 shot put athlete?';
  const queries = await generateSearchQueries(prompt, 3);
  console.log(queries);
}
main();

For the prompt “What do you need to be a D1 shot put athlete?”, it might generate queries like “requirements to become a D1 shot put athlete,” “training regimen for a D1 shot put athlete,” etc. [00:32:20].
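A minimal implementation sketch of generateSearchQueries, assuming the generateObject API shown earlier (the prompt wording is illustrative):

generateSearchQueries (Sketch)

import { generateObject } from 'ai';
import { openai } from '@ai-sdk/openai';
import { z } from 'zod';
 
async function generateSearchQueries(query: string, numQueries: number = 3) {
  const { object } = await generateObject({
    model: openai('gpt-4o-mini'),
    prompt: `Generate ${numQueries} search queries on the following topic: ${query}`,
    schema: z.object({
      // Search-engine-ready query strings
      queries: z.array(z.string()).length(numQueries),
    }),
  });
  return object.queries;
}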

searchWeb Function

This function uses the Exa API for web search [00:32:59]. It takes a query and returns relevant results. Important configurations include numResults and liveCrawl (to ensure live, non-cached results) [00:33:59].

Token Efficiency

It’s crucial to map through search results and only return the information relevant to the language model (e.g., URL, title, content) [00:34:49]. This reduces token count, making calls cheaper and improving model effectiveness by removing irrelevant information [00:35:02].
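A sketch of searchWeb built on the exa-js client (the environment variable name and field mapping are assumptions; livecrawl: 'always' asks Exa for fresh rather than cached pages):

searchWeb (Sketch)

import Exa from 'exa-js';
 
const exa = new Exa(process.env.EXA_API_KEY);
 
async function searchWeb(query: string) {
  const { results } = await exa.searchAndContents(query, {
    numResults: 5,       // Keep the result set small to control cost
    livecrawl: 'always', // Prefer live pages over cached ones
  });
  // Return only the fields the language model actually needs
  return results.map((r) => ({
    url: r.url,
    title: r.title,
    content: r.text,
  }));
}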

searchAndProcess Function (Agentic Evaluation)

This is the most complex and “agentic” part of the workflow [00:36:10]. It uses generateText with two tools: searchWeb and evaluate [00:37:18]. The model continually searches the web and then evaluates the relevance of the results [00:37:37].

  • searchWeb tool: Calls the searchWeb function and adds the results to a pendingSearchResults array [00:38:57].
  • evaluate tool: Pulls the latest pending result, uses generateObject (in enum mode for “relevant” or “irrelevant” output) to determine its usefulness for the query [00:39:41]. If relevant, it’s added to finalSearchResults; otherwise, it’s discarded [00:40:41]. If irrelevant, the tool returns a message prompting the model to “search again with a more specific query,” leveraging maxSteps to continue the loop with feedback [00:40:51].
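A condensed sketch of searchAndProcess putting both tools together (the prompt wording, maxSteps value, and variable names are assumptions; searchWeb is the function sketched above):

searchAndProcess (Sketch)

import { generateText, generateObject, tool } from 'ai';
import { openai } from '@ai-sdk/openai';
import { z } from 'zod';
 
type SearchResult = { url: string; title: string; content: string };
 
async function searchAndProcess(query: string) {
  const pendingSearchResults: SearchResult[] = [];
  const finalSearchResults: SearchResult[] = [];
 
  await generateText({
    model: openai('gpt-4o-mini'),
    maxSteps: 5,
    system: 'You are a researcher. For each query, search the web and then evaluate the results.',
    prompt: `Search the web for information about: ${query}`,
    tools: {
      searchWeb: tool({
        description: 'Search the web for information about a given query.',
        parameters: z.object({ query: z.string() }),
        execute: async ({ query }) => {
          const results = await searchWeb(query);
          pendingSearchResults.push(...results); // State lives in local variables
          return `Found ${results.length} results. Evaluate them.`;
        },
      }),
      evaluate: tool({
        description: 'Evaluate the most recent search result.',
        parameters: z.object({}), // No parameters: the model never re-parses the result text
        execute: async () => {
          const result = pendingSearchResults.pop()!;
          const { object: evaluation } = await generateObject({
            model: openai('gpt-4o-mini'),
            output: 'enum',
            enum: ['relevant', 'irrelevant'],
            prompt: `Is this search result relevant to "${query}"?\n\n${result.content}`,
          });
          if (evaluation === 'relevant') finalSearchResults.push(result);
          return evaluation === 'irrelevant'
            ? 'Irrelevant. Search again with a more specific query.'
            : 'Relevant result saved.';
        },
      }),
    },
  });
 
  return finalSearchResults;
}

Note how the evaluate tool takes no parameters: the pending result is read from a local variable rather than round-tripped through the model, which is exactly the optimization discussed next.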

Optimizing Tool Execution

Instead of making the language model parse potentially large search results as tool parameters (which can be costly and error-prone), use local variables or a shared state within the tool’s execute function [00:41:40]. This keeps token usage down and improves accuracy [00:42:09].

generateLearnings Function

This function takes the original query and the relevant search results to generate a learning (insight) and follow-up questions [00:43:39]. It uses generateObject with a schema expecting a learning string and an array of followUpQuestions strings [00:43:54].
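A sketch of generateLearnings under those assumptions (SearchResult is the type from the searchAndProcess sketch; the prompt wording is illustrative):

generateLearnings (Sketch)

import { generateObject } from 'ai';
import { openai } from '@ai-sdk/openai';
import { z } from 'zod';
 
async function generateLearnings(query: string, searchResults: SearchResult[]) {
  const { object } = await generateObject({
    model: openai('gpt-4o-mini'),
    prompt: `The user is researching "${query}". Extract a key learning and follow-up questions from these search results:\n\n${JSON.stringify(searchResults)}`,
    schema: z.object({
      learning: z.string(),
      followUpQuestions: z.array(z.string()),
    }),
  });
  return object; // { learning, followUpQuestions }
}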

generateLearnings Output

For a query about D1 shot put, a learning might be: “To become a D1 shot put athlete, high school athletes typically need to have four years of varsity experience, achieve high state finishes or be state champions, and participate in national events like the USATF National Junior Olympic Outdoor Track and Field Championships.” [00:45:34] Follow-up questions could be: “What are the specific training regimens for shot put athletes?” or “How do Division I recruiting standards differ from Division II and III?” [00:45:58]

Recursion and State Management

To achieve “deep research,” the process must be recursive [00:46:13]. This requires a dedicated recursive function (e.g., deepResearch) and a global or shared state variable (e.g., accumulatedResearch) to track all gathered information across recursive calls [00:46:36].

The deepResearch function takes parameters like depth (how many levels deep to go) and breadth (how many new queries to generate at each step) [00:49:09]. It updates the accumulatedResearch object with new search queries, search results, learnings, and completed queries [00:49:46]. The recursion decrements depth and breadth to ensure termination [00:50:01].
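Putting the pieces together, a sketch of the recursive driver and its shared state (the type shape, default values, and breadth-halving rule are assumptions; generateSearchQueries, searchAndProcess, and generateLearnings are the functions sketched above):

deepResearch (Sketch)

type Research = {
  query: string | undefined;
  queries: string[];
  searchResults: SearchResult[];
  learnings: string[];
  completedQueries: string[];
};
 
const accumulatedResearch: Research = {
  query: undefined,
  queries: [],
  searchResults: [],
  learnings: [],
  completedQueries: [],
};
 
async function deepResearch(prompt: string, depth = 2, breadth = 3): Promise<Research> {
  if (accumulatedResearch.query === undefined) accumulatedResearch.query = prompt;
  if (depth === 0) return accumulatedResearch; // Termination: no more levels to explore
 
  const queries = await generateSearchQueries(prompt, breadth);
  accumulatedResearch.queries.push(...queries);
 
  for (const query of queries) {
    const searchResults = await searchAndProcess(query);
    accumulatedResearch.searchResults.push(...searchResults);
 
    const { learning, followUpQuestions } = await generateLearnings(query, searchResults);
    accumulatedResearch.learnings.push(learning);
    accumulatedResearch.completedQueries.push(query);
 
    // Recurse on the follow-up questions with reduced depth and breadth
    const nextPrompt = `Overall goal: ${prompt}\nFollow-up questions: ${followUpQuestions.join(', ')}`;
    await deepResearch(nextPrompt, depth - 1, Math.ceil(breadth / 2));
  }
  return accumulatedResearch;
}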

To prevent redundant searches, the searchAndProcess function is updated to pass in previously used sources (URLs) to the evaluate tool [00:52:26]. If a search result’s URL already exists, it’s marked as irrelevant, prompting the agent to find new information [00:52:57].

generateReport Function

Finally, all the accumulatedResearch is passed to a reasoning model (e.g., o3-mini) using generateText to synthesize the information into a comprehensive report [00:54:21].

Improving Report Quality

To get the best response from the model, provide clear guidance on the desired output format and structure. A system prompt can define the model’s persona (e.g., “You are an expert researcher”), today’s date, formatting requirements (e.g., “Use markdown formatting”), and specific guidelines (e.g., allowing speculation but requiring it to be flagged) [00:56:51].
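A sketch of generateReport reflecting that guidance (the model choice and exact system prompt wording are assumptions; Research is the type from the deepResearch sketch):

generateReport (Sketch)

import { generateText } from 'ai';
import { openai } from '@ai-sdk/openai';
 
const SYSTEM_PROMPT = `You are an expert researcher. Today is ${new Date().toDateString()}.
- Use markdown formatting.
- You may speculate, but clearly flag any speculation.`;
 
async function generateReport(research: Research) {
  const { text } = await generateText({
    model: openai('o3-mini'), // Reasoning model for synthesis; model choice is an assumption
    system: SYSTEM_PROMPT,
    prompt: `Generate a comprehensive report based on the following research data:\n\n${JSON.stringify(research, null, 2)}`,
  });
  return text;
}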

The resulting markdown report will be structured and detailed, demonstrating the agent’s ability to conduct in-depth research and synthesize findings into a coherent document [00:58:12]. The entire deep research agent, including all components, can be built in relatively few lines of code (e.g., 218 lines) using the AI SDK [00:59:08].