From: aidotengineer
This article explores building agents using the AI SDK, covering fundamental building blocks and a practical project to create a deep research clone [00:00:07]. The session begins with core concepts of the AI SDK before diving into an agent-building example in Node.js [00:00:11].
To follow along, clone the repository, install dependencies, and copy the environment variables [00:00:34]. The project uses a single `index.ts` file, runnable with `pnpm run dev` [00:00:46].
Fundamentals of the AI SDK
Generating Text
The `generateText` function allows interaction with large language models to produce text [00:01:06]. It takes a specified model (e.g., OpenAI’s GPT-4o mini) and a prompt, then logs the resulting text [00:01:39].
```ts
import { generateText } from 'ai';
import { openai } from '@ai-sdk/openai';

async function main() {
  const result = await generateText({
    model: openai('gpt-4o-mini'),
    prompt: 'hello world',
  });
  console.log(result.text);
}

main();
```
`generateText` can also accept an array of `messages` instead of a `prompt`, where each message has a `role` and `content` [00:02:16].
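A minimal sketch of the `messages` form (the conversation content here is illustrative):

```ts
import { generateText } from 'ai';
import { openai } from '@ai-sdk/openai';

async function main() {
  const result = await generateText({
    model: openai('gpt-4o-mini'),
    // Each message carries a role and content instead of a single prompt string
    messages: [
      { role: 'system', content: 'You are a helpful assistant.' },
      { role: 'user', content: 'hello world' },
    ],
  });
  console.log(result.text);
}

main();
```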
A core feature of the AI SDK is its unified interface, enabling easy switching between language models by changing a single line of code [00:02:37]. This is useful for optimizing for cost, speed, or specific use cases [00:02:50]. For models that lack real-time web access, like GPT-4o Mini, the SDK allows seamless integration with models that have built-in web search capabilities, such as Perplexity [00:03:51].
```ts
import { generateText } from 'ai';
import { perplexity } from '@ai-sdk/perplexity';

async function main() {
  const result = await generateText({
    model: perplexity('sonar-pro'),
    prompt: 'when was the AI engineer summit in 2025?',
  });
  console.log(result.text);
  console.log(result.sources); // Accessing the sources property
}

main();
```
The `sources` property provides access to the references used by models like Perplexity [00:04:51]. The AI SDK supports numerous providers, many offering web search, which can be explored in their documentation [00:05:10]. For example, Google’s Gemini 1.5 Flash can be used with `searchGrounding` enabled [00:05:33].
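A minimal sketch of what that looks like, assuming the `@ai-sdk/google` provider and its `useSearchGrounding` option:

```ts
import { generateText } from 'ai';
import { google } from '@ai-sdk/google';

async function main() {
  const result = await generateText({
    // useSearchGrounding lets Gemini ground its answer in live web search results
    model: google('gemini-1.5-flash', { useSearchGrounding: true }),
    prompt: 'when was the AI engineer summit in 2025?',
  });
  console.log(result.text);
  console.log(result.sources);
}

main();
```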
Using Tools and Function Calling
Tools, or function calling, enable language models to interact with the outside world and perform actions [00:06:15]. The model is given a prompt and a list of available tools, each with a name, description, and required data (parameters) [00:06:26].
When the model decides to use a tool, it generates a “tool call” (the tool’s name and arguments parsed from the conversation context) instead of text [00:06:53]. The developer then parses and runs this tool call [00:07:13].
The AI SDK simplifies this process:
- Tools are passed to the `generateText` or `streamText` functions via a `tools` object [00:07:48].
- The `tool` utility function provides type safety between the defined parameters and the arguments of the `execute` function [00:08:06].
- The `execute` function can contain any arbitrary asynchronous JavaScript code [00:08:33].
- The SDK automatically parses tool calls, invokes the `execute` function, and returns the result in a `toolResults` array [00:09:24].
```ts
import { generateText, tool } from 'ai';
import { openai } from '@ai-sdk/openai';
import { z } from 'zod';

async function main() {
  const result = await generateText({
    model: openai('gpt-4o-mini'),
    prompt: 'What\'s 10 + 5?',
    tools: {
      addNumbers: tool({
        description: 'Adds two numbers together',
        // The tool utility takes a Zod schema, giving execute() typed arguments
        parameters: z.object({
          num1: z.number(),
          num2: z.number(),
        }),
        execute: async ({ num1, num2 }) => num1 + num2,
      }),
    },
  });
  console.log(result.toolResults);
}

main();
```
When a model generates a tool call, it typically doesn’t return text directly [00:10:13]. To get the model to incorporate tool results into a generated text answer, the `maxSteps` property is used [00:11:35]. If `maxSteps` is set, the SDK automatically sends the tool result and the previous conversation context back to the model, triggering another generation. This continues until the model generates plain text or the `maxSteps` threshold is reached [00:11:43]. This mechanism allows the model to run autonomously, picking the next step without explicit developer logic [00:12:16]. This forms the basis of multi-step agents [00:13:50].
Example demonstrating `maxSteps` with multiple tools (adding numbers and getting weather):
```ts
// ... (imports and addNumbers tool as above)

async function main() {
  const result = await generateText({
    model: openai('gpt-4o-mini'),
    prompt: 'Get the weather in San Francisco and New York and then add them together.',
    maxSteps: 3, // Allow multiple steps: tool calls plus a final text generation
    tools: {
      // ... addNumbers tool
      getWeather: tool({
        description: 'Get the current weather at a location',
        // The model infers latitude/longitude from the city
        parameters: z.object({
          latitude: z.number(),
          longitude: z.number(),
          city: z.string(),
        }),
        execute: async ({ latitude, longitude, city }) => {
          // Placeholder for an actual weather API call
          if (city === 'San Francisco') return { temperature: 12.3 };
          if (city === 'New York') return { temperature: 15.2 };
          return { temperature: null };
        },
      }),
    },
  });
  console.log(result.text);
  console.log(JSON.stringify(result.steps, null, 2)); // Show the steps taken
}

main();
```
The `result.steps` array provides a detailed log of each action the agent took, including tool calls and their results [00:12:49]. The AI SDK also allows tapping into the raw request and response bodies for debugging [00:13:24].
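For example (field availability varies by SDK version and provider, so treat this as a sketch):

```ts
// Log each step's tool calls and tool results
for (const step of result.steps) {
  console.log(step.toolCalls, step.toolResults);
}

// Raw request/response bodies, useful for debugging provider payloads
console.log(result.request?.body);
console.log(result.response?.body);
```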
Generating Structured Data
Generating structured data (structured outputs) is possible in two ways with the AI SDK:
- Using `generateText` with its `experimental_output` option [00:18:43].
- Using the dedicated `generateObject` function [00:18:55].
`generateObject` is particularly powerful for creating type-safe structured outputs [00:19:01]. It leverages Zod, a TypeScript validation library, to define schemas [00:19:47].
Example using `generateText` with `experimental_output`:
```ts
import { generateText, Output } from 'ai';
import { openai } from '@ai-sdk/openai';
import { z } from 'zod';

async function main() {
  const result = await generateText({
    model: openai('gpt-4o-mini'),
    prompt: 'What\'s 10 + 5?',
    maxSteps: 3,
    tools: { /* ... addNumbers tool ... */ },
    // Output.object constrains the final answer to the given Zod schema
    experimental_output: Output.object({
      schema: z.object({
        sum: z.number(),
      }),
    }),
  });
  console.log(result.experimental_output);
}

main();
```
Example using `generateObject` to generate definitions of AI agents:
```ts
import { generateObject } from 'ai';
import { openai } from '@ai-sdk/openai';
import { z } from 'zod';

async function main() {
  const result = await generateObject({
    model: openai('gpt-4o-mini'),
    prompt: 'Please come up with 10 definitions for AI agents.',
    schema: z.object({
      definitions: z.array(z.string()),
    }),
  });
  console.log(result.object);
}

main();
```
Zod’s `.describe()` method can be chained onto schema definitions to give the language model more specific instructions for each value, helping produce more precise outputs [00:23:14].
```ts
// ... (inside the main function)
schema: z.object({
  definitions: z.array(
    z.string().describe('Each definition should use as much jargon as possible and be completely incoherent.'),
  ),
}),
// ...
```
Building a Deep Research Clone
This section focuses on building a multi-step agent that mimics a deep research tool [00:24:39]. The goal is to take a query, conduct deep research by searching the web, and then write a Markdown report to the file system [00:24:52]. This project demonstrates structuring a workflow, incorporating autonomous agentic elements, and combining different AI SDK functions [00:25:05].
Deep Research Workflow
The general steps for the deep research clone are:
- Input Query: Take a rough query or prompt [00:26:53].
- Generate Subqueries: For the input prompt, generate a set of more specific search queries [00:26:58].
- Search the Web: For each subquery, search the web for relevant results [00:27:15].
- Analyze Results: Analyze each search result for key learnings and follow-up questions [00:27:19].
- Recursion: If further depth is desired, use the follow-up questions and existing research to generate new queries and repeat the process recursively, accumulating information [00:27:26].
- Report Generation: Synthesize all gathered information into a comprehensive report [00:26:24].
The process can be controlled by `depth` (levels of recursion) and `breadth` (number of queries at each step) [00:29:06]. With a depth of 2 and a breadth of 2, for example, the first level generates two subqueries, and each relevant result can spawn up to two follow-up queries one level deeper.
Step 1: Generate Search Queries
The `generateSearchQueries` function uses `generateObject` to create an array of search queries based on the initial prompt [00:29:50].
```ts
import { generateObject } from 'ai';
import { openai } from '@ai-sdk/openai';
import { z } from 'zod';

const mainModel = openai('gpt-4o-mini'); // Centralized model definition

async function generateSearchQueries(query: string, numberOfQueries: number = 3): Promise<string[]> {
  const { object } = await generateObject({
    model: mainModel,
    prompt: `Generate ${numberOfQueries} search queries for the following query: ${query}`,
    schema: z.object({
      queries: z.array(z.string()).min(1).max(5), // Constrain the number of queries, not their length
    }),
  });
  return object.queries;
}
```
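A main function to test this step might look like the following (the query is just an example):

```ts
async function main() {
  const queries = await generateSearchQueries('What do you need to be a D1 shot put athlete?');
  console.log(queries); // e.g. three focused search queries
}

main();
```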
Step 2: Search the Web with Exa
Exa is used for web search, offering speed and efficiency [00:32:59]. The `searchWeb` function takes a query and uses Exa’s `searchAndContents` method [00:33:47]. Key configurations include `numResults` and `livecrawl` (to ensure live, up-to-date results) [00:33:59].
Crucially, the function maps over the results and returns only the relevant fields (`url`, `title`, `text`) to reduce token usage and improve model effectiveness [00:34:49].
```ts
import Exa from 'exa-js';

const exa = new Exa(process.env.EXA_API_KEY);

type SearchResult = {
  url: string;
  title: string;
  text: string;
};

async function searchWeb(query: string): Promise<SearchResult[]> {
  const { results } = await exa.searchAndContents(query, {
    numResults: 1, // Limited for a simple demo
    livecrawl: 'always', // Ensure live, up-to-date results
  });
  // Return only the fields the model needs, to reduce token usage
  return results.map((result) => ({
    url: result.url,
    title: result.title ?? '',
    text: result.text,
  }));
}
```
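Calling it is then a single line; for example, inside an async context (the query is illustrative):

```ts
const results = await searchWeb('requirements to be a D1 shot put athlete');
console.log(results.map((r) => r.title));
```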
Step 3: Search and Process (Agentic Part)
The `searchAndProcess` function is the core agentic component, responsible for finding and validating relevant search results [00:37:10]. It uses `generateText` with two tools: `searchWeb` and `evaluate`.
- `searchWeb` tool: Invokes the `searchWeb` function, adds results to a `pendingSearchResults` array, and returns them to the model context [00:38:57].
- `evaluate` tool: Pulls the latest pending result and uses `generateObject` in enum mode (`relevant` or `irrelevant`) to assess its relevance, pushing relevant results to `finalSearchResults` [00:39:41]. If a result is irrelevant, it prompts the model to search again with a more specific query, continuing the `maxSteps` loop [00:40:56].

It also checks `accumulatedSources` to avoid reusing sources that have already been evaluated [00:52:54].
```ts
// ... (imports)
import { generateText, generateObject, tool } from 'ai';
import { z } from 'zod';

async function searchAndProcess(
  query: string,
  accumulatedSources: SearchResult[]
): Promise<SearchResult[]> {
  const pendingSearchResults: SearchResult[] = [];
  const finalSearchResults: SearchResult[] = [];

  await generateText({
    model: mainModel,
    prompt: `Search the web for information about: ${query}`,
    system: `You are a researcher. For each query, search the web and then evaluate if the results are relevant and will help answer the following query. If the page already exists in the existing results, mark it as irrelevant.`,
    maxSteps: 5, // Max attempts to find relevant results
    tools: {
      searchWeb: tool({
        description: 'Searches the web for information about a given query.',
        parameters: z.object({ query: z.string() }),
        execute: async ({ query }) => {
          const results = await searchWeb(query);
          pendingSearchResults.push(...results);
          return { message: 'Search results added to pending list.' };
        },
      }),
      evaluate: tool({
        description: 'Evaluates the search results to determine relevance.',
        parameters: z.object({}), // No parameters needed; reads from internal state
        execute: async () => {
          const latestResult = pendingSearchResults.pop();
          if (!latestResult) {
            return { message: 'No pending search results to evaluate.' };
          }
          // Skip sources that have already been evaluated
          if (accumulatedSources.some((s) => s.url === latestResult.url)) {
            return 'irrelevant'; // Marked as irrelevant if already used
          }
          const evaluation = await generateObject({
            model: mainModel,
            output: 'enum', // Enum mode: the model must pick one of the listed values
            enum: ['relevant', 'irrelevant'],
            prompt: `Evaluate whether the search results are relevant and will help answer the following query: "${query}"
<searchResult>
${JSON.stringify(latestResult)}
</searchResult>`,
          });
          if (evaluation.object === 'relevant') {
            finalSearchResults.push(latestResult);
            return 'relevant';
          }
          return 'irrelevant: Please search again with a more specific query.';
        },
      }),
    },
  });

  return finalSearchResults;
}
```
Step 4: Generate Learnings and Follow-up Questions
After obtaining relevant search results, the `generateLearnings` function extracts key learnings (insights) and generates follow-up questions [00:43:32]. This also uses `generateObject` for a structured output:
```ts
// ... (imports)

type Learning = {
  learning: string;
  followUpQuestions: string[];
};

async function generateLearnings(query: string, searchResult: SearchResult): Promise<Learning> {
  const { object } = await generateObject({
    model: mainModel,
    prompt: `The user is researching: "${query}". The following search result was deemed relevant. Generate a learning and follow-up questions from it.
<searchResult>
${JSON.stringify(searchResult)}
</searchResult>`,
    schema: z.object({
      learning: z.string(),
      followUpQuestions: z.array(z.string()),
    }),
  });
  return object;
}
```
Step 5: Incorporating Recursion and State
To enable deep research, the entire process is wrapped in a `deepResearch` function that calls itself recursively [00:47:10]. A global `ResearchState` object tracks the accumulated research (original query, active queries, search results, learnings, and completed queries) [00:48:10].
```ts
// ... (imports and types)

type ResearchState = {
  originalQuery: string;
  queries: string[];
  searchResults: SearchResult[];
  learnings: Learning[];
  completedQueries: string[];
};

const accumulatedResearch: ResearchState = {
  originalQuery: '',
  queries: [],
  searchResults: [],
  learnings: [],
  completedQueries: [],
};

async function deepResearch(
  prompt: string,
  query: string | undefined = undefined,
  depth: number = 2, // How many levels deep to go
  breadth: number = 2 // How many subqueries at each level
) {
  if (depth === 0) {
    return accumulatedResearch; // Base case for the recursion
  }
  if (query === undefined) {
    query = prompt;
    accumulatedResearch.originalQuery = prompt;
  }
  const queriesToSearch = await generateSearchQueries(query, breadth);
  accumulatedResearch.queries.push(...queriesToSearch);
  for (const q of queriesToSearch) {
    console.log(`Searching the web for: ${q}`);
    const results = await searchAndProcess(q, accumulatedResearch.searchResults);
    accumulatedResearch.searchResults.push(...results);
    accumulatedResearch.completedQueries.push(q);
    for (const result of results) {
      console.log(`Processing search result from: ${result.url}`);
      const learning = await generateLearnings(q, result);
      accumulatedResearch.learnings.push(learning);
      // Recurse on follow-up questions while depth remains
      if (learning.followUpQuestions.length > 0 && depth > 1) {
        for (const followUp of learning.followUpQuestions.slice(0, breadth)) {
          console.log(`Going deeper with follow-up: ${followUp}`);
          await deepResearch(prompt, followUp, depth - 1, breadth); // Decrement depth
        }
      }
    }
  }
  return accumulatedResearch;
}
```
Step 6: Generate Report
Finally, the `generateReport` function takes the `accumulatedResearch` and feeds it to a large language model (e.g., GPT-4o mini) to synthesize everything into a cohesive report [00:54:21]. A `system` prompt is vital for defining the persona (expert researcher) and specifying formatting (e.g., Markdown) to ensure a structured, high-quality output [00:57:22]. The final report is then written to a Markdown file [00:55:49].
```ts
// ... (imports)
import * as fs from 'fs/promises';

async function generateReport(research: ResearchState): Promise<string> {
  const { text: report } = await generateText({
    model: openai('gpt-4o-mini'), // A larger or reasoning model can improve report quality
    system: `You are an expert researcher. Today's date is ${new Date().toLocaleDateString()}. Follow these instructions exactly:
- Use Markdown formatting.
- Structure the report clearly with headings and subheadings.
- Summarize key findings concisely.
- Include relevant data and insights.
- You may use high levels of speculation or prediction, but flag it clearly.`,
    prompt: `Generate a comprehensive report based on the following research data about "${research.originalQuery}":
<researchData>
Original Query: ${research.originalQuery}
Completed Queries: ${JSON.stringify(research.completedQueries, null, 2)}
Search Results: ${JSON.stringify(research.searchResults.map((r) => ({ url: r.url, title: r.title })), null, 2)}
Learnings: ${JSON.stringify(research.learnings, null, 2)}
</researchData>`,
  });
  return report;
}

async function main() {
  const initialPrompt = 'What do you need to be a D1 shot put athlete?';
  console.log(`Starting deep research for: ${initialPrompt}`);
  const finalResearch = await deepResearch(initialPrompt, undefined, 2, 2); // Depth 2, breadth 2
  console.log('Generating report...');
  const reportContent = await generateReport(finalResearch);
  await fs.writeFile('research_report.md', reportContent);
  console.log('Report saved to research_report.md');
}

main();
```
This detailed workflow demonstrates how to combine various components of AI agents and functions within the AI SDK to build complex, autonomous systems that can perform sophisticated tasks like deep research [00:25:22].