From: aidotengineer

Function calling, or “tools,” enables large language models (LLMs) to interact with the outside world and perform actions beyond just generating text [00:06:17]. This capability is a core feature of the AI SDK [00:02:22].

Core Concept of Tools

The fundamental idea behind tools is simple: you provide an LLM with a prompt and a list of available tools as part of the conversation context [00:06:29]. Each tool includes:

  • Name: A unique identifier for the tool [00:06:40].
  • Description: A clear explanation of what the tool does, which the model uses to decide when to invoke it [00:06:42] [00:08:13].
  • Parameters: Any data required by the tool for its operation, which the model attempts to parse from the conversation context [00:06:47] [00:08:22].

Instead of generating a text response, the model will generate a “tool call” if it decides to use a tool to solve a user’s query [00:06:53]. This tool call includes the tool’s name and the arguments parsed from the conversation context [00:07:01]. The developer is then responsible for parsing that tool call, running the associated code, and handling the results [00:07:13].
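
Conceptually, the model’s response in this case is not text but a tool-call object. The shape below is illustrative only (exact field names vary across providers and SDK versions):

// Illustrative shape of a generated tool call
const toolCall = {
  type: 'tool-call',
  toolCallId: 'call_123',          // correlates the call with its result
  toolName: 'addNumbers',          // which tool the model chose
  args: { num1: 10, num2: 5 },     // arguments parsed from the conversation context
};

// After running the associated code, the developer sends back a matching tool result:
const toolResult = {
  type: 'tool-result',
  toolCallId: 'call_123',
  toolName: 'addNumbers',
  result: 15,
};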

Implementing Tools with AI SDK

generateText with Tools

The generateText or streamText functions in the AI SDK accept a tools object as input [00:07:51]. Within this object, you define each tool using the tool utility function [00:08:06].

The tool utility function provides:

  • description: Explains the tool’s purpose to the model [00:08:13].
  • parameters: Defines the data schema required by the tool. This can be defined using Zod, a TypeScript validation library, which ensures type safety between the defined parameters and the arguments received by the execute function [00:08:22] [00:08:54] [00:19:47].
  • execute: An asynchronous JavaScript function that contains the arbitrary code to be run when the language model generates a tool call [00:08:33].

The AI SDK automatically parses any tool calls, invokes the execute function, and returns the result in a toolResults array [00:09:24].

import { generateText, tool } from 'ai';
import { openai } from '@ai-sdk/openai';
import { z } from 'zod';

async function main() {
  const { toolResults } = await generateText({
    model: openai('gpt-4o-mini'),
    prompt: "What's 10 + 5?",
    tools: {
      addNumbers: tool({
        description: 'Adds two numbers together.',
        // Zod schema: the parsed arguments are passed, fully typed, to execute
        parameters: z.object({
          num1: z.number(),
          num2: z.number(),
        }),
        execute: async ({ num1, num2 }) => num1 + num2,
      }),
    },
  });

  console.log(toolResults); // e.g. [{ toolName: 'addNumbers', args: { num1: 10, num2: 5 }, result: 15 }]
}

main();

[00:07:27]

Building Agents with AI SDK using maxSteps

When a model generates a tool call, it typically doesn’t generate subsequent text [00:10:05]. To allow the model to incorporate tool results into a generated text answer or perform multiple actions, the maxSteps property is used [00:11:35].

maxSteps enables an “agentic loop” [00:12:16]:

  1. If the model generates a tool call, the toolResult is sent back to the model along with the previous conversation context [00:11:50].
  2. This triggers another generation, allowing the model to continue autonomously until it either generates plain text or reaches the maxSteps threshold [00:12:03].

This allows the model to pick the next step in a process without requiring complex manual logic or rerouting [00:12:21].

Example: Parallel Tool Calls and Multi-Step Reasoning

A more complex example demonstrates an agent using multiple tools, potentially in parallel, and then synthesizing the results [00:13:50]. For instance, the agent can fetch the weather in multiple cities and then add the temperatures together [00:15:18]. The model can infer missing parameters (like latitude/longitude for a city) from the context [00:15:05].

import { generateText, tool } from 'ai';
import { openai } from '@ai-sdk/openai';
import { z } from 'zod';

async function main() {
  const { text, toolResults } = await generateText({
    model: openai('gpt-4o-mini'),
    prompt: "Get the weather in San Francisco and New York and then add them together.",
    maxSteps: 3, // allows multiple steps of tool calls and text generation
    tools: {
      getWeather: tool({
        description: 'Get the current weather at a location.',
        parameters: z.object({
          latitude: z.number(),  // the model infers latitude/longitude from the city
          longitude: z.number(),
          city: z.string(),
        }),
        execute: async ({ city, latitude, longitude }) => {
          // Simulate a weather API call
          if (city === 'San Francisco') return { temperature: 12.3 };
          if (city === 'New York') return { temperature: 15.2 };
          return { temperature: 0 };
        },
      }),
      addNumbers: tool({
        description: 'Adds two numbers together.',
        parameters: z.object({
          num1: z.number(),
          num2: z.number(),
        }),
        execute: async ({ num1, num2 }) => num1 + num2,
      }),
    },
  });

  console.log(text); // e.g. "The current temperature in San Francisco is 12.3°C and in New York it's 15.2°C. Added together, that's 27.5°C."
  console.log(toolResults); // the sequence of tool calls and their results
}

main();

[00:14:10] [00:15:43]

Generating Structured Data (Structured Outputs)

Beyond text, the AI SDK allows models to generate structured data, which is useful for programmatic integration [00:18:38].

generateText with experimental_output

The generateText function includes an experimental_output option to define the desired structure using a Zod schema [00:18:46] [00:19:30].

import { generateText, tool, Output } from 'ai';
import { openai } from '@ai-sdk/openai';
import { z } from 'zod';

async function main() {
  const { experimental_output: result } = await generateText({
    model: openai('gpt-4o-mini'),
    prompt: "Get the weather in San Francisco and New York and then add them together.",
    maxSteps: 3,
    tools: { /* ... (same tools as above) ... */ },
    experimental_output: Output.object({
      schema: z.object({
        sum: z.number().describe('The sum of the temperatures.'),
      }),
    }),
  });

  console.log(result.sum); // direct access to the sum as a number
}

main();

[00:19:18] [00:20:07]

generateObject Function

The generateObject function is dedicated to structured outputs and is the go-to choice when structured data is the primary goal [00:18:55]. It takes the output schema directly, which simplifies the call.

import { generateObject } from 'ai';
import { openai } from '@ai-sdk/openai';
import { z } from 'zod';
 
async function main() {
  const { object } = await generateObject({
    model: openai('gpt-4o-mini'),
    prompt: 'Please come up with 10 definitions for AI agents.',
    schema: z.object({
      definitions: z.array(z.string().describe('Each definition should be completely incoherent and use as much jargon as possible.')),
    }),
  });
 
  console.log(object.definitions);
}
 
main();

[00:21:00] [00:22:09] [00:23:14]

Zod’s .describe() method can be chained onto any schema element to give the language model detailed instructions about the desired characteristics of that specific value [00:23:17].

generateObject also supports an “enum mode” for when the output must be one of a few predefined values (e.g., ‘relevant’ or ‘irrelevant’), which is more ergonomic for the developer and an easier task for the model [00:40:15].
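
A minimal sketch of enum mode, assuming the output: 'enum' option of generateObject (the prompt text here is purely illustrative):

import { generateObject } from 'ai';
import { openai } from '@ai-sdk/openai';

async function main() {
  // Enum mode: the model must answer with exactly one of the listed values.
  const { object: relevance } = await generateObject({
    model: openai('gpt-4o-mini'),
    output: 'enum',
    enum: ['relevant', 'irrelevant'],
    prompt: 'Is this search result relevant to the query "AI agents"? Result: ...',
  });

  console.log(relevance); // 'relevant' or 'irrelevant'
}

main();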

Developing Custom AI Tools and Functions in Complex Systems

Combining tool calling with structured outputs is how more complex AI systems get built. These principles are foundational for advanced applications, such as a “deep research clone” [00:24:39] [00:25:24].

Example: Deep Research Clone Workflow

A deep research clone workflow shows how to combine reasoning models, tool calls, and structured outputs into a multi-step system [00:26:50]:

  1. Generate Subqueries: An initial query is broken down into multiple search queries using generateObject [00:29:48] (see the sketch after this list).
  2. Search the Web: A searchWeb function (e.g., using Exa API) retrieves relevant results for each subquery [00:32:53]. It’s crucial to filter extraneous information from search results to reduce token count and improve model effectiveness [00:35:02].
  3. Analyze and Process: The searchAndProcess function acts as an agent, using generateText with maxSteps and two key tools:
    • searchWeb: Performs the actual web search.
    • evaluate: Determines if the search results are relevant. This tool can also prevent re-using already processed sources [00:39:39] [00:52:22]. Large search results should be handled as local variables rather than tool parameters to avoid unnecessary token consumption [00:41:30].
  4. Generate Learnings: The generateLearnings function uses generateObject to extract insights (“learnings”) and “follow-up questions” from relevant search results [00:43:39].
  5. Recursion and State Management: The entire process is encapsulated in a recursive deepResearch function [00:46:13]. Accumulated research (original query, active queries, search results, learnings, completed queries) is stored in a mutable state object, allowing the model to go deeper into sub-topics while retaining context [00:46:53].
  6. Generate Report: Finally, a generateReport function uses a larger reasoning model to synthesize all accumulated research into a comprehensive report [00:54:26]. Providing a detailed system prompt (e.g., persona, date, markdown formatting, guidelines) significantly improves the quality and structure of the final output [00:57:22].
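
A minimal sketch of how these pieces might fit together. The structure follows the steps above, but the exact signatures, the depth/breadth parameters, and the generateSearchQueries helper name are assumptions; searchAndProcess and generateLearnings are stubbed rather than implemented:

import { generateObject } from 'ai';
import { openai } from '@ai-sdk/openai';
import { z } from 'zod';

// Accumulated research state, mutated as the recursion goes deeper (step 5).
type Research = {
  query: string;
  queries: string[];
  searchResults: string[];
  learnings: string[];
  completedQueries: string[];
};

// Stubs for the helpers described above; real implementations would run the
// generateText agent with the searchWeb and evaluate tools (searchAndProcess)
// and use generateObject to extract learnings and follow-up questions (generateLearnings).
declare function searchAndProcess(query: string, research: Research): Promise<string[]>;
declare function generateLearnings(
  query: string,
  results: string[],
): Promise<{ learnings: string[]; followUpQuestions: string[] }>;

// Step 1: break a query into focused search queries with generateObject.
async function generateSearchQueries(query: string, n = 3) {
  const { object } = await generateObject({
    model: openai('gpt-4o-mini'),
    prompt: `Generate ${n} search queries to research the following topic: ${query}`,
    schema: z.object({
      queries: z.array(z.string()).describe('Focused web search queries.'),
    }),
  });
  return object.queries;
}

// Steps 2-5: search, evaluate, extract learnings, then recurse on the
// follow-up questions until the depth budget is spent.
async function deepResearch(research: Research, prompt: string, depth = 2, breadth = 3) {
  if (depth === 0) return research;

  const queries = await generateSearchQueries(prompt, breadth);
  research.queries.push(...queries);

  for (const query of queries) {
    const results = await searchAndProcess(query, research); // agent with searchWeb + evaluate tools
    research.searchResults.push(...results);
    research.completedQueries.push(query);

    const { learnings, followUpQuestions } = await generateLearnings(query, results);
    research.learnings.push(...learnings);

    for (const question of followUpQuestions) {
      await deepResearch(research, question, depth - 1, Math.ceil(breadth / 2));
    }
  }

  return research;
}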

This workflow demonstrates how different AI SDK functions can be combined to build sophisticated, multi-step AI systems that solve complex problems [00:25:22] [00:26:24].