From: aidotengineer

The AI SDK provides powerful primitives for building AI applications, including generating text, working with structured data, and integrating external tools. This article explores these core functionalities, focusing on tool calling and structured outputs, culminating in a practical example of building a deep research agent.

AI SDK Fundamentals

The AI SDK is designed to simplify interactions with large language models (LLMs) and enable the creation of intelligent agents.

Generating Text with generateText

The generateText function is a fundamental primitive in the AI SDK for interacting with LLMs to produce text outputs [01:06:00].

Key features:

  • Model Specification: Users can specify the LLM they want to use, such as OpenAI’s GPT-4o mini [01:41:00].
  • Input: It accepts either a simple prompt string or an array of messages, where each message has a role and content [02:07:00].
  • Unified Interface: A core feature of the AI SDK is its unified interface, which allows developers to switch between different language models (e.g., OpenAI, Perplexity, Google Gemini) by changing a single line of code [02:40:00]. This flexibility is valuable for optimizing costs, speed, or performance for specific use cases [02:50:00] (see the sketch below).
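
A minimal sketch of the basics, assuming the ai and @ai-sdk/openai packages are installed (the prompt and model choice are illustrative):

```ts
import { generateText } from 'ai';
import { openai } from '@ai-sdk/openai';

const { text } = await generateText({
  model: openai('gpt-4o-mini'),
  prompt: 'What is an LLM? Answer in one sentence.',
});
console.log(text);

// Switching providers is a one-line change, e.g. with @ai-sdk/google:
// model: google('gemini-1.5-flash'),
```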

Models with built-in web search, like Perplexity’s Sonar Pro or Google’s Gemini, can directly answer questions requiring up-to-date information, bypassing the need for external tools in some cases [03:53:00]. Responses from these models often include sources that can be accessed via the sources property in the AI SDK result [04:48:00].
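
A sketch of reading those sources, assuming the @ai-sdk/perplexity provider (the prompt is illustrative):

```ts
import { generateText } from 'ai';
import { perplexity } from '@ai-sdk/perplexity';

// Sonar Pro searches the web itself; `sources` lists the pages it cited
const { text, sources } = await generateText({
  model: perplexity('sonar-pro'),
  prompt: 'What are the latest developments in the AI SDK?',
});
console.log(sources); // e.g. [{ sourceType: 'url', url: '...' }, ...]
```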

Tools and Function Calling

Tools, also known as function calling, allow language models to interact with the outside world and perform actions beyond text generation [06:14:00].

Core Idea

The model is given a prompt along with a list of available tools. Each tool includes:

  • Name: A unique identifier for the tool [06:40:00].
  • Description: What the tool does, which helps the model decide when to use it [06:42:00].
  • Parameters: Any data the tool requires, which the model attempts to parse from the conversation context [06:47:00].
  • Execute Function: An arbitrary asynchronous JavaScript code block that runs when the model generates a tool call [08:33:00].

When the model decides to use a tool, it generates a “tool call” (the tool’s name and arguments) instead of plain text [06:58:00]. Ordinarily, the developer would then be responsible for parsing and executing that tool call [07:13:00]; when a tool defines an execute function, the AI SDK invokes it automatically and returns its output in a toolResults array [09:23:23].
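
A minimal sketch of a tool definition (the addNumbers tool is illustrative; the parameters key reflects the AI SDK v4-era API):

```ts
import { generateText, tool } from 'ai';
import { openai } from '@ai-sdk/openai';
import { z } from 'zod';

const { toolResults } = await generateText({
  model: openai('gpt-4o-mini'),
  prompt: 'What is 39 plus 14?',
  tools: {
    addNumbers: tool({
      description: 'Add two numbers together',
      parameters: z.object({ a: z.number(), b: z.number() }),
      // The SDK runs this automatically when the model emits a tool call
      execute: async ({ a, b }) => a + b,
    }),
  },
});
console.log(toolResults); // e.g. [{ toolName: 'addNumbers', result: 53, ... }]
```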

Enabling Multi-Step Agents with maxSteps

By default, if an LLM generates a tool call, the generateText function returns the tool result, not a synthesized text response [10:05:00]. To allow the model to incorporate tool results into a text answer, the maxSteps property can be used [11:35:00].

The maxSteps property enables an “agentic loop”:

  1. If the model generates a tool call, the tool result is sent back to the model along with the previous conversation context.
  2. This triggers another generation step.
  3. The process continues until the model generates plain text (no tool call) or the maxSteps threshold is reached [12:05:00].

This feature allows the model to autonomously pick the next step in a process without requiring manual rerouting of output [12:16:00]. Parallel tool calls are also supported, allowing multiple tools to be invoked simultaneously within a single step [16:42:00].
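
A sketch of the loop in action (the getWeather stub is hypothetical):

```ts
import { generateText, tool } from 'ai';
import { openai } from '@ai-sdk/openai';
import { z } from 'zod';

const { text } = await generateText({
  model: openai('gpt-4o-mini'),
  maxSteps: 3, // step 1: tool call; step 2: final text answer
  prompt: 'What is the weather in San Francisco in Fahrenheit?',
  tools: {
    getWeather: tool({
      description: 'Get the current temperature for a city',
      parameters: z.object({ city: z.string() }),
      // Hypothetical stub; a real tool would call a weather API here
      execute: async ({ city }) => ({ city, temperatureF: 64 }),
    }),
  },
});
console.log(text); // e.g. "It's currently 64°F in San Francisco."
```

Without maxSteps, the same call would stop after the tool result; with it, the model gets one more generation to phrase the answer.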

Generating Structured Data (Structured Outputs)

Beyond text generation, the AI SDK facilitates generating structured data.

Methods for Structured Outputs

  1. generateText with experimental_output: An experimental option within generateText that allows defining an output schema [18:46:00].
  2. generateObject: A dedicated function for generating structured outputs [18:55:00]. This function is particularly useful as it ensures type-safe JSON output based on a defined schema [22:22:00], as the sketch below shows.
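
A minimal generateObject sketch (the recipe schema is illustrative):

```ts
import { generateObject } from 'ai';
import { openai } from '@ai-sdk/openai';
import { z } from 'zod';

const { object } = await generateObject({
  model: openai('gpt-4o-mini'),
  schema: z.object({
    name: z.string(),
    ingredients: z.array(z.object({ name: z.string(), amount: z.string() })),
    steps: z.array(z.string()),
  }),
  prompt: 'Generate a lasagna recipe.',
});
// `object` is fully typed: { name: string; ingredients: {...}[]; steps: string[] }
```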

Zod Integration

Zod, a TypeScript schema validation library, is used heavily with the AI SDK to define schemas for structured outputs. This integration provides:

  • Type Safety: Ensures the generated output conforms to the expected types [19:50:00].
  • Schema Description: Zod’s .describe() method allows adding detailed instructions to the LLM about the desired format or content for specific keys or values within the schema [23:14:00].
  • Enum Mode: generateObject can operate in an enum mode, restricting outputs to a predefined set of values (e.g., “relevant” or “irrelevant”), simplifying evaluation logic [40:15:00]. Both features appear in the sketch below.
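
A sketch showing both features (schemas and prompts are illustrative):

```ts
import { generateObject } from 'ai';
import { openai } from '@ai-sdk/openai';
import { z } from 'zod';

// .describe() passes per-field instructions through to the model
const { object } = await generateObject({
  model: openai('gpt-4o-mini'),
  schema: z.object({
    title: z.string().describe('A concise, headline-style title'),
    tags: z.array(z.string()).describe('Lowercase topic tags, no spaces'),
  }),
  prompt: 'Summarize this article about the AI SDK as a title and tags.',
});

// Enum mode constrains the output to a fixed set of values
const { object: verdict } = await generateObject({
  model: openai('gpt-4o-mini'),
  output: 'enum',
  enum: ['relevant', 'irrelevant'],
  prompt: 'Is a lasagna recipe relevant to the query "AI SDK tools"?',
});
```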

Practical Project: Building a Deep Research Clone

To demonstrate these concepts, a “deep research clone” can be built as a Node.js terminal script. This project showcases how to break down complex tasks into a structured workflow, integrate multiple AI SDK functions, and create autonomous agents [24:37:00].

Workflow Overview

The deep research process involves several steps [26:52:00]:

  1. Input Query: Start with a broad research query [26:53:00].
  2. Generate Subqueries: Create multiple specific search queries from the main prompt [26:57:00].
  3. Search the Web: For each subquery, search for relevant results [27:15:00].
  4. Analyze Results: Extract key learnings and identify follow-up questions from the search results [27:19:00].
  5. Recursive Depth: If more depth is needed, use follow-up questions to generate new queries, recursively repeating the process while accumulating all research [27:26:00].
    • Depth: Controls how many levels deep the research goes [29:06:00].
    • Breadth: Controls how many separate lines of inquiry are pursued at each level [28:12:00].

Implementation Details

1. generateSearchQueries

This function takes a main query and the desired number of sub-queries. It uses generateObject with a schema to ensure an array of strings is returned, optimized for search engine queries [29:46:00].
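
A sketch of what that might look like (the function body is paraphrased; prompt wording is illustrative):

```ts
import { generateObject } from 'ai';
import { openai } from '@ai-sdk/openai';
import { z } from 'zod';

async function generateSearchQueries(query: string, n = 3) {
  const { object } = await generateObject({
    model: openai('gpt-4o-mini'),
    schema: z.object({
      queries: z
        .array(z.string())
        .describe('Queries optimized for a web search engine'),
    }),
    prompt: `Generate ${n} search queries to research the following topic: ${query}`,
  });
  return object.queries;
}
```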

2. searchWeb

This function uses the Exa API to perform web searches and retrieve content [32:52:00].

  • Configuration: Allows specifying the number of results and enabling livecrawl for real-time data [33:59:00].
  • Data Trimming: Crucially, it processes results to return only the relevant fields (e.g., url, title, content), reducing the token count sent to the LLM and improving model effectiveness [34:49:00]. A sketch follows below.
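
A sketch assuming the exa-js client (the SearchResult shape is this article’s own):

```ts
import Exa from 'exa-js';

const exa = new Exa(process.env.EXA_API_KEY);

type SearchResult = { title: string | null; url: string; content: string };

async function searchWeb(query: string): Promise<SearchResult[]> {
  const { results } = await exa.searchAndContents(query, {
    numResults: 1, // keep context small; raise for broader coverage
    livecrawl: 'always', // fetch fresh page content instead of cached copies
  });
  // Trim each result to only the fields the LLM needs, cutting token count
  return results.map((r) => ({ title: r.title, url: r.url, content: r.text }));
}
```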

3. searchAndProcess (The Agentic Component)

This is the agentic core of the deep research workflow: it manages finding relevant search results [36:14:00]. It uses generateText with two tools:

  • searchWeb tool: Invokes the searchWeb function with a given query [38:57:00].
  • evaluate tool: Assesses the relevance of a search result. It pulls the latest pending result, uses generateObject (in enum mode for “relevant” or “irrelevant”) to get the model’s judgment, and then either adds the result to finalSearchResults or instructs the model to search again with a more specific query if irrelevant [39:41:00].

The maxSteps property keeps this agentic loop running until a relevant source is found or the step limit is reached [38:27:00]. It also checks accumulatedSources to avoid reusing previously processed links [52:54:00].
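
A sketch of the whole component, reusing searchWeb and SearchResult from above (prompts and return messages are paraphrased):

```ts
import { generateText, generateObject, tool } from 'ai';
import { openai } from '@ai-sdk/openai';
import { z } from 'zod';

async function searchAndProcess(query: string, accumulatedSources: SearchResult[]) {
  const pendingSearchResults: SearchResult[] = [];
  const finalSearchResults: SearchResult[] = [];

  await generateText({
    model: openai('gpt-4o-mini'),
    prompt: `Search the web for information about ${query}`,
    system: 'Search the web, then call evaluate to judge each result.',
    maxSteps: 5, // loop until a relevant source is found or steps run out
    tools: {
      searchWeb: tool({
        description: 'Search the web for information about a given query',
        parameters: z.object({ query: z.string() }),
        async execute({ query }) {
          const results = await searchWeb(query);
          pendingSearchResults.push(...results);
          return results;
        },
      }),
      evaluate: tool({
        description: 'Evaluate the most recent search result',
        parameters: z.object({}),
        async execute() {
          const pending = pendingSearchResults.pop()!;
          // Skip links that were already processed in earlier iterations
          if (accumulatedSources.some((s) => s.url === pending.url)) {
            return 'This URL was already used. Search again with a different query.';
          }
          const { object: evaluation } = await generateObject({
            model: openai('gpt-4o-mini'),
            output: 'enum',
            enum: ['relevant', 'irrelevant'],
            prompt: `Is this result relevant to "${query}"?\n${pending.content}`,
          });
          if (evaluation === 'relevant') finalSearchResults.push(pending);
          return evaluation === 'relevant'
            ? 'Relevant result found. Stop searching.'
            : 'Irrelevant. Search again with a more specific query.';
        },
      }),
    },
  });

  return finalSearchResults;
}
```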

4. generateLearnings

This function takes a query and searchResult and uses generateObject to extract a learning (insight) and followUpQuestions (an array of strings) from the content, guided by a specific prompt [43:39:00].
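
A sketch (prompt wording is paraphrased from the description above):

```ts
import { generateObject } from 'ai';
import { openai } from '@ai-sdk/openai';
import { z } from 'zod';

async function generateLearnings(query: string, searchResult: SearchResult) {
  const { object } = await generateObject({
    model: openai('gpt-4o-mini'),
    schema: z.object({
      learning: z.string().describe('A key insight drawn from the search result'),
      followUpQuestions: z
        .array(z.string())
        .describe('Questions that would deepen the research'),
    }),
    prompt: `The user is researching "${query}". Extract a key learning and follow-up questions from this result:\n${searchResult.content}`,
  });
  return object; // { learning, followUpQuestions }
}
```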

5. deepResearch (Recursion Handler)

This function orchestrates the entire recursive research process [47:07:00]. It maintains a global accumulatedResearch state to track original queries, active queries, search results, learnings, and completed queries [47:52:00].

  • It calls generateSearchQueries, searchAndProcess, and generateLearnings iteratively.
  • It decrements depth and breadth parameters with each recursive call to control the research scope and prevent infinite loops [50:01:00].
  • For follow-up questions, it constructs a new query based on the overall goal, previous queries, and new follow-up questions, then recursively calls deepResearch [50:43:00], as the sketch below shows.
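
A sketch of the recursion, wiring together the functions above (the accumulatedResearch shape and follow-up prompt wording are illustrative):

```ts
type Learning = { learning: string; followUpQuestions: string[] };

const accumulatedResearch = {
  query: undefined as string | undefined,
  queries: [] as string[],
  searchResults: [] as SearchResult[],
  learnings: [] as Learning[],
  completedQueries: [] as string[],
};

async function deepResearch(prompt: string, depth = 2, breadth = 3) {
  if (accumulatedResearch.query === undefined) accumulatedResearch.query = prompt;
  if (depth === 0) return accumulatedResearch; // recursion base case

  const queries = await generateSearchQueries(prompt, breadth);
  accumulatedResearch.queries.push(...queries);

  for (const query of queries) {
    const results = await searchAndProcess(query, accumulatedResearch.searchResults);
    accumulatedResearch.searchResults.push(...results);

    for (const result of results) {
      const learning = await generateLearnings(query, result);
      accumulatedResearch.learnings.push(learning);
      accumulatedResearch.completedQueries.push(query);

      // Fold follow-up questions into a new query and recurse with a
      // smaller depth and breadth to keep the search bounded
      const nextPrompt = `Overall research goal: ${accumulatedResearch.query}
Previous queries: ${accumulatedResearch.completedQueries.join(', ')}
Follow-up questions: ${learning.followUpQuestions.join(', ')}`;
      await deepResearch(nextPrompt, depth - 1, Math.ceil(breadth / 2));
    }
  }
  return accumulatedResearch;
}
```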

6. generateReport

Finally, after the deepResearch process completes, the generateReport function takes the accumulatedResearch and uses generateText with a reasoning model to synthesize all the gathered information into a comprehensive report [54:42:00].

  • System Prompt: A detailed system prompt is provided to guide the model on persona (“expert researcher”), formatting (Markdown), and content guidelines (e.g., allowing speculation if flagged) [57:22:00].
  • The final report is then written to a Markdown file [55:49:00] (see the sketch below).
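
A sketch of the final step (the system prompt is abridged and the model choice is an assumption; any reasoning-capable model can be swapped in via the unified interface):

```ts
import fs from 'node:fs';
import { generateText } from 'ai';
import { openai } from '@ai-sdk/openai';

const SYSTEM_PROMPT = `You are an expert researcher. Write a well-structured
Markdown report. You may speculate, but flag any speculation clearly.`;

async function generateReport(research: typeof accumulatedResearch) {
  const { text } = await generateText({
    model: openai('o3-mini'), // assumption: any reasoning model works here
    system: SYSTEM_PROMPT,
    prompt:
      'Generate a report based on the following research data:\n' +
      JSON.stringify(research, null, 2),
  });
  fs.writeFileSync('report.md', text); // write the final report to disk
}
```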

This project demonstrates the power of combining the AI SDK’s primitives to build complex, autonomous AI systems capable of tasks like deep research [59:06:00].