From: aidotengineer
The AI SDK provides powerful primitives for building AI applications, including generating text, working with structured data, and integrating external tools. This article explores these core functionalities, focusing on tool calling and structured outputs, culminating in a practical example of building a deep research agent.
AI SDK Fundamentals
The AI SDK is designed to simplify interactions with large language models (LLMs) and enable the creation of intelligent agents.
Generating Text with generateText
The `generateText` function is a fundamental primitive in the AI SDK for interacting with LLMs to produce text outputs [01:06:00].
Key features:
- Model Specification: Users can specify the LLM they want to use, such as OpenAI’s GPT-4o mini [01:41:00].
- Input: It accepts either a simple `prompt` string or an array of `messages`, where each message has a `role` and `content` [02:07:00].
- Unified Interface: A core feature of the AI SDK is its unified interface, which allows developers to switch between different language models (e.g., OpenAI, Perplexity, Google Gemini) by changing a single line of code [02:40:00]. This flexibility is valuable for optimizing cost, speed, or performance for specific use cases [02:50:00].
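For illustration, a minimal call might look like the following sketch (it assumes the `ai` and `@ai-sdk/openai` packages in their AI SDK 4-era shape, plus an `OPENAI_API_KEY` in the environment):

```ts
import { generateText } from 'ai';
import { openai } from '@ai-sdk/openai';

// Minimal text generation: one model, one prompt, one string result.
const { text } = await generateText({
  model: openai('gpt-4o-mini'),
  prompt: 'When was the first AI Engineer Summit held?',
});

console.log(text);
```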
Models with built-in web search, like Perplexity’s Sonar Pro or Google’s Gemini, can directly answer questions requiring up-to-date information, bypassing the need for external tools in some cases [03:53:00]. Responses from these models often include sources that can be accessed via the `sources` property in the AI SDK result [04:48:00].
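Swapping in a search-grounded provider is the same call with a different model line. A sketch, assuming the `@ai-sdk/perplexity` package and an SDK version that exposes `sources` on the result:

```ts
import { generateText } from 'ai';
import { perplexity } from '@ai-sdk/perplexity';

// Same call shape as before; only the model line changes.
const { text, sources } = await generateText({
  model: perplexity('sonar-pro'),
  prompt: 'What happened in tech news this week?',
});

console.log(text);
console.log(sources); // citations returned by search-grounded models
```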
Tools and Function Calling
Tools, also known as function calling, allow language models to interact with the outside world and perform actions beyond text generation [06:14:00].
Core Idea
The model is given a prompt along with a list of available tools. Each tool includes:
- Name: A unique identifier for the tool [06:40:00].
- Description: What the tool does, which helps the model decide when to use it [06:42:00].
- Parameters: Any data the tool requires, which the model attempts to parse from the conversation context [06:47:00].
- Execute Function: An arbitrary asynchronous JavaScript function that runs whenever the model generates a call to the tool [08:33:00].
When the model decides to use a tool, it generates a “tool call” (the tool’s name and arguments) instead of plain text [06:58:00]. Ordinarily, the developer would be responsible for parsing and executing that tool call [07:13:00]; the AI SDK instead invokes the `execute` function automatically and returns its output in a `toolResults` array [09:23:23].
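A sketch of this flow with a hypothetical `getWeather` tool (the tool name, parameters, and stubbed return value are illustrative, not from the talk):

```ts
import { generateText, tool } from 'ai';
import { openai } from '@ai-sdk/openai';
import { z } from 'zod';

const result = await generateText({
  model: openai('gpt-4o-mini'),
  prompt: 'What is the weather in San Francisco?',
  tools: {
    getWeather: tool({
      description: 'Get the current weather for a city',
      parameters: z.object({
        city: z.string().describe('The city to look up'),
      }),
      // Invoked automatically when the model emits a matching tool call.
      execute: async ({ city }) => {
        // Hypothetical stub; a real version would call a weather API.
        return { city, temperature: 18, conditions: 'foggy' };
      },
    }),
  },
});

console.log(result.toolResults); // outputs of any executed tool calls
```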
Enabling Multi-Step Agents with maxSteps
By default, if an LLM generates a tool call, the `generateText` function returns the tool result, not a synthesized text response [10:05:00]. To allow the model to incorporate tool results into a text answer, the `maxSteps` property can be used [11:35:00].
The `maxSteps` property enables an “agentic loop”:
- If the model generates a tool call, the tool result is sent back to the model along with the previous conversation context.
- This triggers another generation step.
- The process continues until the model generates plain text (no tool call) or the `maxSteps` threshold is reached [12:05:00].
This feature allows the model to autonomously pick the next step in a process without requiring manual rerouting of output [12:16:00]. Parallel tool calls are also supported, allowing multiple tools to be invoked simultaneously within a single step [16:42:00].
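A sketch of the loop in code, again with a hypothetical weather tool; the only change from plain tool calling is the `maxSteps` setting:

```ts
import { generateText, tool } from 'ai';
import { openai } from '@ai-sdk/openai';
import { z } from 'zod';

const { text, steps } = await generateText({
  model: openai('gpt-4o-mini'),
  prompt: 'What is the weather in San Francisco? Answer in one sentence.',
  maxSteps: 3, // step 1: tool call; step 2: text answer using the tool result
  tools: {
    getWeather: tool({
      description: 'Get the current weather for a city',
      parameters: z.object({ city: z.string() }),
      execute: async ({ city }) => ({ city, temperature: 18 }), // stub
    }),
  },
});

console.log(text);         // a synthesized sentence, not raw tool output
console.log(steps.length); // number of generation steps that actually ran
```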
Generating Structured Data (Structured Outputs)
Beyond text generation, the AI SDK facilitates generating structured data.
Methods for Structured Outputs
- `generateText` with `experimental_output`: An experimental option within `generateText` that allows defining an output schema [18:46:00].
- `generateObject`: A dedicated function for generating structured outputs [18:55:00]. This function is particularly useful as it ensures type-safe JSON output based on a defined schema [22:22:00].
Zod Integration
Zod, a TypeScript schema validation library, is heavily used with the AI SDK to define schemas for structured outputs. This integration provides:
- Type Safety: Ensures the generated output conforms to the expected types [19:50:00].
- Schema Description: The `.describe()` function in Zod allows adding detailed instructions to the LLM about the desired format or content for specific keys or values within the schema [23:14:00].
- Enum Mode: `generateObject` can operate in an enum mode, restricting outputs to a predefined set of values (e.g., “relevant” or “irrelevant”), simplifying evaluation logic [40:15:00]. Both patterns are sketched below.
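A sketch of both patterns (schemas and prompts are illustrative):

```ts
import { generateObject } from 'ai';
import { openai } from '@ai-sdk/openai';
import { z } from 'zod';

// Schema mode: `object` is typed as { title: string; tags: string[] }.
const { object } = await generateObject({
  model: openai('gpt-4o-mini'),
  schema: z.object({
    title: z.string().describe('A short, punchy headline'),
    tags: z.array(z.string()).describe('Three lowercase topic tags'),
  }),
  prompt: 'Summarize this article as structured metadata: ...',
});

// Enum mode: the output is constrained to one of the listed values.
const { object: verdict } = await generateObject({
  model: openai('gpt-4o-mini'),
  output: 'enum',
  enum: ['relevant', 'irrelevant'],
  prompt: 'Is this search result relevant to the query? ...',
});
```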
Practical Project: Building a Deep Research Clone
To demonstrate these concepts, a “deep research clone” can be built as a Node.js terminal script. This project showcases how to break down complex tasks into a structured workflow, integrate multiple AI SDK functions, and create autonomous agents [24:37:00].
Workflow Overview
The deep research process involves several steps [26:52:00]:
- Input Query: Start with a broad research query [26:53:00].
- Generate Subqueries: Create multiple specific search queries from the main prompt [26:57:00].
- Search the Web: For each subquery, search for relevant results [27:15:00].
- Analyze Results: Extract key learnings and identify follow-up questions from the search results [27:19:00].
- Recursive Depth: If more depth is needed, use follow-up questions to generate new queries, recursively repeating the process while accumulating all research [27:26:00].
Two parameters control the scope of this recursion:
- Depth: Controls how many levels deep the research goes [29:06:00].
- Breadth: Controls how many separate lines of inquiry are pursued at each level [28:12:00].
Implementation Details
1. generateSearchQueries
This function takes a main `query` and the desired number of sub-queries. It uses `generateObject` with a schema to ensure an array of strings is returned, optimized for search engine queries [29:46:00].
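A sketch under those constraints; the model choice and prompt wording are assumptions, not the talk's exact code:

```ts
import { generateObject } from 'ai';
import { openai } from '@ai-sdk/openai';
import { z } from 'zod';

async function generateSearchQueries(query: string, n = 3) {
  const { object } = await generateObject({
    model: openai('gpt-4o-mini'),
    schema: z.object({
      queries: z
        .array(z.string())
        .describe('Short queries optimized for a web search engine'),
    }),
    prompt: `Generate ${n} distinct search queries to research: ${query}`,
  });
  return object.queries;
}
```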
2. searchWeb
This function uses the Exa API to perform web searches and retrieve content [32:52:00].
- Configuration: Allows specifying the number of results and using `liveCrawl` for real-time data [33:59:00].
- Data Trimming: Crucially, it processes results to return only relevant information (e.g., `url`, `title`, `content`), reducing token count for LLMs and improving model effectiveness [34:49:00]. See the sketch below.
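A sketch using the `exa-js` client; the option names follow Exa's search-and-contents API as I understand it, and the trimming mirrors the description above:

```ts
import Exa from 'exa-js';

const exa = new Exa(process.env.EXA_API_KEY);

export type SearchResult = { title: string | null; url: string; content: string };

async function searchWeb(query: string): Promise<SearchResult[]> {
  const { results } = await exa.searchAndContents(query, {
    text: true,          // return page text alongside metadata
    numResults: 3,       // small result sets keep token counts manageable
    livecrawl: 'always', // fetch fresh page content instead of cached copies
  });
  // Trim each result down to only the fields the LLM actually needs.
  return results.map((r) => ({ title: r.title, url: r.url, content: r.text }));
}
```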
3. searchAndProcess (The Agentic Component)
This is the agentic core of the deep research workflow, responsible for finding relevant search results [36:14:00]. It uses `generateText` with two tools:
- `searchWeb` tool: Invokes the `searchWeb` function with a given query [38:57:00].
- `evaluate` tool: Assesses the relevance of a search result. It pulls the latest pending result, uses `generateObject` (in enum mode for “relevant” or “irrelevant”) to get the model’s judgment, and then either adds the result to `finalSearchResults` or instructs the model to search again with a more specific query if it is irrelevant [39:41:00].
The `maxSteps` property keeps this agentic loop running until a relevant source is found or the step limit is reached [38:27:00]. It also checks `accumulatedSources` to avoid reusing previously processed links [52:54:00].
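A condensed sketch of the loop, reusing `searchWeb` and the `SearchResult` type from the previous sketch; the prompts, the step limit, and the pending-result bookkeeping are illustrative:

```ts
import { generateText, generateObject, tool } from 'ai';
import { openai } from '@ai-sdk/openai';
import { z } from 'zod';

async function searchAndProcess(query: string, accumulatedSources: Set<string>) {
  const pendingResults: SearchResult[] = []; // awaiting evaluation
  const finalResults: SearchResult[] = [];   // judged relevant

  await generateText({
    model: openai('gpt-4o-mini'),
    prompt: `Find a relevant, high-quality web source for: "${query}"`,
    maxSteps: 5, // keeps the search/evaluate loop running
    tools: {
      searchWeb: tool({
        description: 'Search the web for the given query',
        parameters: z.object({ query: z.string() }),
        execute: async ({ query }) => {
          const results = await searchWeb(query);
          pendingResults.push(...results);
          return results.map((r) => r.url);
        },
      }),
      evaluate: tool({
        description: 'Evaluate whether the latest search result is relevant',
        parameters: z.object({}),
        execute: async () => {
          const candidate = pendingResults.pop();
          if (!candidate) return 'No results to evaluate. Search first.';
          if (accumulatedSources.has(candidate.url)) {
            return 'Source already used. Search again with a different query.';
          }
          const { object: verdict } = await generateObject({
            model: openai('gpt-4o-mini'),
            output: 'enum',
            enum: ['relevant', 'irrelevant'],
            prompt: `Query: ${query}\n<result>${candidate.content.slice(0, 2000)}</result>\nIs this result relevant to the query?`,
          });
          if (verdict === 'irrelevant') {
            return 'Irrelevant. Search again with a more specific query.';
          }
          finalResults.push(candidate);
          accumulatedSources.add(candidate.url);
          return 'Relevant result found. You can stop searching.';
        },
      }),
    },
  });

  return finalResults;
}
```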
4. generateLearnings
This function takes a `query` and a `searchResult` and uses `generateObject` to extract a `learning` (insight) and `followUpQuestions` (an array of strings) from the content, guided by a specific prompt [43:39:00].
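A sketch, with illustrative prompt wording and the `SearchResult` type from earlier:

```ts
import { generateObject } from 'ai';
import { openai } from '@ai-sdk/openai';
import { z } from 'zod';

async function generateLearnings(query: string, searchResult: SearchResult) {
  const { object } = await generateObject({
    model: openai('gpt-4o-mini'),
    schema: z.object({
      learning: z.string().describe('A concise insight drawn from the source'),
      followUpQuestions: z
        .array(z.string())
        .describe('Questions that would usefully deepen the research'),
    }),
    prompt:
      `The user is researching: "${query}". Extract a key learning and ` +
      `follow-up questions from this result:\n<content>${searchResult.content}</content>`,
  });
  return object; // { learning, followUpQuestions }
}
```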
5. deepResearch (Recursion Handler)
This function orchestrates the entire recursive research process [47:07:00]. It maintains a global `accumulatedResearch` state to track original queries, active queries, search results, learnings, and completed queries [47:52:00].
- It calls `generateSearchQueries`, `searchAndProcess`, and `generateLearnings` iteratively.
- It decrements the `depth` and `breadth` parameters with each recursive call to control the research scope and prevent infinite loops [50:01:00].
- For follow-up questions, it constructs a new query based on the overall goal, previous queries, and new follow-up questions, then recursively calls `deepResearch` [50:43:00].
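A sketch of the orchestration, composing the helpers sketched above; the state shape, default parameters, and follow-up prompt format are assumptions:

```ts
type Research = {
  query: string;             // the original research goal
  queries: string[];         // every query generated so far
  searchResults: SearchResult[];
  learnings: string[];
  completedQueries: string[];
};

const accumulatedResearch: Research = {
  query: '', queries: [], searchResults: [], learnings: [], completedQueries: [],
};
const accumulatedSources = new Set<string>(); // URLs already consumed

async function deepResearch(prompt: string, depth = 2, breadth = 2): Promise<Research> {
  if (!accumulatedResearch.query) accumulatedResearch.query = prompt;
  if (depth === 0) return accumulatedResearch; // base case ends the recursion

  const queries = await generateSearchQueries(prompt, breadth);
  accumulatedResearch.queries.push(...queries);

  for (const query of queries) {
    const results = await searchAndProcess(query, accumulatedSources);
    accumulatedResearch.searchResults.push(...results);

    for (const result of results) {
      const { learning, followUpQuestions } = await generateLearnings(query, result);
      accumulatedResearch.learnings.push(learning);
      accumulatedResearch.completedQueries.push(query);

      // Build the next round's prompt from the overall goal, what has been
      // asked already, and the new follow-up questions, then recurse with
      // reduced depth and breadth.
      const nextPrompt =
        `Overall research goal: ${accumulatedResearch.query}\n` +
        `Previous queries: ${accumulatedResearch.completedQueries.join('; ')}\n` +
        `Follow-up questions to pursue: ${followUpQuestions.join('; ')}`;
      await deepResearch(nextPrompt, depth - 1, Math.ceil(breadth / 2));
    }
  }
  return accumulatedResearch;
}
```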
6. generateReport
Finally, after the `deepResearch` process completes, the `generateReport` function takes the `accumulatedResearch` and uses `generateText` with a large reasoning model (e.g., GPT-4o mini) to synthesize all the gathered information into a comprehensive report [54:42:00].
- System Prompt: A detailed system prompt is provided to guide the model on persona (“expert researcher”), formatting (Markdown), and content guidelines (e.g., allowing speculation if flagged) [57:22:00].
- The final report is then written to a Markdown file [55:49:00].
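A sketch of this final step (system prompt wording, model choice, and output file name are illustrative):

```ts
import fs from 'node:fs';
import { generateText } from 'ai';
import { openai } from '@ai-sdk/openai';

const SYSTEM_PROMPT =
  'You are an expert researcher. Write a detailed report in Markdown. ' +
  'You may speculate, but clearly flag any speculation.';

async function generateReport(research: Research) {
  const { text } = await generateText({
    model: openai('gpt-4o-mini'), // swap in a larger reasoning model as needed
    system: SYSTEM_PROMPT,
    prompt: `Write a report on the following research:\n${JSON.stringify(research, null, 2)}`,
  });
  fs.writeFileSync('report.md', text); // the report lands in a Markdown file
  return text;
}
```

A terminal script would then simply chain the two stages: run `deepResearch` with the user's query, pass the accumulated state to `generateReport`, and exit.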
This project demonstrates the power of combining the AI SDK’s primitives to build complex, autonomous AI systems capable of tasks like deep research [59:06:00].