From: aidotengineer
The AI SDK is a toolkit for developing AI applications [00:00:02] and building agents [00:00:07]. This article explores its core functionality, focusing on how language models can be used to generate text, perform actions via tools, and produce structured data.
Fundamentals
The AI SDK offers the fundamental building blocks necessary for interacting with language models [00:00:17]. The example project consists of a single `index.ts` file, runnable via `pnpm run dev` [00:00:46].
The generateText Function
The `generateText` function is the primary way to call a large language model to generate text [00:01:06]. It requires specifying a model, such as OpenAI’s GPT-4o mini, and a prompt [00:01:41]. For instance, prompting “hello world” to GPT-4o mini might return “Hello, how can I assist you today?” [00:01:48]. The function accepts either a direct string prompt or an array of messages, where each message has a role and content [00:02:07].
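A minimal sketch of such a call, assuming the `ai` and `@ai-sdk/openai` packages in the AI SDK 4.x style:

```ts
import { generateText } from 'ai';
import { openai } from '@ai-sdk/openai';

// Call GPT-4o mini with a plain string prompt.
const { text } = await generateText({
  model: openai('gpt-4o-mini'),
  prompt: 'hello world',
});

console.log(text); // e.g. "Hello, how can I assist you today?"
```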
Unified Interface
A core feature of the AI SDK is its unified interface, allowing developers to switch between different language models by changing a single line of code [00:02:37]. This flexibility is beneficial for reasons such as cost-efficiency, speed, or performance in specific use cases [00:02:50].
Models like GPT-4o mini have training-data cutoffs (e.g., 2024), limiting their knowledge of more recent events [00:03:16]. For information requiring real-time data, models with built-in web search capabilities, such as Perplexity’s Sonar Pro, can be used [00:03:52]. Perplexity’s responses include references or sources, accessible via the `sources` property in the AI SDK [00:04:40]. Google’s Gemini 1.5 Flash also supports search grounding for similar functionality [00:05:33]. The SDK supports a wide range of providers, detailed in its documentation [00:05:10].
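For instance, switching from OpenAI to Perplexity is a one-line model change. A sketch, assuming the `@ai-sdk/perplexity` provider package (`sonar-pro` is Perplexity’s search-backed model; `sources` is populated by providers that return references):

```ts
import { generateText } from 'ai';
import { perplexity } from '@ai-sdk/perplexity';

const { text, sources } = await generateText({
  // Only the model line changes; the rest of the call is identical.
  model: perplexity('sonar-pro'),
  prompt: 'What happened in San Francisco last week?',
});

console.log(sources); // references backing the answer
```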
Tools and Function Calling
Beyond generating text, language models can interact with the outside world and perform actions through “tools” or “function calling” [00:06:15].
How Tools Work
The fundamental idea behind tools is simple: the model is given a prompt along with a list of available tools [00:06:26]. Each tool includes:
- A `name` [00:06:38]
- A `description` of what it does, which the model uses to decide when to invoke it [00:06:40]
- `parameters` (the data required to use the tool) [00:06:47]
- An `execute` function, which is arbitrary asynchronous JavaScript code run when the language model calls the tool [00:08:33]
When the model decides to use a tool, it generates a “tool call” (the tool’s name and arguments parsed from the conversation context) instead of text [00:06:53]. The developer then parses and runs this tool call [00:07:13]. The AI SDK automatically parses tool calls, invokes the `execute` function, and returns the result in a `toolResults` array [00:09:24].
The `tool` utility function (e.g., `addNumbers: tool({ ... })`) provides type safety between the defined parameters and the arguments of the `execute` function [00:08:06].
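A sketch of an “add numbers” tool wired into `generateText`, in AI SDK 4.x style (the tool name and prompt are illustrative):

```ts
import { generateText, tool } from 'ai';
import { openai } from '@ai-sdk/openai';
import { z } from 'zod';

const { toolResults } = await generateText({
  model: openai('gpt-4o-mini'),
  prompt: "What's 10 + 5?",
  tools: {
    addNumbers: tool({
      description: 'Add two numbers together',
      // Zod schema: the model must supply two numbers.
      parameters: z.object({ a: z.number(), b: z.number() }),
      // execute receives type-safe arguments matching the schema.
      execute: async ({ a, b }) => a + b,
    }),
  },
});

console.log(toolResults); // e.g. [{ toolName: 'addNumbers', result: 15, ... }]
```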
maxSteps for Autonomous Agents
By default, after a tool call the language model does not generate any further text [00:10:05]. To have the model incorporate tool results into a text answer, the `maxSteps` property can be used [00:11:35]. When `maxSteps` is set and the model generates a tool call, the tool result is sent back to the model along with the previous conversation context, triggering another generation [00:11:46]. This continues until the model generates plain text or the `maxSteps` threshold is reached [00:12:05]. This enables the model to run autonomously, picking its next step without explicit developer logic [00:12:16].

For example, asking “what’s 10 + 5?” with an “add numbers” tool will produce a tool call followed by a text generation of “10 + 5 equals 15” when `maxSteps` is configured [00:13:00].
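A sketch of that two-step loop, reusing the tool from the previous example as a standalone constant:

```ts
import { generateText, tool } from 'ai';
import { openai } from '@ai-sdk/openai';
import { z } from 'zod';

const addNumbers = tool({
  description: 'Add two numbers together',
  parameters: z.object({ a: z.number(), b: z.number() }),
  execute: async ({ a, b }) => a + b,
});

const { text, steps } = await generateText({
  model: openai('gpt-4o-mini'),
  prompt: "What's 10 + 5?",
  maxSteps: 2, // step 1: tool call, step 2: fold the result into a text answer
  tools: { addNumbers },
});

console.log(text); // "10 + 5 equals 15."
console.log(steps.length); // inspect the intermediate steps
```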
Models can even make parallel tool calls within a single step [00:16:42]. For instance, getting weather in two cities (“San Francisco” and “New York”) and then adding them together can involve two parallel “get weather” calls followed by an “add numbers” call [00:15:45]. The language model can infer parameters like latitude and longitude from the city names provided in the prompt, using its training data [00:15:05].
Structured Outputs
The AI SDK provides ways to generate structured data, also known as structured outputs [00:18:38].
Methods for Structured Output
There are two primary ways to generate structured outputs:
- Using `generateText` with its `experimental_output` option [00:18:46].
- Using the dedicated `generateObject` function [00:18:55]. The `streamObject` function is also available for streaming structured output [00:19:00].
Zod for Schema Definition
Language models can be guided to output structured data using schema definitions [00:19:46]. Zod, a TypeScript validation library, is highly recommended for defining these schemas in the AI SDK [00:19:47]. Zod schemas allow for type-safe objects as output, ensuring data integrity [00:20:42].
The `.describe()` function in Zod can be chained onto any key or value within the schema to give the language model detailed instructions about the desired output for that specific value [00:23:14]. This allows fine-grained control over the generated text within the structured output [00:23:28].
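A sketch combining `generateObject`, a Zod schema, and `.describe()` (the prompt and description text are illustrative):

```ts
import { generateObject } from 'ai';
import { openai } from '@ai-sdk/openai';
import { z } from 'zod';

const { object } = await generateObject({
  model: openai('gpt-4o-mini'),
  prompt: 'Please come up with 10 definitions for AI agents.',
  schema: z.object({
    definitions: z.array(
      // .describe() steers how this particular value is generated.
      z.string().describe('Use as much jargon as possible.'),
    ),
  }),
});

console.log(object.definitions); // type-safe string[]
```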
`generateObject` can also use an enum mode for schemas with a limited set of discrete values (e.g., “relevant” or “irrelevant”), simplifying the schema definition [00:40:08]. This constraint makes it easier for the language model to produce the correct output [00:40:31].
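In enum mode the schema collapses to a list of allowed values. A minimal sketch, reusing the imports above (`searchResultText` is a stand-in for real content):

```ts
const searchResultText = '...'; // stand-in for fetched page content

const { object: evaluation } = await generateObject({
  model: openai('gpt-4o-mini'),
  prompt: `Evaluate whether this search result is relevant: ${searchResultText}`,
  output: 'enum',
  enum: ['relevant', 'irrelevant'],
});
// evaluation is typed as 'relevant' | 'irrelevant'
```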
Practical Project: Building a Deep Research Clone
This section demonstrates how to combine the AI SDK’s functions to build complex AI systems, specifically a “deep research clone” in Node.js [00:25:24]. Deep research tools generally take a topic, search the web, aggregate resources, and return a comprehensive report [00:26:12].
Workflow Breakdown
The deep research workflow involves several steps [00:26:50]:
- Input Query: Start with a general prompt [00:26:53].
- Generate Subqueries: Generate multiple specific search queries from the initial prompt [00:26:57].
- Search Web: For each subquery, search the web for relevant results [00:27:15].
- Analyze Results: Analyze search results for “learnings” (insights) and “follow-up questions” [00:27:19].
- Recursive Research: Take follow-up questions and existing research to generate new queries, recursively continuing the process while accumulating information [00:27:26].
This recursive process explores “webs of thought” by adjusting “depth” (levels of inquiry) and “breadth” (number of queries at each step) [00:27:45].
Implementation Details
generateSearchQueries
This function uses `generateObject` to create an array of search queries from an initial prompt and a desired number of queries [00:29:41]. It defines a Zod schema for an array of strings to ensure structured output [00:30:38].
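A sketch (the prompt wording and query bounds are assumptions; later sketches build on this one, in the same `index.ts`):

```ts
import { generateObject } from 'ai';
import { openai } from '@ai-sdk/openai';
import { z } from 'zod';

async function generateSearchQueries(query: string, n: number = 3) {
  const { object } = await generateObject({
    model: openai('gpt-4o-mini'),
    prompt: `Generate ${n} search queries for the following query: ${query}`,
    schema: z.object({
      queries: z.array(z.string()).min(1).max(5),
    }),
  });
  return object.queries;
}
```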
searchWeb with Exa
The `searchWeb` function uses the Exa service to perform web searches and retrieve page content [00:32:53]. It fetches a specified number of results and can use the `livecrawl` option to ensure up-to-date information [00:33:59]. Importantly, it maps over the results and returns only the relevant fields (e.g., URL, title, content) to reduce token count and improve language model effectiveness [00:34:49].
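A sketch using the `exa-js` client (the trimmed `SearchResult` shape is an assumption about what the downstream steps need):

```ts
import Exa from 'exa-js';

const exa = new Exa(process.env.EXA_API_KEY);

type SearchResult = { title: string | null; url: string; content: string };

async function searchWeb(query: string): Promise<SearchResult[]> {
  const { results } = await exa.searchAndContents(query, {
    numResults: 1,
    livecrawl: 'always', // fetch fresh page content rather than a cached copy
  });
  // Keep only the fields the language model needs, reducing token count.
  return results.map((r) => ({ title: r.title, url: r.url, content: r.text }));
}
```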
searchAndProcess (Agentic Component)
This is the most complex and agentic part of the workflow [00:36:10]. It employs `generateText` with two tools:

- `searchWeb`: performs a web search [00:38:57]; results are temporarily stored in `pendingSearchResults` [00:39:21].
- `evaluate`: evaluates the relevance of the latest search result using `generateObject` in enum mode (`relevant` or `irrelevant`) [00:39:41]. If relevant, it moves the result to `finalSearchResults`; otherwise, it discards it [00:40:41].
The `maxSteps` parameter ensures this agentic loop continues until a relevant source is found or the step limit is reached [00:38:27]. The model receives feedback (“Search results are irrelevant, please search again with a more specific query”) to guide its next action [00:41:01]. To prevent redundant searches, the `evaluate` tool also receives a list of `accumulatedSources` and marks already-used URLs as irrelevant [00:52:50].
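A condensed sketch of this loop (prompt wording and the step limit are illustrative; `searchWeb` and `SearchResult` come from the previous step):

```ts
import { generateText, generateObject, tool } from 'ai';
import { openai } from '@ai-sdk/openai';
import { z } from 'zod';

async function searchAndProcess(query: string, accumulatedSources: SearchResult[]) {
  const pendingSearchResults: SearchResult[] = [];
  const finalSearchResults: SearchResult[] = [];

  await generateText({
    model: openai('gpt-4o-mini'),
    prompt: `Search the web for information about ${query}`,
    system:
      'You are a researcher. For each query, search the web and then evaluate whether the results are relevant.',
    maxSteps: 5, // keep searching until a relevant source is found or the limit is hit
    tools: {
      searchWeb: tool({
        description: 'Search the web for information about a given query',
        parameters: z.object({ query: z.string() }),
        async execute({ query }) {
          const results = await searchWeb(query);
          pendingSearchResults.push(...results);
          return results;
        },
      }),
      evaluate: tool({
        description: 'Evaluate the latest search result',
        parameters: z.object({}),
        async execute() {
          const pending = pendingSearchResults.pop()!;
          // Mark already-used URLs as irrelevant to prevent redundant searches.
          if (accumulatedSources.some((s) => s.url === pending.url)) {
            return 'Search results are irrelevant, please search again with a more specific query.';
          }
          const { object: evaluation } = await generateObject({
            model: openai('gpt-4o-mini'),
            prompt: `Evaluate whether this search result is relevant to "${query}":\n${pending.content}`,
            output: 'enum',
            enum: ['relevant', 'irrelevant'],
          });
          if (evaluation === 'relevant') finalSearchResults.push(pending);
          return evaluation === 'relevant'
            ? 'Search results are relevant, search complete.'
            : 'Search results are irrelevant, please search again with a more specific query.';
        },
      }),
    },
  });

  return finalSearchResults;
}
```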
generateLearnings
This function uses `generateObject` to analyze relevant search results and extract key “learnings” (insights) and “follow-up questions” [00:43:39]. It defines a Zod schema with a string `learning` and an array of string `followUpQuestions` [00:44:15].
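A sketch, continuing in the same file (the prompt wording is illustrative):

```ts
type Learning = { learning: string; followUpQuestions: string[] };

async function generateLearnings(query: string, result: SearchResult): Promise<Learning> {
  const { object } = await generateObject({
    model: openai('gpt-4o-mini'),
    prompt: `The user is researching "${query}". Extract a key learning and follow-up questions from this search result:\n${result.content}`,
    schema: z.object({
      learning: z.string(),
      followUpQuestions: z.array(z.string()),
    }),
  });
  return object;
}
```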
Deep Research Recursion
The overall `deepResearch` function handles the recursion, managing the accumulated research state [00:47:10]. This state includes the original query, active queries, search results, learnings, and completed queries [00:48:10]. The function decrements the `depth` and `breadth` parameters on each recursive call to guarantee termination and control the extent of the research [00:50:01].
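A sketch of the state and recursion, built on the earlier helpers (the field names and the breadth-halving heuristic are assumptions):

```ts
type Research = {
  query: string | undefined;
  queries: string[];
  searchResults: SearchResult[];
  learnings: Learning[];
  completedQueries: string[];
};

const accumulatedResearch: Research = {
  query: undefined,
  queries: [],
  searchResults: [],
  learnings: [],
  completedQueries: [],
};

async function deepResearch(prompt: string, depth = 2, breadth = 3): Promise<Research> {
  if (depth === 0) return accumulatedResearch; // termination: depth exhausted
  const queries = await generateSearchQueries(prompt, breadth);
  for (const query of queries) {
    const results = await searchAndProcess(query, accumulatedResearch.searchResults);
    accumulatedResearch.searchResults.push(...results);
    for (const result of results) {
      const { learning, followUpQuestions } = await generateLearnings(query, result);
      accumulatedResearch.learnings.push({ learning, followUpQuestions });
      accumulatedResearch.completedQueries.push(query);
      // Recurse on the follow-up questions with reduced depth and breadth.
      await deepResearch(followUpQuestions.join('\n'), depth - 1, Math.ceil(breadth / 2));
    }
  }
  return accumulatedResearch;
}
```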
generateReport
Finally, the `generateReport` function takes the `accumulatedResearch` and feeds it into a large language model (e.g., a reasoning model such as o3-mini) to synthesize all the gathered information [00:54:32]. A detailed system prompt can be provided to guide the model on formatting (e.g., Markdown) and tone (e.g., “expert researcher”), ensuring the output meets specific requirements and is better structured [00:57:22]. The final report can then be written to the file system [00:55:49].
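A closing sketch (the model choice and system prompt wording are assumptions):

```ts
import fs from 'node:fs';
import { generateText } from 'ai';
import { openai } from '@ai-sdk/openai';

async function generateReport(research: Research): Promise<string> {
  const { text } = await generateText({
    model: openai('o3-mini'), // assumption: any capable reasoning model works here
    system: 'You are an expert researcher. Respond in Markdown.',
    prompt: `Generate a report based on the following research data:\n${JSON.stringify(research, null, 2)}`,
  });
  return text;
}

const report = await generateReport(await deepResearch('...your topic...'));
fs.writeFileSync('report.md', report); // persist the final report
```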