From: aidotengineer
The Vercel AI SDK is a versatile toolkit designed to simplify the process of building Artificial Intelligence (AI) applications and integrating large language models (LLMs) into projects [00:00:02]. It provides a unified interface, enabling developers to switch between various language models with minimal code changes [00:02:37].
This introduction covers the fundamental building blocks of the AI SDK and demonstrates how to construct a complex AI system, such as a deep research clone, using its features [00:00:15].
Core Concepts of the AI SDK
The AI SDK provides several core primitives for interacting with language models and building AI applications.
Generating Text
The generateText function is the most fundamental primitive, allowing developers to call a large language model and receive a text response [00:01:06].
- Usage: It takes a model (e.g., OpenAI’s GPT-4o mini) and a prompt as input [00:01:39]. Alternatively, it can accept an array of messages, each with a role and content [00:02:16].
- Unified Interface: A key feature of the AI SDK is its unified interface, allowing easy switching between different language models (e.g., OpenAI’s GPT-4o mini, Perplexity, Google Gemini) by changing a single line of code [00:02:41]. This flexibility is beneficial for cost, speed, or use-case-specific performance [00:02:50].
- Accessing Sources: When using models like Perplexity or Google Gemini that integrate web search, the AI SDK provides access to the sources used in the model’s response via the sources property [00:04:40], so developers can see the references directly [00:04:51]. For example, a query about the “AI Engineer Summit in 2025” can be answered by a web-enabled model like Perplexity’s Sonar Pro [00:03:10]. Similarly, Google’s Gemini 1.5 Flash can provide search grounding for accurate results [00:05:33].
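As a minimal sketch of the above (assuming the ai, @ai-sdk/openai, and @ai-sdk/perplexity packages and the corresponding API keys in the environment; the prompt is illustrative):

```typescript
import { generateText } from 'ai';
import { openai } from '@ai-sdk/openai';
import { perplexity } from '@ai-sdk/perplexity';

async function main() {
  // The most basic call: a model and a prompt, returning text.
  const { text } = await generateText({
    model: openai('gpt-4o-mini'),
    prompt: 'When is the AI Engineer Summit in 2025?',
  });
  console.log(text);

  // Swapping providers is a one-line change; web-enabled models
  // also expose the references they used via `sources`.
  const { text: grounded, sources } = await generateText({
    model: perplexity('sonar-pro'),
    prompt: 'When is the AI Engineer Summit in 2025?',
  });
  console.log(grounded, sources);
}

main().catch(console.error);
```

Only the model line differs between the two calls; the rest of the invocation and the shape of the result stay the same.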
Using Tools and Function Calling
Tools, also known as function calling, extend the capabilities of language models beyond text generation, allowing them to interact with the outside world and perform actions [00:06:15].
- Concept: The model is provided with a prompt and a list of available tools [00:06:31]. Each tool has a name, a description (which the model uses to decide when to invoke it), and parameters (data needed to use the tool) [00:06:38]. Instead of generating text, the model can generate a “tool call,” specifying the tool to use and any parsed arguments [00:07:01]. The developer is then responsible for executing that tool’s code [00:07:16].
- Defining Tools: The AI SDK simplifies tool definition using a tool utility function [00:08:06]. This function provides type safety between the defined parameters and the arguments received by the execute function [00:08:54]. The execute function can contain any arbitrary asynchronous JavaScript code [00:08:34].
- Automatic Invocation (maxSteps): The AI SDK automatically parses tool calls and invokes their execute function, returning the result in a toolResults array [00:09:24]. To enable the model to incorporate tool results into its final text response, the maxSteps property can be set [00:11:35]. When a tool call is generated, the maxSteps feature sends the tool result back to the model along with the previous conversation context, triggering another generation [00:11:50]. This allows the model to autonomously continue a process until it generates plain text or reaches the maxSteps threshold [00:12:03].
- Example: Demonstrating tool usage, a simple addNumbers tool can be defined to sum two numbers [00:07:31]. The model, instead of calculating, calls the tool, and the result is returned [00:09:59]. More complex scenarios can involve multiple tools, such as getWeather and addNumbers, working in tandem, even supporting parallel tool calls [00:14:10]. The model can infer necessary parameters (like latitude and longitude for weather) from the conversation context [00:15:05].
Generating Structured Data
The AI SDK offers two primary methods for generating structured outputs, which are particularly useful for making model responses easier to consume programmatically [00:18:43].
- generateText with experimentalOutput: This option allows generateText to produce structured data based on a defined schema [00:18:46]. It integrates seamlessly with Zod, a TypeScript validation library, to define schemas and ensure type safety [00:19:47].
- generateObject function: This is a dedicated function for structured outputs and is highly favored for its power and ease of use [00:18:55].
  - Zod Integration: Like experimentalOutput, generateObject leverages Zod for schema definition, making it straightforward to define the desired structure of the output (e.g., an array of strings for definitions) [00:22:09].
  - .describe() method: Zod’s .describe() function can be chained onto schema definitions to provide additional context and instructions to the language model for specific values, influencing the generated content within the structured output [00:23:14].
  - Enum Mode: For simple cases where only a few predefined values are expected (e.g., “relevant” or “irrelevant”), generateObject can be used in “enum mode,” directly specifying the allowed string values instead of a full Zod schema [00:40:15]. This is ergonomic and helps constrain the model’s output [00:40:27].
Building a Deep Research Clone
The AI SDK’s primitives can be combined to build complex AI systems, such as a deep research clone that autonomously gathers and synthesizes information from the web [00:24:39]. This project demonstrates an agentic workflow, combining various AI SDK functions for interesting use cases [00:25:22].
The deep research workflow involves several structured steps:
- Generate Subqueries: Based on an initial prompt, the system generates multiple specific search queries [00:26:58]. This is achieved using the generateObject function, which takes the main query and the desired number of subqueries, returning them as a structured array of strings [00:30:08].
- Search the Web: For each generated subquery, the system searches the web for relevant results [00:27:15]. This implementation uses the Exa search API, ensuring live crawls and efficient data retrieval [00:32:56]. Crucially, only relevant information (e.g., title, URL, content) is extracted from search results to reduce token count and improve model effectiveness [00:34:49].
- Analyze Results (searchAndProcess agent): This is the most complex and agentic part of the workflow [00:36:10].
  - It uses generateText with two custom tools: searchWeb and evaluate [00:38:18].
  - The searchWeb tool calls the Exa API and pushes results to a pendingSearchResults array [00:39:24].
  - The evaluate tool pulls the latest pending result, runs it through generateObject (in enum mode for “relevant” or “irrelevant”), and provides feedback to the model [00:39:41]. If a result is irrelevant, the tool returns a message prompting the model to search again with a more specific query, effectively driving the agentic loop [00:41:01].
  - To prevent redundant searches, the evaluate tool also checks against a list of accumulatedSources (URLs) to ensure previously used pages are marked irrelevant [00:52:50].
  - Local variables are used to store search results for evaluation rather than passing them as tool parameters. This avoids unnecessary token usage and processing by the model [00:41:28].
- Generate Learnings and Follow-up Questions: Once relevant search results are obtained, the system analyzes them to extract key learnings and propose follow-up questions [00:27:20]. The generateLearnings function uses generateObject to create a structured output containing the learning (a string) and an array of followUpQuestions [00:43:55].
- Recursive Research: To achieve “deep” research, the process is made recursive [00:27:31]. The deepResearch function manages the overall workflow, taking depth (levels of inquiry) and breadth (number of parallel inquiries at each level) parameters [00:49:09]. Follow-up questions from one level become the new queries for the next, accumulating research in a shared state until the defined depth is reached [00:46:18].
- Generate Report: Finally, all the accumulatedResearch (original query, queries performed, search results, learnings, completed queries) is passed to a larger reasoning model [00:47:52]. The generateReport function synthesizes this information into a comprehensive Markdown report, which can then be written to the file system [00:54:47]. Crucially, a detailed system prompt guides the model on formatting (e.g., Markdown) and tone to ensure a high-quality, structured output [00:57:11].
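The recursive control flow of the steps above can be sketched independently of any model calls. Here the model-backed steps are injected as functions; the names mirror the section above, but this is a structural sketch, not the talk’s exact implementation:

```typescript
type SearchResult = { title: string; url: string; content: string };
type Learning = { learning: string; followUpQuestions: string[] };

// Shared state accumulated across every level of the recursion.
type Research = {
  query: string;
  queriesRun: string[];
  searchResults: SearchResult[];
  learnings: Learning[];
};

// The model-backed steps (generateObject / generateText calls) are
// injected so the recursion itself is plain, testable TypeScript.
type Steps = {
  generateSearchQueries: (query: string, n: number) => Promise<string[]>;
  searchAndProcess: (query: string) => Promise<SearchResult[]>;
  generateLearnings: (query: string, results: SearchResult[]) => Promise<Learning>;
};

async function deepResearch(
  steps: Steps,
  acc: Research,
  query: string,
  depth: number,   // how many levels of inquiry remain
  breadth: number, // parallel inquiries per level
): Promise<Research> {
  if (depth === 0) return acc; // base case: stop recursing
  const queries = await steps.generateSearchQueries(query, breadth);
  for (const q of queries) {
    acc.queriesRun.push(q);
    const results = await steps.searchAndProcess(q);
    acc.searchResults.push(...results);
    const learning = await steps.generateLearnings(q, results);
    acc.learnings.push(learning);
    // Follow-up questions become the queries for the next level.
    for (const followUp of learning.followUpQuestions.slice(0, breadth)) {
      await deepResearch(steps, acc, followUp, depth - 1, breadth);
    }
  }
  return acc;
}
```

Keeping the accumulator in one shared object means the final generateReport step only needs a single value to summarize, regardless of depth and breadth.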
This multi-step, agentic approach demonstrates how the AI SDK enables the creation of sophisticated AI applications by combining its core functions and allowing models to autonomously navigate complex tasks [00:59:08].
Resources
- AI SDK Documentation: [00:59:35]