From: aidotengineer

This article explores the concepts and practical implementations of function calling and AI agents, drawing insights from a workshop led by Elan, from OpenAI’s developer experience team [00:00:05]. The core argument is that function calling is fundamental to many of the most exciting advancements in AI today [00:03:14], [00:09:50].

A Brief History of Tool Use in Language Models

The evolution of language models and their ability to interact with external tools can be traced through several stages:

  • Text Completion [00:01:52]: Early models like GPT, GPT-2, and GPT-3 primarily completed text based on input [00:01:54]. Getting them to follow instructions was difficult, often requiring “few-shot” examples (e.g., “Question, Answer, Question, Answer”) [00:02:14].
  • Instruction Following (InstructGPT) [00:02:54]: OpenAI introduced InstructGPT, which enabled models to actually perform tasks specified in prompts, rather than just completing text [00:02:57].
  • Roles and Personas [00:03:03]: The introduction of “user” and “assistant” roles, achieved through post-training, allowed models to adopt specific personas [00:03:08].
  • External Tools [00:03:14]: The ability to provide models with additional tools for interacting with external states marked a significant step [00:03:17].
    • WebGPT (2021) [00:03:38]: An early version of GPT-3 was trained to use specific web-search functions. This was one of the first instances of a model generating actions that were then parsed, executed, and fed back into its context [00:07:05].
    • Meta’s Toolformer (2023) [00:08:00]: Meta developed a method for teaching models to use arbitrary tools, for example by analyzing log probabilities to decide where to insert function calls into text [00:08:05], [00:08:21].
    • OpenAI General Function Calling (June 2023) [00:09:23]: OpenAI launched generalized function calling, where models were post-trained to use tools via a specific syntax, making it widely accessible [00:09:29].

Function Calling: A Crash Course

Function calling serves two main purposes:

  1. Fetching Data [00:10:27]: This includes reading from APIs, performing retrieval, or accessing memory.
  2. Taking Action [00:10:30]: This involves writing to APIs, managing application state (UI, frontend, backend), or executing workflow actions [00:10:31].

How it Works

The core mechanism involves a loop [00:10:59]:

  1. You provide the model with a list of available functions and the user’s input [00:11:03].
  2. The model tells you its intent to use a function (it does not execute the function itself) [00:11:12].
  3. You are responsible for parsing the model’s intent, executing the actual code, and then providing the result back to the model [00:11:25].
  4. The model uses this result in its subsequent generation [00:11:32].
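
A minimal sketch of one such round trip, assuming the OpenAI Python SDK (v1+); the model name, the get_weather schema, and its stub implementation are illustrative, not from the workshop:

```python
import json
from openai import OpenAI

client = OpenAI()

# Step 1: describe the available function; you still own the implementation.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

def get_weather(city: str) -> str:
    return f"It is sunny in {city}."  # stand-in for a real API call

messages = [{"role": "user", "content": "What's the weather in Paris?"}]
response = client.chat.completions.create(model="gpt-4o", messages=messages, tools=tools)
msg = response.choices[0].message

if msg.tool_calls:  # Step 2: the model only states its intent to call a function
    messages.append(msg)
    for call in msg.tool_calls:
        # Step 3: parse the intent, run the real code, hand back the result
        args = json.loads(call.function.arguments)
        messages.append({
            "role": "tool",
            "tool_call_id": call.id,
            "content": get_weather(**args),
        })
    # Step 4: the model uses the result in its next generation
    final = client.chat.completions.create(model="gpt-4o", messages=messages)
    print(final.choices[0].message.content)
```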

Best Practices

  • Clear Function Descriptions [00:11:51]: Explain the purpose of each parameter and use a system prompt with examples [00:12:18].
  • Software Engineering Principles [00:12:27]: Make functions intuitive and follow the principle of least privilege [00:12:31]. If a human can’t easily understand how to use it, the model might struggle too [00:12:36].
  • Enums and Object Structure [00:12:48]: Use these to prevent the model from making invalid calls or representing invalid states; the schema sketch below applies both practices [00:12:50].
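
For example, a hypothetical set_ticket_status schema: every parameter carries a description, and an enum keeps the model from inventing an invalid status (the function name and fields are made up for illustration):

```python
# Hypothetical schema: descriptive fields plus an enum that rules out
# invalid states before the model can produce them.
set_ticket_status_tool = {
    "type": "function",
    "function": {
        "name": "set_ticket_status",
        "description": "Update the status of a support ticket.",
        "parameters": {
            "type": "object",
            "properties": {
                "ticket_id": {
                    "type": "string",
                    "description": "Unique ticket identifier, e.g. 'TCK-1042'.",
                },
                "status": {
                    "type": "string",
                    "enum": ["open", "in_progress", "resolved"],
                    "description": "The new status to assign to the ticket.",
                },
            },
            "required": ["ticket_id", "status"],
        },
    },
}
```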

Functions vs. Tools

The terms “functions” and “tools” are often used interchangeably, but a distinction can be made:

  • Functions: Refer to the raw function calling interface, where you provide a schema and are responsible for execution [00:14:06].
  • Tools: A broader category that includes functions, but also hosted solutions like code interpreters or file search capabilities [00:14:18].

Agents as Loops

The fundamental concept of an AI agent is often described as a “loop” [00:20:19]. This loop involves:

  1. Specifying available tools to the model [00:20:26].
  2. Calling the model to get a message [00:20:28].
  3. Handling any tool calls the model suggests [00:20:30].
  4. Appending the results back to the conversation [00:20:33].
  5. Repeating until no more tool calls are suggested [00:20:37].

This simple loop forms the basis for more complex agent behaviors [00:20:39].
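
A sketch of that loop, following the Chat Completions conventions from the earlier example; tool_impls is a hypothetical dict mapping function names to Python callables:

```python
import json

def run_agent(client, messages, tools, tool_impls, model="gpt-4o"):
    """Call the model repeatedly until it stops requesting tools."""
    while True:
        msg = client.chat.completions.create(
            model=model, messages=messages, tools=tools
        ).choices[0].message
        messages.append(msg)
        if not msg.tool_calls:       # no tool calls suggested: the loop ends
            return msg.content
        for call in msg.tool_calls:  # handle every suggested tool call
            result = tool_impls[call.function.name](**json.loads(call.function.arguments))
            messages.append({"role": "tool", "tool_call_id": call.id, "content": str(result)})
```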

Building and Improving AI Agents

Implementing Memory

A simple form of memory can be implemented by maintaining a list of past interactions or relevant information.

Simple Memory Implementation

Memory can be as simple as a Python list [00:26:27], with functions such as add_to_memory and get_memory exposed to the model [00:27:00]. For persistence, this memory can be stored in a local JSON file, read at startup, and written out after each update [00:29:42].
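
A minimal sketch of this approach; the file name and function names are illustrative:

```python
import json
import os

MEMORY_FILE = "memory.json"  # hypothetical location for persisted memories

def load_memory() -> list[str]:
    """Read existing memories at startup, if the file is there."""
    if os.path.exists(MEMORY_FILE):
        with open(MEMORY_FILE) as f:
            return json.load(f)
    return []

memory = load_memory()

def add_to_memory(item: str) -> str:
    """Exposed to the model as a tool; writes the file after every update."""
    memory.append(item)
    with open(MEMORY_FILE, "w") as f:
        json.dump(memory, f)
    return "stored"

def get_memory() -> list[str]:
    """Exposed to the model as a tool; returns everything remembered so far."""
    return memory
```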

Example: An agent remembers a user’s height after being told it, and can recall it in a later conversation even after a restart [00:31:16], [00:31:39].

For more advanced memory, one could implement “smart querying” using retrieval or semantic similarity to load only the most relevant memories [00:32:07]. To enforce consistency in stored memories, especially for contradictory information (e.g., project status changing from “not ready” to “done”), consider:

  • Doing a retrieval search for similar memories before storing a new one [01:32:11].
  • Using a model to explicitly check if a new memory updates or contradicts an existing one [01:32:21].
  • Employing timestamps for memories and creating explicit “chains of updates” (nodes pointing from older to newer information) to manage evolving data; a minimal sketch follows this list [01:32:54], [01:33:08].
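
One way to sketch the timestamp-and-update-chain idea; all names here are hypothetical:

```python
import time
from dataclasses import dataclass, field

@dataclass
class MemoryNode:
    text: str
    created_at: float = field(default_factory=time.time)
    supersedes: "MemoryNode | None" = None  # link from newer info back to older

def update_memory(nodes: list[MemoryNode], old: MemoryNode, new_text: str) -> MemoryNode:
    """Store new information as a node that explicitly supersedes an older one."""
    new = MemoryNode(text=new_text, supersedes=old)
    nodes.append(new)
    return new

def current_facts(nodes: list[MemoryNode]) -> list[MemoryNode]:
    """A memory is current only if no newer node supersedes it."""
    superseded = {id(n.supersedes) for n in nodes if n.supersedes is not None}
    return [n for n in nodes if id(n) not in superseded]
```

Under this scheme, “project not ready” survives only until a newer node recording “project done” points back at it.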

Delegation

Agents can delegate tasks in several ways:

  • Handoffs [00:34:27]: The conversation is handed over entirely to a different agent, which in practice means swapping out the system prompt and tools [00:34:30].
  • Nested Calls [00:34:37]: One function call triggers another, an approach often overlooked despite its simplicity and effectiveness [00:34:39].
  • Manager Tasks [00:34:43]: A more asynchronous form of delegation, in which a primary agent manages tasks assigned to other agents [00:34:46].

Delegating to a Smarter Model

An agent can be given a function, delegate_to_smarter_model, which makes an API request to a more powerful model (e.g., GPT-4) for difficult tasks [00:35:00], [00:35:36]. This allows the primary agent to offload complex computations or creative writing tasks, improving its overall capability [00:37:05], [00:38:03].
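
A minimal sketch of such a function, assuming the OpenAI Python SDK; the exact model name is illustrative:

```python
from openai import OpenAI

client = OpenAI()

def delegate_to_smarter_model(task: str) -> str:
    """Exposed to the primary agent as a tool: forwards a hard task to a
    stronger model and returns its answer as the tool-call result."""
    response = client.chat.completions.create(
        model="gpt-4",  # assumed: any model stronger than the primary agent's
        messages=[{"role": "user", "content": task}],
    )
    return response.choices[0].message.content
```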

Asynchronous Operations

To prevent the user experience from being blocked while a delegated task is running, asynchronous operations are crucial.

The problem with synchronous function calls is that they block the main loop, making the user wait for all tasks to complete [00:40:49], [00:51:52].

Implementing Asynchronous Function Calling

In a demo, asyncio.sleep can stand in for a real API call, simulating a long-running operation without blocking the event loop [00:52:53]. The key is to separate user input from the background processing of messages, often using a message queue [00:47:40]. When the model makes multiple tool calls in a single turn, they can be run in parallel with asyncio.gather, as sketched below [00:46:35].
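
A sketch of the parallel case, with a hypothetical slow_tool coroutine standing in for a real implementation:

```python
import asyncio
import json

async def slow_tool(city: str) -> str:
    await asyncio.sleep(2)  # simulates a slow API call without blocking the loop
    return f"It is sunny in {city}."

async def handle_tool_calls(tool_calls) -> list[str]:
    """Run every tool call from a single model turn concurrently."""
    coros = [slow_tool(**json.loads(call.function.arguments)) for call in tool_calls]
    return await asyncio.gather(*coros)  # total wait is roughly the slowest call, not the sum
```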

With this in place, the agent can continue interacting with the user while tasks run in the background. To track those tasks, a create_task function can generate a unique task ID, and a check_task function can retrieve the status of a specific task by its ID [00:55:17].

Asynchronous Task Management

An agent can create a task (e.g., to fetch the weather for a city) [00:57:10]. While the task runs in the background (with a simulated delay), the user can continue chatting with the agent [00:57:59]. The user can then explicitly ask the agent to check_task to see whether the task has completed and retrieve the result [00:58:05]. This demonstrates how an agent can manage multiple concurrent operations [01:00:34].
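
A minimal sketch of this pattern, assuming the agent loop itself runs inside asyncio; the function names follow the description above, but the implementation details are illustrative:

```python
import asyncio
import uuid

tasks: dict[str, asyncio.Task] = {}

async def fetch_weather(city: str) -> str:
    await asyncio.sleep(5)  # simulated delay standing in for a real API call
    return f"It is sunny in {city}."

def create_task(city: str) -> str:
    """Start the work in the background and give the model an ID to poll."""
    task_id = uuid.uuid4().hex[:8]
    tasks[task_id] = asyncio.create_task(fetch_weather(city))  # needs a running event loop
    return f"Task {task_id} started."

def check_task(task_id: str) -> str:
    """Let the model check on a background task by its ID."""
    task = tasks.get(task_id)
    if task is None:
        return "No such task."
    return task.result() if task.done() else "Still running."
```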

Advanced and Experimental Concepts

Self-Modifying Agents (Tool Writing Agents)

It’s possible to create an agent that can write its own functions and then immediately use them [01:17:16]. This involves having a function that takes a string representation of Python code, evaluates it, and adds the resulting function object to the agent’s available tools [01:22:56].

Security Risk

Using eval or exec in a production environment or with untrusted input is extremely dangerous and can lead to severe security vulnerabilities [01:19:55], [01:21:31].

Building a Calculator On-the-Fly

An agent is instructed to “make itself a little calculator” [01:25:59]. It then generates Python code for a calculate function, adds it to its tools, and can immediately use it to perform calculations requested by the user [01:26:09].
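
A minimal sketch of such a tool-writing function, subject to the eval/exec warning above; all names are hypothetical:

```python
from typing import Callable

tool_impls: dict[str, Callable] = {}  # the agent's registry of callable tools

def add_tool(name: str, code: str) -> str:
    """Evaluate model-written Python and register it as a new tool.
    DANGEROUS: exec runs arbitrary code; never use with untrusted input."""
    namespace: dict = {}
    exec(code, namespace)               # the risky step
    tool_impls[name] = namespace[name]  # 'code' must define a function named 'name'
    return f"Tool '{name}' added."

# e.g. the model might send:
add_tool("calculate", (
    "def calculate(a: float, b: float, op: str) -> float:\n"
    "    return {'+': a + b, '-': a - b, '*': a * b, '/': a / b}[op]"
))
print(tool_impls["calculate"](2, 3, "*"))  # -> 6
```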

Real-Time API Tricks

For conversational AI, especially with voice interfaces, specific tricks can enhance the experience:

  • “Stay Silent” Function [01:27:46]: Use a function call to let the model decide if the user is truly done talking, even if the voice activity detection (VAD) triggers. This allows for natural pauses or interruptions without prematurely ending the model’s turn [01:27:55].
  • XML Tag Control [01:29:15]: By providing the model with a script containing XML tags (e.g., <break time="500ms"/>), it can learn to follow specific speaking styles or pauses, even if not explicitly trained for it [01:29:17], [01:30:23].

Key Considerations for Implementing AI Agents

  • Managing Numerous Functions: When dealing with dozens or hundreds of functions, consider:
    • Multi-agent architectures [01:07:53]: Split responsibilities or function groupings across different agents, invoking the correct agent for specific tasks [01:07:59]. This is a primary use case for agents and handoffs [01:15:22].
    • Fine-tuning [01:08:30]: Smaller models can be fine-tuned to work with a large number of functions (e.g., 120 functions with GPT-3.5) [01:08:35].
    • Dynamic Function Loading [01:08:51]: Based on the input or conversation so far, load only the most relevant functions into memory or context; a minimal sketch follows this list [01:08:54].
  • Performance with Many Tools: While there are no hard limits on the number of functions or parallel calls, a general rule of thumb for reliable performance without extensive prompting is 10-20 functions [01:16:17], [01:16:46].
  • Vision Models and Thought Text: For models like GPT-4, function calls typically happen at the very end of the generation process, as the internal “chain of thought” is not exposed for direct function calls within it [01:09:36], [01:10:12].
  • Designing Agent Projects: Many frameworks exist for developing and optimizing AI agents (e.g., Swarm, Pydantic-AI) [01:16:16], [01:06:17]. However, it’s also feasible to implement a basic agent loop with minimal code (around 70 lines), allowing for granular control and avoiding unnecessary dependencies [01:06:33].
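
A minimal sketch of dynamic function loading using naive keyword matching; a production system might use embeddings or semantic similarity instead:

```python
def select_tools(user_input: str, all_tools: list[dict], k: int = 10) -> list[dict]:
    """Expose only the k tool schemas that best match the user's message,
    scored by word overlap with each tool's name and description."""
    words = set(user_input.lower().split())

    def score(tool: dict) -> int:
        fn = tool["function"]
        text = (fn["name"] + " " + fn.get("description", "")).lower()
        return sum(word in text for word in words)

    return sorted(all_tools, key=score, reverse=True)[:k]
```

Keeping the selected set within the 10-20 function rule of thumb above preserves reliability even when the full catalog is much larger.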

Further Exploration

Many of the discussed concepts, like memory and delegation, can be extended to significant complexity, evolving into full operating systems for managing agent behavior [01:31:55].