From: aidotengineer

Function calling is presented as a fundamental concept in AI and language models, enabling them to interact with external states and perform a wide range of actions beyond mere text generation [00:00:19]. This concept is considered powerful enough to underpin many of the exciting developments in current AI [00:09:50].

Early Language Models and Instruction Following Challenges

Initially, language models like the original GPT, GPT-2, and GPT-3 operated primarily as text completion models [00:01:52]. Users provided input text, and the models would continue the sentence [00:01:59]. While innovative for generating natural-sounding language, getting these models to follow specific instructions proved difficult [00:02:12]. For example, a request like “What is the best way to get to the park?” might result in a continuation like “that is what Sally said yesterday,” rather than a direct answer [00:02:24]. To achieve instruction following, developers had to use “few-shot” prompting, structuring inputs with “question-answer” pairs [00:02:42].
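
As an illustration, a few-shot completion prompt from that era might be structured like this (the exact wording is a hypothetical example, not taken from the talk):

```
Q: What is the capital of France?
A: Paris.

Q: What is the best way to get to the park?
A:
```

The completion model then continues after the final “A:”, and the preceding question-answer pairs steer it toward answering rather than free-form continuation.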

Emergence of Instruction Following and Roles

A significant step forward was the introduction of InstructGPT, which enabled language models to follow instructions directly [00:02:52]. This was achieved through post-training, allowing models to understand and perform requested tasks [00:02:57]. Subsequently, the notion of “users” and “assistants” with distinct roles was introduced, also through post-training, to provide models with specific personas [00:03:02].

Early Implementations of Tooling

The concept of giving language models additional tools to interact with external states marked a crucial point in their evolution [00:03:12].

  • WebGPT (2021): One of the first instances of function calling involved WebGPT, a version of GPT-3 trained to use a specific set of functions for web search [00:03:36]. This model generated actions (like search queries) which were then parsed, executed, and their results fed back into the model’s context for further processing [00:07:06]. Training involved human users completing tasks using commands, allowing the model to imitate user behavior and produce preferred responses [00:07:17]. This approach, however, was specific to pre-defined tools [00:07:54].

  • Meta’s Approach (Any Tools): Meta developed a method to teach models how to use any tool, demonstrated with functions like QA, calculator, and translation [00:08:02]. This clever technique involved analyzing log-probabilities to find the points at which inserting a function call would reduce the perplexity of the generated sentence [00:08:21]. This method required minimal human-labeled examples [00:09:00].

General Function Calling (OpenAI, June 2023)

In June 2023, OpenAI launched general function calling, where models were post-trained to inherently use tools [00:09:23]. This removed the need to provide extensive examples, as models could now call functions according to a given syntax [00:09:33].

Core Concepts and Workflow

Function calling serves two primary purposes:

  1. Fetching Data: Reading APIs, retrieval, memory [00:10:27].
  2. Taking Action: Writing to APIs, managing application state (UI, backend), and performing workflow actions (multi-step processes, meta-actions like changing prompts or handing off conversations) [00:10:29].

The workflow involves:

  1. Defining the functions the model can use [00:11:03].
  2. Providing user input [00:11:09].
  3. The model suggesting a function call based on intent; it does not execute the function itself [00:11:12].
  4. The developer parsing the suggested call, executing the corresponding code, and providing the result back to the model [00:11:24].
  5. The model using this result to continue its generation [00:11:32].
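
A minimal sketch of this round trip with the OpenAI Python SDK is shown below; the get_weather function, its fake implementation, and the model name are illustrative assumptions rather than details from the talk:

```python
import json
from openai import OpenAI

client = OpenAI()

# 1. Define the functions the model may call (hypothetical example).
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current temperature for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name, e.g. Paris"},
            },
            "required": ["city"],
        },
    },
}]

def get_weather(city: str) -> str:
    # Stand-in for a real weather API call.
    return json.dumps({"city": city, "temperature_c": 21})

# 2. Provide user input.
messages = [{"role": "user", "content": "How warm is it in Paris right now?"}]

# 3. The model suggests a function call; it does not execute anything itself.
response = client.chat.completions.create(model="gpt-4o", messages=messages, tools=tools)
message = response.choices[0].message

if message.tool_calls:
    messages.append(message)
    for call in message.tool_calls:
        # 4. The developer parses the call, runs the code, and returns the result.
        args = json.loads(call.function.arguments)
        messages.append({
            "role": "tool",
            "tool_call_id": call.id,
            "content": get_weather(**args),
        })
    # 5. The model uses the result to continue its generation.
    final = client.chat.completions.create(model="gpt-4o", messages=messages, tools=tools)
    print(final.choices[0].message.content)
```

The key point is step 4: the API never runs get_weather; the developer does, and only the returned string re-enters the model’s context.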

Best Practices for Function Definition

  • Write clear descriptions for functions and parameters [00:11:51].
  • Apply software engineering best practices, ensuring functions are intuitive and follow the principle of least privilege [00:11:53].
  • Use enums and object structures to prevent the model from making invalid calls or representing invalid states [00:12:48].
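
As a concrete (hypothetical) example of the enum advice, a status parameter can be constrained so the model cannot produce an unsupported value:

```python
# Hypothetical function definition illustrating enums and descriptive fields.
set_order_status = {
    "type": "function",
    "function": {
        "name": "set_order_status",
        "description": "Update the status of an existing order.",
        "parameters": {
            "type": "object",
            "properties": {
                "order_id": {
                    "type": "string",
                    "description": "Identifier of the order to update.",
                },
                "status": {
                    "type": "string",
                    # The enum keeps the model from inventing unsupported states.
                    "enum": ["pending", "shipped", "delivered", "cancelled"],
                },
            },
            "required": ["order_id", "status"],
        },
    },
}
```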

Functions vs. Tools

In OpenAI’s adopted definition, “functions” refer to the raw function calling mechanism where developers provide an interface and are responsible for execution [01:14:06]. “Tools” are a superset of functions, encompassing functions as well as hosted solutions like code interpreters or file search [01:14:16].

Advanced Use Cases and Implementation Patterns

The core mechanism of function calling enables complex behaviors:

  • Agents as Loops: AI agents are fundamentally loops in which the model specifies tool calls, the calls are executed, and the results are appended to the conversation history, repeating until no more tool calls are needed [00:20:17] (a minimal loop is sketched after this list).
  • Memory: Simple memory can be implemented by storing information in a list or by reading/writing to a local JSON file via functions [00:26:27] (a minimal JSON-file version is sketched after this list). More advanced memory systems can employ smart querying, retrieval augmented generation (RAG), or semantic similarity to load relevant memories [00:32:04]. Consistency in stored memories can be managed by performing retrieval before storing, checking for semantic similarities, and using a model to identify and resolve contradictions or updates [01:32:04].
  • Delegation: Models can delegate tasks using:
    • Handoffs: Transferring a conversation entirely to a different agent by replacing the system prompt and tools [00:34:27] (sketched after this list). This is useful for routing to specialized agents with specific function sets [01:08:01].
    • Nested Calls: One function calling another, often overlooked but straightforward to implement [00:34:37].
    • Manager Tasks: Delegating a complex task to a “smarter” model (e.g., GPT-4) via an API call within a function [00:34:50].
  • Asynchronous Operations: For long-running tasks like network calls, agents can initiate tasks asynchronously and continue interacting with the user while the task runs in the background [00:40:05]. This involves using asyncio to create non-blocking operations and a mechanism to check on task progress [00:52:53] (an asyncio sketch follows this list). The Real-time API inherently supports asynchronous functions, allowing models to call functions and continue the conversation without waiting for an immediate response [01:39:55].
  • Self-Modifying/Self-Bootstrapping Agents: By combining function calling with the ability to execute generated code, an agent can dynamically create and add new functions to its own toolset [01:17:16] (a rough sketch follows this list). This allows for runtime adaptation and expansion of capabilities, such as generating a calculator function on the fly [01:25:54].
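
The agents-as-loops idea can be expressed in a few lines. The sketch below assumes a chat-completions-style API, a tools list defined elsewhere, and a caller-supplied execute_function helper; these names are illustrative:

```python
import json
from openai import OpenAI

client = OpenAI()

def run_agent(messages: list, tools: list[dict], execute_function) -> str:
    """Loop until the model stops requesting tool calls."""
    while True:
        response = client.chat.completions.create(
            model="gpt-4o", messages=messages, tools=tools
        )
        message = response.choices[0].message
        if not message.tool_calls:
            # No more tool calls: the model has produced its final answer.
            return message.content
        messages.append(message)
        for call in message.tool_calls:
            result = execute_function(call.function.name,
                                      json.loads(call.function.arguments))
            messages.append({
                "role": "tool",
                "tool_call_id": call.id,
                "content": json.dumps(result),
            })
```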
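
The simple JSON-file memory can be exposed to the model as two functions along these lines (the file name and return formats are assumptions for illustration):

```python
import json
from pathlib import Path

MEMORY_FILE = Path("memory.json")  # assumed location

def read_memories() -> list[str]:
    """Return all stored memories (empty list if none yet)."""
    if MEMORY_FILE.exists():
        return json.loads(MEMORY_FILE.read_text())
    return []

def add_memory(memory: str) -> str:
    """Append a new memory and persist it to disk."""
    memories = read_memories()
    memories.append(memory)
    MEMORY_FILE.write_text(json.dumps(memories, indent=2))
    return "stored"
```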
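
A handoff boils down to swapping the system prompt and the tool list while keeping the rest of the conversation. A minimal sketch, assuming agents are plain dicts with a system_prompt and tools and messages are plain dicts:

```python
def handoff(messages: list[dict], target_agent: dict) -> tuple[list[dict], list[dict]]:
    """Route the conversation to another agent by swapping the system prompt and tools."""
    new_messages = [{"role": "system", "content": target_agent["system_prompt"]}] + [
        m for m in messages if m.get("role") != "system"
    ]
    return new_messages, target_agent["tools"]
```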
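
For the asynchronous pattern, the long-running work can be started as an asyncio task that the agent checks on later. This is a simplified sketch; run_long_task stands in for a real network call, and the helpers assume an event loop is already running (for example, an async agent loop):

```python
import asyncio

tasks: dict[str, asyncio.Task] = {}

async def run_long_task(query: str) -> str:
    await asyncio.sleep(10)  # stand-in for a slow network call
    return f"result for {query!r}"

def start_task(task_id: str, query: str) -> str:
    """Kick off the work without blocking the conversation."""
    tasks[task_id] = asyncio.create_task(run_long_task(query))
    return f"task {task_id} started"

def check_task(task_id: str) -> str:
    """Let the model (or user) poll for completion."""
    task = tasks.get(task_id)
    if task is None:
        return "unknown task"
    if not task.done():
        return "still running"
    return task.result()
```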
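
Finally, a self-bootstrapping agent needs little more than a tool that executes model-generated source and registers the result. The sketch below is deliberately naive (exec on untrusted code should be sandboxed in practice), and the registry names are assumptions:

```python
# Hypothetical registries shared with the agent loop.
FUNCTION_REGISTRY: dict = {}
TOOLS: list[dict] = []

def add_function(name: str, description: str, parameters: dict, source: str) -> str:
    """Execute model-generated Python and register it as a new callable tool."""
    namespace: dict = {}
    exec(source, namespace)  # caution: run untrusted code in a sandbox
    FUNCTION_REGISTRY[name] = namespace[name]  # assumes the source defines `name`
    TOOLS.append({
        "type": "function",
        "function": {"name": name, "description": description, "parameters": parameters},
    })
    return f"function {name} registered"
```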

Challenges and Considerations

  • Scaling Functions: When dealing with dozens or hundreds of functions, strategies include:
    • Using multiple agents, each with a specialized set of functions, and employing a triage mechanism to hand off conversations to the appropriate agent [01:07:53].
    • Fine-tuning smaller models with a large number of functions [01:08:30].
    • Dynamic function loading, where only the most relevant functions are loaded into context based on the input or conversation [01:08:51] (see the retrieval sketch after this list).
    • A general rule of thumb suggests that models perform reliably with around 10-20 functions without extensive prompting, though fine-tuning can extend this significantly [01:16:17].
  • Function Calls within Thought Text: For current OpenAI API models, function calls typically occur at the very end of the generation, and the internal “thought process” is not exposed [01:09:36].
  • Real-time API Tricks: Real-time API models can be guided to exhibit specific behaviors by describing a "stay silent" function to control when the model waits for user input [01:27:44] (a rough tool definition is sketched below). They can also be instructed to read text according to XML tags, even without explicit training for this behavior, as an unintended consequence of their general language understanding [01:29:12].
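
One way to approximate the dynamic function loading mentioned under scaling is to embed each function description and load only the closest matches for the current input. The sketch below uses the OpenAI embeddings endpoint and cosine similarity; the ranking approach and helper names are assumptions, not the speaker’s implementation:

```python
import numpy as np
from openai import OpenAI

client = OpenAI()

def embed(text: str) -> np.ndarray:
    response = client.embeddings.create(model="text-embedding-3-small", input=text)
    return np.array(response.data[0].embedding)

def select_tools(user_input: str, all_tools: list[dict], k: int = 10) -> list[dict]:
    """Return the k function definitions most similar to the user's input."""
    query = embed(user_input)
    def score(tool: dict) -> float:
        vec = embed(tool["function"]["description"])  # cache these in practice
        return float(query @ vec / (np.linalg.norm(query) * np.linalg.norm(vec)))
    return sorted(all_tools, key=score, reverse=True)[:k]
```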
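
The "stay silent" trick amounts to giving the model a no-op function it can call instead of replying. A rough definition is sketched below in the chat-style schema; the exact Real-time API tool format may differ, and the description wording is an assumption:

```python
# Hypothetical no-op tool: calling it signals the model should wait for more input.
stay_silent_tool = {
    "type": "function",
    "function": {
        "name": "stay_silent",
        "description": (
            "Call this instead of answering when the user has not finished "
            "speaking or no response is needed yet."
        ),
        "parameters": {"type": "object", "properties": {}},
    },
}
```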