From: aidotengineer

Function calling is a core concept in AI and language models, allowing models to interact with external systems and perform actions [00:00:19]. It enables language models to generate structured data that can be used to invoke functions, fetch information, or trigger specific actions [00:00:27].

A Brief History of Function Calling

The evolution of language models and their ability to perform actions has progressed through several stages:

  • Text Completion: Initially, models like GPT, GPT-2, and GPT-3 were “base models” that would simply continue input text [01:52:00]. Getting them to follow specific instructions was challenging [02:14:00].
  • Instruction Following: With InstructGPT, models could follow instructions given in the input rather than merely continuing the text [02:55:00].
  • Roles and Personas: The introduction of “user” and “assistant” roles through post-training gave models distinct personas [03:08:00].
  • External Tool Interaction: Eventually, the notion of giving models additional tools to interact with external state emerged [03:17:00].

Early implementations of function calling include:

  • WebGPT (2021): This version of GPT-3 was specifically trained to use a defined set of functions for web search, enabling it to generate actions rather than just text [03:40:00]. The model learned to imitate user behavior and produce preferred responses by observing users perform searches [07:42:00].
  • Meta’s Toolformer Approach (2023): Meta developed a method to teach models to use arbitrary tools, such as calculators or translation services, by analyzing log probabilities to retroactively insert function calls into training text [08:07:00]. This approach required minimal human-labeled examples [09:02:00].
  • OpenAI’s General Function Calling (June 2023): OpenAI launched a generalized function calling capability, where models are pre-trained (or post-trained) to use tools based on a defined syntax, removing the need for extensive example-giving by developers [09:23:00].

The speaker argues that function calling is fundamental to “all the exciting stuff that’s happening today” in AI, describing it as a “powerful” mechanism [09:50:00].

Core Concepts of Function Calling

Function calling serves two main purposes [01:22:00]:

  1. Fetching Data: This includes reading APIs, retrieval processes, and accessing memory [01:27:00].
  2. Taking Action: This involves writing to APIs, managing application state (UI, frontend, backend), and executing workflow actions (multi-step processes, or meta-actions like switching prompts or handing off conversations) [01:30:00].

The process of function calling generally follows these steps [01:54:00]:

  1. You inform the model which functions it can use [01:03:00].
  2. The model communicates its intent to use a function and its arguments, but it does not execute the function itself [01:14:00].
  3. You are responsible for parsing the model’s intent, executing the corresponding code, and providing the result back to the model [01:24:00].
  4. The model then uses this result in its subsequent generation [01:34:00].
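
A minimal sketch of that round trip, assuming the OpenAI Python SDK’s chat completions interface (the model name and the get_weather function are illustrative, not from the talk):

```python
import json
from openai import OpenAI

client = OpenAI()

# Illustrative schema: this description is all the model ever sees.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

messages = [{"role": "user", "content": "What's the weather in Paris?"}]
response = client.chat.completions.create(model="gpt-4o", messages=messages, tools=tools)

# The model expresses *intent* to call the function; it does not execute anything.
tool_call = response.choices[0].message.tool_calls[0]
args = json.loads(tool_call.function.arguments)

# You run the actual code and hand the result back for the next generation.
result = f"Sunny, 22°C in {args['city']}"  # stand-in for a real weather API call
messages.append(response.choices[0].message)
messages.append({"role": "tool", "tool_call_id": tool_call.id, "content": result})

final = client.chat.completions.create(model="gpt-4o", messages=messages, tools=tools)
print(final.choices[0].message.content)
```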

[!NOTE] A distinction is made between “functions” and “tools”: “Functions” refer to the raw function calling interface where you provide an interface and are responsible for execution. “Tools” is a superset that includes functions, along with hosted solutions like code interpreters or file search [01:36:16].

Best Practices for Function Calling

Adhering to software engineering best practices is crucial when defining functions for AI models [01:54:00]:

  • Clear Function Descriptions: Explain the purpose of each parameter clearly [02:18:00].
  • Intuitive Design: Functions should be intuitive and follow the principle of least astonishment [02:31:00]. If a human struggles to understand it, the model likely will too [02:38:00].
  • Structured Parameters: Use enums and object structures to prevent the model from making invalid calls or representing invalid states [02:48:00].
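
To illustrate the structured-parameters point, a schema like the following (a hypothetical order-status function) makes invalid states unrepresentable:

```python
# The enum constrains what the model can pass; free-form strings invite invalid calls.
set_order_status = {
    "type": "function",
    "function": {
        "name": "set_order_status",
        "description": "Update the status of an order.",
        "parameters": {
            "type": "object",
            "properties": {
                "order_id": {"type": "string", "description": "The order's unique identifier."},
                "status": {
                    "type": "string",
                    "enum": ["pending", "shipped", "delivered", "refunded"],
                    "description": "The new status; must be one of the listed values.",
                },
            },
            "required": ["order_id", "status"],
        },
    },
}
```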

Function Calling in AI Agents

AI agents are fundamentally “loops” [02:22:00]. A basic agent operates by:

  1. Specifying tools to the model [02:26:00].
  2. Calling the model and receiving its message [02:30:00].
  3. Handling any tool calls suggested by the model [02:33:00].
  4. Appending the results back into the conversation history [02:33:00].
  5. Continuing this loop until no more tool calls are made [02:37:00].

This simple loop allows agents to perform multi-step operations and interact with external systems.
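
A first-principles sketch of that loop, again assuming the OpenAI SDK; handle_tool_call stands in for your own dispatch and execution code:

```python
def run_agent(client, messages, tools):
    """Loop until the model stops requesting tool calls."""
    while True:
        response = client.chat.completions.create(
            model="gpt-4o", messages=messages, tools=tools
        )
        message = response.choices[0].message
        messages.append(message)

        if not message.tool_calls:
            return message.content  # no more tool calls: the loop ends

        for tool_call in message.tool_calls:
            result = handle_tool_call(tool_call)  # your code parses and executes
            messages.append({
                "role": "tool",
                "tool_call_id": tool_call.id,
                "content": str(result),
            })
```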

Memory Management using Function Calling

Memory can be implemented using functions, often as simply as managing a list of stored information [02:27:00]. For instance, add_to_memory and get_memory functions can be exposed to the model [02:29:00]. The description of these functions can guide the model on when to use them, e.g., when the user provides factual information about themselves [02:33:00].

A simple demonstration involves:

  1. Defining add_to_memory and get_memory functions.
  2. Persisting this memory to a local JSON file that is read at the beginning and written out after each interaction [02:45:00].
  3. When the user provides personal information, the model calls add_to_memory.
  4. Later, when asked about that information, the model calls get_memory to retrieve it [03:13:00].
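
A sketch of that demo (the file name and function signatures are assumptions, not the speaker’s exact code):

```python
import json
from pathlib import Path

MEMORY_FILE = Path("memory.json")  # hypothetical location for persisted memory

def load_memory() -> list[str]:
    # Read persisted memory at startup; start empty on the first run.
    return json.loads(MEMORY_FILE.read_text()) if MEMORY_FILE.exists() else []

def add_to_memory(memory: list[str], fact: str) -> str:
    # Exposed to the model: called when the user states a fact about themselves.
    memory.append(fact)
    MEMORY_FILE.write_text(json.dumps(memory))  # write out after each interaction
    return f"Stored: {fact}"

def get_memory(memory: list[str]) -> str:
    # Exposed to the model: called when it needs previously stored facts.
    return json.dumps(memory)
```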

More advanced memory systems can involve:

  • Smart Querying: Instead of loading all memory, use techniques like semantic similarity or search (e.g., embeddings) to retrieve only relevant pieces [03:12:00].
  • Consistency Enforcement: When storing new memory, perform a retrieval to find similar memories and use the model to explicitly check for updates or contradictions [03:32:00]. Timestamps can also help in managing conflicting information [03:32:00].
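
A sketch of the smart-querying idea using embeddings and cosine similarity (the embedding model name is illustrative):

```python
import numpy as np
from openai import OpenAI

client = OpenAI()

def embed(text: str) -> np.ndarray:
    resp = client.embeddings.create(model="text-embedding-3-small", input=text)
    return np.array(resp.data[0].embedding)

def get_relevant_memories(query: str, memories: list[str], top_k: int = 3) -> list[str]:
    # Rank stored memories by cosine similarity to the query; load only the top few.
    q = embed(query)

    def score(memory: str) -> float:
        v = embed(memory)
        return float(np.dot(q, v) / (np.linalg.norm(q) * np.linalg.norm(v)))

    return sorted(memories, key=score, reverse=True)[:top_k]
```

In practice you would cache each memory’s embedding at write time rather than recomputing it on every query.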

Delegation and Asynchrony

Delegation in AI agents involves tasks being passed to other models or processes [03:05:00]. Forms of delegation include [03:22:00]:

  • Handoffs: Transferring a conversation entirely to a different agent, often by replacing the system prompt and tools [03:27:00].
  • Nested Calls: One function call leads to another, often the easiest to implement [03:39:00].
  • Manager Tasks: A more asynchronous approach where a manager agent oversees sub-tasks [03:43:00].

Asynchronous function calls are crucial for improving user experience, preventing the user from waiting for long-running tasks [03:53:00]. By running tasks in parallel, the total waiting time can be significantly reduced [05:35:00].

To implement asynchronous delegation:

  1. Separate I/O: Decouple user input handling from message processing, often using websockets or similar mechanisms [04:17:00].
  2. Parallel Execution: Use asyncio to run multiple tool calls (e.g., network calls to other models) concurrently [04:32:00].
  3. Task Management: Implement create_task and check_task functions. The model calls create_task, receives a task ID, and can then query the status of that task asynchronously without blocking the main conversation [05:40:00]. This allows the user to continue interacting with the model while background tasks are being processed [05:58:00].
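
A sketch of the task-management piece, assuming asyncio and an in-memory registry; run_subagent stands in for whatever long-running work (e.g., a call to another model) the task performs:

```python
import asyncio
import uuid

tasks: dict[str, asyncio.Task] = {}  # in-memory registry of background work

async def create_task(prompt: str) -> str:
    # Exposed to the model: start background work and return an ID immediately.
    task_id = uuid.uuid4().hex[:8]
    tasks[task_id] = asyncio.create_task(run_subagent(prompt))
    return task_id

async def check_task(task_id: str) -> str:
    # Exposed to the model: report status without blocking the conversation.
    task = tasks.get(task_id)
    if task is None:
        return "unknown task id"
    if not task.done():
        return "still running"
    return f"done: {task.result()}"
```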

Dynamic Function Generation (Bootstrapping)

A “super unsafe” but illustrative demonstration shows an AI agent writing and then using its own functions [01:14:00]. This involves:

  1. Defining an add_tool function that takes a Python string representing a function’s implementation.
  2. Using exec() (caution: this executes arbitrary model-written code and is a serious security risk) to run the string and make the function available to the agent [01:21:31].
  3. The agent can then be prompted to “make yourself a little calculator” or other tools, which it generates and incorporates into its capabilities [01:59:00].
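
A deliberately minimal (and genuinely unsafe) sketch of that add_tool idea; never exec model-generated source outside a sandbox:

```python
from typing import Callable

# WARNING: exec() on model-generated source is a security risk; illustration only.
dynamic_tools: dict[str, Callable] = {}

def add_tool(name: str, source: str) -> str:
    """Exposed to the model: register a new function from a Python source string."""
    namespace: dict = {}
    exec(source, namespace)  # executes the model-written definition
    dynamic_tools[name] = namespace[name]
    return f"Registered tool: {name}"

# e.g., asked to "make yourself a little calculator," the model might call:
add_tool("add", "def add(a, b):\n    return a + b")
print(dynamic_tools["add"](2, 3))  # -> 5
```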

Real-time API and Function Calling

Function calling plays a significant role in real-time AI applications, particularly with voice models:

  • “Stay Silent” Function: To prevent the model from interrupting too early in a real-time conversation, a “stay silent” function can be implemented [02:46:00]. The model can be prompted to call this function when the user might not be done talking, allowing for natural pauses without premature responses [02:50:00].
  • Structured Speech Output: Models can be prompted to follow specific XML tags to output speech in a particular way (e.g., for pauses or tone changes), even if not explicitly trained for it. This allows for more controlled real-time audio generation [02:57:00].
  • Native Asynchronous Functions: Real-time APIs often natively support asynchronous functions, allowing the model to call a function, receive no immediate response, and continue the conversation until the function’s result is ready [03:52:00]. This contrasts with chat conversations where the flow can be halted.
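
A hedged sketch of what a “stay silent” tool definition might look like (the name and description are illustrative, not an official API):

```python
stay_silent = {
    "type": "function",
    "function": {
        "name": "stay_silent",
        "description": (
            "Call this when the user has paused but does not appear to be "
            "finished speaking. Produce no spoken response; wait for them to continue."
        ),
        "parameters": {"type": "object", "properties": {}},
    },
}
```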

Addressing Advanced Use Cases

Several advanced scenarios and questions related to function calling were discussed:

  • Managing Dozens or Hundreds of Functions:
    • Multi-Agent Architecture: Divide responsibilities among multiple agents, each with a cluster of related functions [01:57:00]. This creates a “glorified triage” system [01:19:00].
    • Fine-tuning: For specific and latency-sensitive cases, fine-tuning smaller models on a large number of functions (e.g., 120 functions with GPT-3.5) can be effective [01:30:00].
    • Dynamic Function Loading: Based on input or conversation, load only the most relevant functions into context, potentially using embeddings or a two-step function call approach [01:51:00].
  • Router Patterns: Implementing router patterns can involve having a “triage” agent that decides which specialized agent (e.g., email agent, calendar agent) should handle the user’s request, effectively routing the conversation and potentially performing a handoff [01:10:00].
  • Generators for Nested Calls: It is possible, and often the “right way,” to implement agents as generators that yield results at each step, allowing better tracking of progress and events from multiple concurrent agents [01:03:00] (see the sketch after this list).
  • Tool Call Limits: There are generally no hard limits on the number of functions or parallel function calls, but practical performance often limits effective tool libraries to 10-20 functions without extensive fine-tuning or complex routing [01:05:00].
  • Vision Models and Tools: For vision models, function calls currently occur at the very end of the model’s processing and are not exposed within its internal “thought” process or chain of thought [01:09:00].
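
A sketch of the generator pattern mentioned above, reusing the chat-completions-style loop (handle_tool_call is again your own dispatch code):

```python
def run_agent_gen(client, messages, tools):
    """Agent as a generator: yields an event at each step so callers can track progress."""
    while True:
        response = client.chat.completions.create(
            model="gpt-4o", messages=messages, tools=tools
        )
        message = response.choices[0].message
        messages.append(message)

        if not message.tool_calls:
            yield ("final", message.content)
            return

        for tool_call in message.tool_calls:
            yield ("tool_call", tool_call.function.name)
            result = handle_tool_call(tool_call)  # your own execution code
            messages.append({
                "role": "tool",
                "tool_call_id": tool_call.id,
                "content": str(result),
            })
            yield ("tool_result", result)
```

A nested agent can then re-yield its sub-agent’s events with yield from, which is what makes progress from nested or concurrent calls observable.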

Conclusion

The speaker emphasizes that function calling is a remarkably powerful and versatile mechanism within AI agent development, capable of handling complex scenarios like memory, delegation, and asynchronous operations with relatively simple, first-principles implementations [02:40:00]. While frameworks exist to abstract some complexity, granular control can often be achieved with minimal lines of code [01:06:00]. The ability for an AI to even write and integrate its own tools highlights the profound impact of function calling on AI capabilities [01:52:00].