From: aidotengineer
Introduction to Function Calling
Function calling is a core concept in AI and language models that allows models to interact with external states and perform actions beyond text generation [00:22:00]. It enables language models to fetch data, take actions, manage application state, and execute workflow actions [01:27:00].
Evolution of Tool Use in AI
The progression of language models demonstrates an evolution of AI engineering and tools towards better integration with external functionalities:
- Text Completion (Original GPT models): Early models like GPT, GPT-2, and GPT-3 were base models that would continue input text [01:53:00]. Getting them to follow specific instructions was difficult [02:14:00].
- Instruction Following (InstructGPT): OpenAI introduced instruction following, allowing models to perform actions based on given input rather than just completing text [02:52:00]. This also led to the introduction of user and assistant roles [03:03:00].
- Early Tool Integration (WebGPT): Around 2021, WebGPT, a version of GPT-3, was trained to use a specific set of functions for web search [03:38:00]. This marked an early instance of models generating actions that were then parsed and used to interact with external state [07:05:00].
- Any Tool Use (Meta): Meta developed a method to teach models how to use any tools, leveraging log probability analysis to retroactively insert function calls where they would reduce sentence perplexity [08:05:00]. This approach minimized the need for human-labeled examples [09:00:00].
- General Function Calling (OpenAI, June 2023): OpenAI launched general function calling, where models were post-trained to use tools without requiring extensive examples [09:23:00]. This allows models to call functions based on a defined syntax [09:37:00].
Purposes of Function Calling
Function calling serves two main purposes [01:22:00]:
- Fetching Data: This includes reading APIs, retrieval, and accessing memory [01:27:00].
- Taking Action: This involves using APIs to write, managing application state (UI, frontend, backend), and performing workflow actions (multi-step processes, or even meta-actions like switching prompts or handing off conversations) [01:28:00].
How Function Calling Works
The process involves several steps [01:56:00]:
- Define Functions: Inform the model about the available functions it can use [01:03:00].
- Model Suggests: The model indicates its intent to use a function [01:12:00]. It does not execute the function itself [01:14:00].
- Developer Executes: The developer is responsible for parsing the model’s intent, executing the code, and processing the result [01:24:00].
- Provide Result: The result of the function execution is provided back to the model, which can then use it in its generation [01:29:00].
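As a concrete illustration, here is a minimal sketch of these four steps against the OpenAI Chat Completions API; the `get_weather` function and its schema are hypothetical placeholders, not from the talk:

```python
import json
from openai import OpenAI

client = OpenAI()

# Hypothetical function the model may call.
def get_weather(city: str) -> str:
    return f"Sunny in {city}"  # stand-in for a real API call

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string", "description": "City name"}},
            "required": ["city"],
        },
    },
}]

messages = [{"role": "user", "content": "What's the weather in Paris?"}]

# Steps 1-2: define the functions; the model suggests a call but does not run it.
response = client.chat.completions.create(model="gpt-4o", messages=messages, tools=tools)
message = response.choices[0].message
messages.append(message)

# Steps 3-4: the developer parses the intent, executes, and returns the result.
for tool_call in message.tool_calls or []:
    args = json.loads(tool_call.function.arguments)
    result = get_weather(**args)
    messages.append({"role": "tool", "tool_call_id": tool_call.id, "content": result})

# The model can now use the result in its next generation.
final = client.chat.completions.create(model="gpt-4o", messages=messages, tools=tools)
print(final.choices[0].message.content)
```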
Best Practices for Writing Functions
When writing functions for AI models, best practices include [01:47:00]:
- Clarity: Write clear function descriptions and explain the purpose of each parameter [01:51:00]. Use a system prompt and include examples [02:20:00].
- Software Engineering Principles: Apply established software engineering best practices [01:53:00]. Functions should be intuitive and follow the principle of least authority [02:31:00].
- Validation: Use enums and object structures to prevent the model from making invalid calls or representing invalid states [02:48:00].
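For instance, an enum constrains the model to valid values at the schema level. This hypothetical `set_order_status` definition is one way to express that:

```python
# Hypothetical tool definition: the enum means the model can only emit
# one of the listed statuses, so invalid states are rejected at the schema level.
set_order_status_tool = {
    "type": "function",
    "function": {
        "name": "set_order_status",
        "description": "Update the status of an existing order.",
        "parameters": {
            "type": "object",
            "properties": {
                "order_id": {"type": "string", "description": "ID of the order to update."},
                "status": {
                    "type": "string",
                    "enum": ["pending", "shipped", "delivered", "cancelled"],
                    "description": "New status; must be one of the allowed values.",
                },
            },
            "required": ["order_id", "status"],
        },
    },
}
```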
Functions vs. Tools
The distinction between “functions” and “tools” is subtle [01:44:00]:
- Functions: Refers to the raw function calling mechanism, where the developer provides an interface and is responsible for executing the code [01:49:00].
- Tools: A superset that includes functions, but also encompasses hosted solutions like code interpreter or file search [01:58:00].
Implementing AI Agents
AI agents are often structured as loops that repeatedly call the model and handle any tool calls it suggests [02:18:00].
A basic agent loop typically [02:20:00]:
- Specifies the available tools.
- Calls the language model.
- Receives and prints the message.
- Handles any tool calls suggested by the model.
- Appends the results back to the conversation.
- Continues looping until no more tool calls are suggested [02:30:00].
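A minimal sketch of such a loop, assuming the OpenAI Python SDK and a hypothetical `tool_impls` dict that maps tool names to local Python functions:

```python
import json
from openai import OpenAI

client = OpenAI()

def run_agent(messages: list, tools: list, tool_impls: dict) -> list:
    """Loop: call the model, print its message, execute any tool calls,
    append the results, and repeat until no more tool calls are suggested."""
    while True:
        response = client.chat.completions.create(
            model="gpt-4o", messages=messages, tools=tools
        )
        message = response.choices[0].message
        messages.append(message)
        if message.content:
            print(message.content)
        if not message.tool_calls:
            return messages  # done: the model produced a plain reply
        for tool_call in message.tool_calls:
            args = json.loads(tool_call.function.arguments)
            result = tool_impls[tool_call.function.name](**args)
            messages.append({
                "role": "tool",
                "tool_call_id": tool_call.id,
                "content": str(result),
            })
```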
One useful utility is `functions_to_schema`, which converts a raw Python function object into the correct schema for the model [02:59:00].
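The talk doesn’t show the utility’s internals, but a rough sketch based on `inspect` might look like this:

```python
import inspect

def functions_to_schema(func) -> dict:
    """Rough sketch: build a tool schema from a function's signature and docstring.
    A production version would handle more types, defaults, and nested structures."""
    type_map = {str: "string", int: "integer", float: "number", bool: "boolean"}
    properties, required = {}, []
    for name, param in inspect.signature(func).parameters.items():
        properties[name] = {"type": type_map.get(param.annotation, "string")}
        if param.default is inspect.Parameter.empty:
            required.append(name)
    return {
        "type": "function",
        "function": {
            "name": func.__name__,
            "description": (func.__doc__ or "").strip(),
            "parameters": {
                "type": "object",
                "properties": properties,
                "required": required,
            },
        },
    }
```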
Memory Management
Implementing memory for AI agents can range from simple to complex:
- Basic Memory: A simple list can serve as memory, storing factual information about the user or their preferences. This can be stored in a local JSON file, read at the beginning of a session, and written out after each turn [02:26:00] (a minimal sketch follows this list).
- Advanced Memory (RAG): For more sophisticated memory, one can implement smart querying and retrieval augmented generation (RAG) workflows [02:08:00]. Instead of loading all memory, use semantic similarity or search to retrieve only relevant information [02:12:00].
- Ensuring Consistency: To enforce consistency and resolve contradictions in stored memories, one approach is to perform a retrieval for similar memories before storing new information [03:13:00]. An explicit check with the model can determine if the new memory updates or contradicts existing ones [03:21:00]. Using timestamps and creating explicit chains of updates can help track memory evolution and allow for presenting the latest or full history [03:31:00].
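A minimal sketch of the basic JSON-file approach; the file name and the `remember` helper are hypothetical:

```python
import json
from pathlib import Path

MEMORY_FILE = Path("memory.json")  # hypothetical location

def load_memory() -> list[str]:
    """Read stored facts at the start of a session."""
    return json.loads(MEMORY_FILE.read_text()) if MEMORY_FILE.exists() else []

def save_memory(memories: list[str]) -> None:
    """Write memory back out after each turn."""
    MEMORY_FILE.write_text(json.dumps(memories, indent=2))

# Exposed to the model as a tool: append a fact about the user.
def remember(fact: str) -> str:
    memories = load_memory()
    memories.append(fact)
    save_memory(memories)
    return f"Stored: {fact}"
```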
Delegation and Multi-Agent Patterns
Delegation involves having one agent or function pass a task to another, more specialized agent or function.
- Nested Calls: One function can directly call another AI model for a harder task (e.g., delegating to a “smarter model”) [03:59:00].
- Handoffs: This involves completely swapping the conversation and tools to a different agent, which is essentially replacing the system prompt and tools [03:27:00] (a sketch follows this list).
- Manager Tasks: More asynchronous forms of delegation can involve a manager agent overseeing and assigning tasks [03:43:00].
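A hypothetical sketch of a handoff, treating an agent as nothing more than a system prompt plus a tool list:

```python
from dataclasses import dataclass, field

@dataclass
class Agent:
    name: str
    instructions: str          # becomes the system prompt
    tools: list = field(default_factory=list)

triage_agent = Agent("triage", "Route the user to the right specialist.")
refund_agent = Agent("refunds", "You handle refund requests end to end.")

# Exposed to the triage agent as a "transfer" tool; returning an Agent
# is the signal the loop uses to perform a handoff.
def transfer_to_refunds() -> Agent:
    return refund_agent

def apply_handoff(messages: list, new_agent: Agent) -> list:
    """Swap the system prompt and tool set, then continue the same loop."""
    messages[0] = {"role": "system", "content": new_agent.instructions}
    return new_agent.tools
```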
Routing Patterns for Many Functions
When dealing with dozens or hundreds of functions, routing and efficient tool calling become crucial [01:52:00].
- Multiple Agents: Divide responsibilities by creating multiple agents, each with a focused set of related functions (e.g., an “email agent” and a “calendar agent”) [01:07:53]. A primary “triage” agent can then use special “transfer” functions to hand off the conversation to the appropriate specialized agent [01:12:57]. This is a common primary use case for agents and handoffs [01:15:20].
- Fine-tuning: For highly latency-sensitive cases with many functions (e.g., 120 functions with GPT-3.5), fine-tuning smaller models can improve performance [01:08:30].
- Dynamic Function Loading: Based on user input or conversation context, dynamically load only the most relevant functions into context. This can be achieved with embeddings or a two-step function call, where one function loads more functions, essentially acting as a handoff [01:08:51] (see the sketch after this list).
- Rule of Thumb: As a general guideline, avoid giving a single agent more than 10-20 functions without extensive prompting or dedicated evaluations; fine-tuning can push this limit higher [01:16:17].
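A sketch of dynamic function loading via embeddings; the `all_tools` catalog and the top-k cutoff are assumptions:

```python
import numpy as np
from openai import OpenAI

client = OpenAI()
all_tools: list[dict] = []  # assumed: the full catalog of tool schemas, built elsewhere

def embed(text: str) -> np.ndarray:
    resp = client.embeddings.create(model="text-embedding-3-small", input=text)
    return np.array(resp.data[0].embedding)

# Precompute one embedding per tool, keyed off its description.
tool_vectors = [(tool, embed(tool["function"]["description"])) for tool in all_tools]

def relevant_tools(user_input: str, k: int = 5) -> list[dict]:
    """Return only the k tools most semantically similar to the user's message."""
    query = embed(user_input)
    def cosine(vec: np.ndarray) -> float:
        return float(np.dot(query, vec) / (np.linalg.norm(query) * np.linalg.norm(vec)))
    ranked = sorted(tool_vectors, key=lambda tv: cosine(tv[1]), reverse=True)
    return [tool for tool, _ in ranked[:k]]
```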
Asynchronous Operations
Asynchronous operations are essential for maintaining responsiveness, especially when dealing with network calls or potentially long-running tasks.
- Necessity: In a blocking loop, long-running tasks (like multiple API calls) will cause the entire interaction to freeze [03:52:00]. Asynchronous handling allows other operations to continue while a task runs in the background [04:17:00].
- Implementation: Using `asyncio` in Python allows for non-blocking operations. Functions can `await` results while the overall system continues processing other inputs [04:22:00].
- Delegated Tasks: A pattern involves creating a task with a unique ID, returning a “response pending” status, and then having a separate function (`check_tasks`) to query the status of running tasks [05:27:00]. This enables the model to continue interacting with the user while tasks complete in the background [05:59:00] (a sketch follows this list).
- Real-time API: OpenAI’s Realtime API inherently supports asynchronous functions. The model can call a function, get no immediate response, and continue the conversation until the function’s response is ready, at which point it’s seamlessly injected [01:39:50].
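A sketch of the delegated-task pattern, assuming the agent loop itself runs under `asyncio` so background work can be scheduled; `start_task` and `slow_work` are hypothetical names:

```python
import asyncio
import uuid

# Registry of background work; assumes the agent loop itself runs under asyncio.
background_tasks: dict[str, asyncio.Task] = {}

async def slow_work(query: str) -> str:
    await asyncio.sleep(10)  # stand-in for a long-running API call
    return f"Result for {query!r}"

# Exposed to the model as a tool: start work and return immediately.
def start_task(query: str) -> str:
    task_id = uuid.uuid4().hex[:8]
    background_tasks[task_id] = asyncio.create_task(slow_work(query))
    return f"Task {task_id} started; response pending."

# Exposed to the model as a tool: poll running tasks for results.
def check_tasks() -> str:
    report = []
    for task_id, task in background_tasks.items():
        status = task.result() if task.done() else "still running"
        report.append(f"{task_id}: {status}")
    return "\n".join(report) or "No tasks."
```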
Self-Modifying Agents (Generating Tools)
A powerful and experimental concept is an agent that can write its own functions [01:15:30].
- Mechanism: A function can be created that takes a string representation of a Python function’s implementation. Using Python’s `exec` (with caution, due to the security risks) can interpret this string and add the newly defined function to the agent’s available tools [01:19:00] (a sketch follows this list).
- Example: An agent could be prompted to “make yourself a little calculator” and then proceed to define and use a `calculate` function based on its own code generation [01:25:59]. This demonstrates how dynamic and extensible such agents can be.
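A minimal sketch of the mechanism; `add_tool` and the registry are hypothetical, and `exec` on model-generated code should be sandboxed in practice:

```python
from typing import Callable

tool_impls: dict[str, Callable] = {}  # hypothetical registry used by the agent loop

def add_tool(name: str, implementation: str) -> str:
    """implementation is the source code of a Python function named `name`.
    Caution: exec runs arbitrary model-generated code; sandbox it in practice."""
    namespace: dict = {}
    exec(implementation, namespace)
    tool_impls[name] = namespace[name]
    return f"Tool {name!r} registered."

# E.g., asked to "make yourself a little calculator", the model might call:
add_tool("calculate", "def calculate(expression: str):\n    return eval(expression)")
print(tool_impls["calculate"]("2 + 2"))  # -> 4
```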
Real-time API Specifics
When building real-time AI interactions (e.g., voice assistants), specific function calling techniques can enhance user experience:
- “Stay Silent” Function: To prevent the model from interrupting a user who is merely pausing, a `stay_silent` function can be used. Triggered by voice activity detection (VAD), it allows the model to verify that the user is truly done talking before responding [01:27:12]. The model can be prompted to call this function when the user “is not quite done talking” [01:28:09] (a hypothetical schema follows this list).
- XML Tag Guidance: Although not strictly function calling, models can be guided to speak in specific ways (e.g., controlling tone or pacing) by embedding instructions within XML tags in the generated script [01:29:12]. This is a behavioral consequence of training rather than explicit function execution [01:30:27].
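A hypothetical schema for such a tool (the flat Realtime-API tool format here is an assumption):

```python
# Hypothetical tool schema; the model is prompted to call it instead of
# speaking when VAD fires but the user seems to be mid-thought.
stay_silent_tool = {
    "type": "function",
    "name": "stay_silent",
    "description": (
        "Call this instead of responding when the user has paused but "
        "is not quite done talking. Produces no audible output."
    ),
    "parameters": {"type": "object", "properties": {}},
}
```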
User Feedback and Iteration
Interactive feedback during development allows for real-time adjustments and exploration of alternative implementations [00:51:00]. Prototyping tools like Swarm or Pydantic AI can be useful, but for granular control and lightweight implementations, developers often write their own loops [01:06:14].