From: aidotengineer
Function calling is a core capability of language models, enabling them to interact with external systems and perform actions beyond text generation [00:19:19]. It allows models to fetch data, take action, manage application state, and execute workflow actions [01:09:55].
Historical Evolution of Function Calling
The evolution of patterns and abstractions with language models includes:
- Text Completion: Initially, models like the original GPT, GPT-2, and GPT-3 were base models that would simply continue the input text [01:52:03]. Getting these models to follow instructions was challenging [02:12:31].
- Instruction Following: OpenAI introduced instruction following with InstructGPT, allowing models to perform specific tasks as requested [02:52:16].
- Roles (Users/Assistants): The notion of users, assistants, and roles was introduced through post-training, giving models distinct personas [03:02:40].
- External Tools: The ability to provide additional tools to interact with external state was finally introduced [03:12:40]. This marked a shift from merely generating text to generating actions [07:05:07].
Early instances of tool use include:
- WebGPT (2021): OpenAI trained a version of GPT-3 called WebGPT to use a specific set of functions for web search [06:44:03]. This involved generating actions, parsing them, and reintroducing the results into the model’s context [07:08:06].
- Meta’s Tool-Using Models: Meta developed a method to teach models to use tools such as calculators, QA systems, and translators, by analyzing log probabilities to determine where to insert function calls [08:07:04]. This technique allowed models to learn tool use with minimal human-labeled examples [09:00:16].
- General Function Calling: In June 2023, OpenAI launched general function calling, where models are post-trained to use tools via a specific syntax [09:22:56].
Fundamentally, plain function calling is considered powerful enough to support most modern AI applications [09:59:58].
How Function Calling Works: A Crash Course
Function calling primarily serves two purposes:
- Fetching Data: Reading APIs, retrieval, memory [01:27:14].
- Taking Action: Writing to APIs, managing application state (UI, front end, back end), and workflow actions [01:29:16].
Implementing function calling involves several steps (a minimal end-to-end sketch follows this list):
- Define Functions: Inform the model about available functions and their schemas [01:17:46].
- User Input: Provide the user’s input to the model [01:19:10].
- Model Intent: The model indicates its intent to call a specific function with arguments [01:19:10].
- Execution (Developer Responsibility): The model does not execute the function itself. The developer is responsible for parsing the model’s intent, executing the actual code, and handling the results [01:19:54].
- Provide Result: The result of the function execution is provided back to the model, which can then use it in its subsequent generation [01:29:16].
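For concreteness, here is a minimal sketch of these steps using the OpenAI Python SDK's chat completions interface. The `get_weather` function, its schema, the model name, and the hard-coded result are illustrative assumptions; in a real application, step 4 would call an actual weather API.

```python
import json
from openai import OpenAI

client = OpenAI()

# Step 1: define the function and describe its schema to the model.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name, e.g. 'Paris'."},
            },
            "required": ["city"],
        },
    },
}]

# Step 2: provide the user's input.
messages = [{"role": "user", "content": "What's the weather in Paris?"}]

# Step 3: the model may respond with its intent to call the function.
response = client.chat.completions.create(model="gpt-4o", messages=messages, tools=tools)
message = response.choices[0].message

if message.tool_calls:
    call = message.tool_calls[0]
    args = json.loads(call.function.arguments)
    # Step 4: the developer executes the actual code; the model does not.
    result = {"city": args["city"], "temp_c": 18, "conditions": "cloudy"}  # stand-in
    # Step 5: feed the result back so the model can use it in its reply.
    messages.append(message)
    messages.append({"role": "tool", "tool_call_id": call.id, "content": json.dumps(result)})
    final = client.chat.completions.create(model="gpt-4o", messages=messages, tools=tools)
    print(final.choices[0].message.content)
```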
Best Practices for Function Calling
Based on the documentation, several best practices are recommended when developing custom tools and functions:
- Clear Functions: Write clear function descriptions and explain the purpose of each parameter [01:51:30].
- System Prompt & Examples: Use a system prompt and include examples for better model understanding [02:20:16].
- Software Engineering Principles: Apply good software engineering practices to make functions intuitive, and follow the principle of least astonishment [02:27:26].
- Enums and Object Structure: Use enums and object structures to prevent the model from making invalid calls or representing invalid states [02:49:15].
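As an illustration of the last point, a hypothetical thermostat tool schema might constrain its parameters with enums so that many invalid calls are simply unrepresentable:

```python
# Hypothetical thermostat tool: enums and required fields prevent the
# model from inventing a room name or a temperature unit the
# application does not support.
set_thermostat_tool = {
    "type": "function",
    "function": {
        "name": "set_thermostat",
        "description": "Set the target temperature for a single room.",
        "parameters": {
            "type": "object",
            "properties": {
                "room": {"type": "string", "enum": ["bedroom", "kitchen", "office"]},
                "temperature": {"type": "number", "description": "Target temperature."},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
            },
            "required": ["room", "temperature", "unit"],
        },
    },
}
```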
Applications of Function Calling
Agents and Loops
At its simplest, an agent is a loop [02:20:20]. The loop involves (a runnable sketch follows this list):
- Specifying tools to the model [02:26:07].
- Calling the model to get a message [02:26:07].
- Printing the message [02:26:07].
- Handling any tool calls [02:26:07].
- Appending the tool results [02:26:07].
- Breaking the loop when no more tool calls are needed [02:26:07].
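A minimal sketch of such a loop, assuming the OpenAI Python SDK and a hypothetical `tool_impls` dict mapping function names to local Python implementations:

```python
import json
from openai import OpenAI

client = OpenAI()

def run_agent(messages: list, tools: list[dict], tool_impls: dict) -> list:
    """Minimal agent loop: call the model, print, handle tool calls, repeat."""
    while True:
        response = client.chat.completions.create(
            model="gpt-4o", messages=messages, tools=tools)
        message = response.choices[0].message
        messages.append(message)
        if message.content:
            print(message.content)
        if not message.tool_calls:
            break  # no more tool calls: the loop is done
        for call in message.tool_calls:
            args = json.loads(call.function.arguments)
            result = tool_impls[call.function.name](**args)  # run local code
            messages.append({
                "role": "tool",
                "tool_call_id": call.id,
                "content": json.dumps(result),
            })
    return messages
```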
Memory
A basic form of memory can be implemented using a simple list [02:27:26]. For example, functions `add_to_memory` and `get_memory` can be defined [02:27:26]. To persist memory, it can be stored in a local JSON file, read at the beginning of a session, and written out after each turn [02:52:16]. This allows the model to recall previous interactions across sessions [03:00:54].
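A minimal sketch of list-based memory with JSON persistence; the file name and function signatures are illustrative:

```python
import json
import os

MEMORY_PATH = "memory.json"  # illustrative local file

def load_memory() -> list[str]:
    """Read memory at the beginning of a session."""
    if os.path.exists(MEMORY_PATH):
        with open(MEMORY_PATH) as f:
            return json.load(f)
    return []

memory = load_memory()

def add_to_memory(item: str) -> str:
    """Append a memory and write the file out after each change."""
    memory.append(item)
    with open(MEMORY_PATH, "w") as f:
        json.dump(memory, f, indent=2)
    return "stored"

def get_memory() -> list[str]:
    """Return everything remembered so far."""
    return memory
```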
To enforce consistency or handle contradictions in stored memories, one approach is to:
- Perform a retrieval (search) for semantically similar memories before storing new ones [01:32:11].
- Use a model to explicitly check if a new memory updates or contradicts an existing one [01:32:11].
- Represent updates as a chain of nodes, allowing the system to surface only the latest information or the entire history [01:33:00].
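A self-contained sketch of the chain-of-nodes idea; the `contradicts` predicate stands in for the model call (or embedding search plus model check) described above:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class MemoryNode:
    text: str
    successor: "MemoryNode | None" = None  # newer version of this fact, if any

memories: list[MemoryNode] = []

def latest(node: MemoryNode) -> MemoryNode:
    """Follow the update chain to the most recent version of a fact."""
    while node.successor is not None:
        node = node.successor
    return node

def add_memory(new_fact: str, contradicts: Callable[[str, str], bool]) -> None:
    """Store new_fact, chaining it onto any memory it updates or contradicts."""
    node = MemoryNode(new_fact)
    for existing in memories:
        head = latest(existing)
        if contradicts(head.text, new_fact):
            head.successor = node  # surface only the latest, keep the history
            return
    memories.append(node)

# Trivial stand-in predicate; a real system would call a model here.
add_memory("User lives in Paris", contradicts=lambda old, new: False)
add_memory("User lives in Berlin", contradicts=lambda old, new: "lives in" in old)
print([latest(m).text for m in memories])  # ['User lives in Berlin']
```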
Delegation
Function calling enables delegation of tasks to different models or agents. Forms of delegation include:
- Handoffs: Transferring a conversation entirely to a different agent by replacing the system prompt and tools [03:27:32].
- Nested Calls: An agent can call another function that, in turn, makes an API request to a “smarter model” for a harder task [03:39:57].
- Manager Tasks: More asynchronous delegation, where a manager agent oversees tasks [03:45:57].
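A minimal sketch of a handoff, assuming the conversation is stored as a list of message dicts; the agent registry and tool lists are illustrative placeholders:

```python
email_tools: list[dict] = []     # function schemas for the email agent
calendar_tools: list[dict] = []  # function schemas for the calendar agent

AGENTS = {
    "email_agent": {"system": "You are an email assistant.", "tools": email_tools},
    "calendar_agent": {"system": "You are a calendar assistant.", "tools": calendar_tools},
}

def hand_off(messages: list[dict], target: str) -> tuple[list[dict], list[dict]]:
    """Swap the system prompt and tool set; keep the conversation history."""
    agent = AGENTS[target]
    history = [m for m in messages if m.get("role") != "system"]
    new_messages = [{"role": "system", "content": agent["system"]}, *history]
    return new_messages, agent["tools"]
```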
Asynchronous Operations
Handling tasks asynchronously is crucial for maintaining a responsive user experience. While Python code runs in a single thread, `asyncio` can be used to manage non-blocking operations like network calls concurrently [04:36:37].
The general approach for asynchronous function calls (a sketch follows this list):
- Define a `create_task` function that initiates a task (e.g., an API call) and returns a unique task ID [04:47:04].
- The model can then continue the conversation while the task runs in the background.
- A `check_tasks` function can be used to query the status of pending tasks and retrieve results once they are completed [04:47:04].
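A sketch of this pattern with `asyncio`; the `slow_api_call` coroutine is a stand-in for a real network request:

```python
import asyncio
import uuid

tasks: dict[str, asyncio.Task] = {}

async def slow_api_call(query: str) -> str:
    await asyncio.sleep(5)  # stand-in for a real network request
    return f"result for {query!r}"

def create_task(query: str) -> str:
    """Start a background task and return an ID the model can poll later.

    Must be called while an event loop is running (e.g. inside the
    agent's async main function)."""
    task_id = uuid.uuid4().hex[:8]
    tasks[task_id] = asyncio.create_task(slow_api_call(query))
    return task_id

def check_tasks(task_ids: list[str]) -> dict[str, str]:
    """Report each task's status, including the result once it is done."""
    statuses = {}
    for tid in task_ids:
        task = tasks.get(tid)
        if task is None:
            statuses[tid] = "unknown task id"
        elif task.done():
            statuses[tid] = task.result()
        else:
            statuses[tid] = "pending"
    return statuses

async def main():
    tid = create_task("weather in Paris")
    print(check_tasks([tid]))   # likely still "pending"
    await asyncio.sleep(6)
    print(check_tasks([tid]))   # now contains the result

asyncio.run(main())
```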
In the real-time API, asynchronous functions are supported natively. The model can call a function, receive no immediate response, and the conversation can continue until the function’s result is available [01:39:55].
Self-Modifying/Self-Bootstrapping Agents
A powerful application of function calling is enabling an agent to write and use its own tools at runtime. This can be achieved by having the model generate Python code for a new function, which is then dynamically evaluated (`exec`) and added to the agent’s available tools [01:21:31].
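A deliberately simplified sketch of this idea (see the security warning below); `add_tool` and the generated snippet are illustrative:

```python
from typing import Callable

tool_impls: dict[str, Callable] = {}  # the agent's callable tools

def add_tool(name: str, code: str) -> str:
    """Evaluate model-generated code and register the resulting function."""
    namespace: dict = {}
    exec(code, namespace)  # executes arbitrary code: see the warning below
    tool_impls[name] = namespace[name]
    return f"tool {name!r} registered"

# Example of code the model might generate for itself:
generated = "def add(a, b):\n    return a + b"
add_tool("add", generated)
print(tool_impls["add"](2, 3))  # -> 5
```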
Security Warning
Using `exec` or `eval` with arbitrary model-generated code is highly dangerous and should be avoided in production environments due to severe security risks [01:26:37].
Routing and Multi-Agent Patterns
When dealing with dozens or hundreds of functions, effective tool calling can be achieved through multi-agent patterns and routing:
- Multiple Agents: Split responsibilities by having multiple agents, each with a specific grouping of functions [01:07:53].
- Triage Agent: A primary “triage” agent can decide which specialized agent (e.g., an “email agent” or “calendar agent”) to hand off the conversation to [01:13:00].
- Dynamic Function Loading: Based on user input or conversation context, dynamically load the most likely relevant functions into memory or context. This can involve embeddings or a two-step function call process [01:08:51] (see the sketch after this list).
- Fine-tuning: For specific, latency-sensitive cases, models can be fine-tuned with hundreds of functions [01:16:36]. However, a general rule of thumb suggests models perform well with 10-20 tools without extensive prompting [01:16:20].
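One possible sketch of embedding-based dynamic function loading: embed the user's message, rank the function descriptions by cosine similarity, and expose only the top-k tools on that turn. The embedding model name is an assumption, and `tools` is assumed to be a list of function schemas with descriptions:

```python
import numpy as np
from openai import OpenAI

client = OpenAI()

def embed(text: str) -> np.ndarray:
    resp = client.embeddings.create(model="text-embedding-3-small", input=text)
    return np.array(resp.data[0].embedding)

def pick_tools(user_input: str, tools: list[dict], tool_vecs: np.ndarray, k: int = 5) -> list[dict]:
    """Return the k tools whose descriptions best match the user input."""
    q = embed(user_input)
    sims = tool_vecs @ q / (np.linalg.norm(tool_vecs, axis=1) * np.linalg.norm(q))
    top = np.argsort(sims)[::-1][:k]  # indices of the k highest cosine similarities
    return [tools[i] for i in top]

# tool_vecs would be precomputed once, e.g.:
# tool_vecs = np.stack([embed(t["function"]["description"]) for t in tools])
```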
Real-time API Tricks
Function calling can also enhance real-time interactions, although some of these tricks are not strictly function calls at the API level:
- “Stay Silent” Function: In real-time voice interactions, a model can use a “stay silent” function to decide if the user is truly done talking, preventing it from cutting off the user prematurely [01:27:44]. This provides more intelligent voice activity detection than simple silence-based triggers [01:27:44] (a schema sketch follows this list).
- XML Tag Guidance: Models can be prompted to follow specific speaking styles or scripts using XML tags, even if they were not explicitly trained for it. This behavior is an emergent property of language understanding [01:29:12].
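A hypothetical schema for the “stay silent” trick: nothing is executed when the model calls it; the application simply suppresses the spoken reply and keeps listening.

```python
# Hypothetical control function: the model calls it when it judges the
# user has not finished speaking, and the application interprets the
# call as "do not reply yet" rather than running any code.
stay_silent_tool = {
    "type": "function",
    "function": {
        "name": "stay_silent",
        "description": (
            "Call this when the user appears to be mid-thought and has "
            "not finished talking. Do not respond out loud."
        ),
        "parameters": {"type": "object", "properties": {}},
    },
}
```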
Testing and Evaluation
Testing and evaluating models for function calling often involves live coding, debugging, and iterative refinement. Simple print statements can be used to observe the model’s function calls and responses [01:17:04].