From: aidotengineer
Introduction to Function Calling
Function calling is a core concept in AI and language models that allows models to interact with external states and perform actions beyond text generation [00:22:00]. It enables language models to fetch data, take actions, manage application state, and execute workflow actions [01:27:00].
Evolution of Tool Use in AI
The progression of language models demonstrates an evolution of AI engineering and tools towards better integration with external functionalities:
- Text Completion (Original GPT models): Early models like GPT, GPT-2, and GPT-3 were base models that would continue input text [01:53:00]. Getting them to follow specific instructions was difficult [02:14:00].
- Instruction Following (InstructGPT): OpenAI introduced instruction following, allowing models to perform actions based on given input rather than just completing text [02:52:00]. This also led to the introduction of user and assistant roles [03:03:00].
- Early Tool Integration (WebGPT): Around 2021, WebGPT, a version of GPT-3, was trained to use a specific set of functions for web search [03:38:00]. This marked an early instance of models generating actions that were then parsed and used to interact with external state [07:05:00].
- Any Tool Use (Meta): Meta developed a method to teach models how to use any tools, leveraging log probability analysis to retroactively insert function calls where they would reduce sentence perplexity [08:05:00]. This approach minimized the need for human-labeled examples [09:00:00].
- General Function Calling (OpenAI, June 2023): OpenAI launched general function calling, where models were post-trained to use tools without requiring extensive examples [09:23:00]. This allows models to call functions based on a defined syntax [09:37:00].
Purposes of Function Calling
Function calling serves two main purposes [01:22:00]:
- Fetching Data: This includes reading APIs, retrieval, and accessing memory [01:27:00].
- Taking Action: This involves using APIs to write, managing application state (UI, frontend, backend), and performing workflow actions (multi-step processes, or even meta-actions like switching prompts or handing off conversations) [01:28:00].
How Function Calling Works
The process involves several steps [01:56:00]:
- Define Functions: Inform the model about the available functions it can use [01:03:00].
- Model Suggests: The model indicates its intent to use a function [01:12:00]. It does not execute the function itself [01:14:00].
- Developer Executes: The developer is responsible for parsing the model’s intent, executing the code, and processing the result [01:24:00].
- Provide Result: The result of the function execution is provided back to the model, which can then use it in its generation [01:29:00].
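As a concrete illustration, here is a minimal sketch of these four steps against the OpenAI Chat Completions API; the `get_weather` function and its schema are hypothetical placeholders, not from the talk:

```python
import json
from openai import OpenAI

client = OpenAI()

# Hypothetical function the model may call.
def get_weather(city: str) -> str:
    return f"Sunny in {city}"  # stand-in for a real API call

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string", "description": "City name"}},
            "required": ["city"],
        },
    },
}]

messages = [{"role": "user", "content": "What's the weather in Paris?"}]

# Steps 1-2: define the functions; the model suggests a call but does not run it.
response = client.chat.completions.create(model="gpt-4o", messages=messages, tools=tools)
message = response.choices[0].message
messages.append(message)

# Steps 3-4: the developer parses the intent, executes, and returns the result.
for tool_call in message.tool_calls or []:
    args = json.loads(tool_call.function.arguments)
    result = get_weather(**args)
    messages.append({"role": "tool", "tool_call_id": tool_call.id, "content": result})

# The model can now use the result in its next generation.
final = client.chat.completions.create(model="gpt-4o", messages=messages, tools=tools)
print(final.choices[0].message.content)
```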
Best Practices for Writing Functions
When writing functions for AI models, best practices include [01:47:00]:
- Clarity: Write clear function descriptions and explain the purpose of each parameter [01:51:00]. Use a system prompt and include examples [02:20:00].
- Software Engineering Principles: Apply established software engineering best practices [01:53:00]. Functions should be intuitive and follow the principle of least authority [02:31:00].
- Validation: Use enums and object structures to prevent the model from making invalid calls or representing invalid states [02:48:00].
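For instance, an enum constrains the model to valid values at the schema level. This hypothetical `set_order_status` definition is one way to express that:

```python
# Hypothetical tool definition: the enum means the model can only emit
# one of the listed statuses, so invalid states are rejected at the schema level.
set_order_status_tool = {
    "type": "function",
    "function": {
        "name": "set_order_status",
        "description": "Update the status of an existing order.",
        "parameters": {
            "type": "object",
            "properties": {
                "order_id": {"type": "string", "description": "ID of the order to update."},
                "status": {
                    "type": "string",
                    "enum": ["pending", "shipped", "delivered", "cancelled"],
                    "description": "New status; must be one of the allowed values.",
                },
            },
            "required": ["order_id", "status"],
        },
    },
}
```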
Functions vs. Tools
The distinction between “functions” and “tools” is subtle [01:44:00]:
- Functions: Refers to the raw function calling mechanism, where the developer provides an interface and is responsible for executing the code [01:49:00].
- Tools: A superset that includes functions, but also encompasses hosted solutions like code interpreter or file search [01:58:00].
Implementing AI Agents
AI agents are often structured as loops that repeatedly call the model and handle any tool calls it suggests [02:18:00].
A basic agent loop typically [02:20:00]:
- Specifies the available tools.
- Calls the language model.
- Receives and prints the message.
- Handles any tool calls suggested by the model.
- Appends the results back to the conversation.
- Continues looping until no more tool calls are suggested [02:30:00].
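A minimal sketch of such a loop, assuming the OpenAI Python SDK and a hypothetical `tool_impls` dict that maps tool names to local Python functions:

```python
import json
from openai import OpenAI

client = OpenAI()

def run_agent(messages: list, tools: list, tool_impls: dict) -> list:
    """Loop: call the model, print its message, execute any tool calls,
    append the results, and repeat until no more tool calls are suggested."""
    while True:
        response = client.chat.completions.create(
            model="gpt-4o", messages=messages, tools=tools
        )
        message = response.choices[0].message
        messages.append(message)
        if message.content:
            print(message.content)
        if not message.tool_calls:
            return messages  # done: the model produced a plain reply
        for tool_call in message.tool_calls:
            args = json.loads(tool_call.function.arguments)
            result = tool_impls[tool_call.function.name](**args)
            messages.append({
                "role": "tool",
                "tool_call_id": tool_call.id,
                "content": str(result),
            })
```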
One useful utility is `functions_to_schema`, which converts a raw Python function object into the correct schema for the model [02:59:00].
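The talk doesn’t show the utility’s internals, but a rough sketch based on `inspect` might look like this:

```python
import inspect

def functions_to_schema(func) -> dict:
    """Rough sketch: build a tool schema from a function's signature and docstring.
    A production version would handle more types, defaults, and nested structures."""
    type_map = {str: "string", int: "integer", float: "number", bool: "boolean"}
    properties, required = {}, []
    for name, param in inspect.signature(func).parameters.items():
        properties[name] = {"type": type_map.get(param.annotation, "string")}
        if param.default is inspect.Parameter.empty:
            required.append(name)
    return {
        "type": "function",
        "function": {
            "name": func.__name__,
            "description": (func.__doc__ or "").strip(),
            "parameters": {
                "type": "object",
                "properties": properties,
                "required": required,
            },
        },
    }
```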
Memory Management
Implementing memory for AI agents can range from simple to complex:
- Basic Memory: A simple list can serve as memory, storing factual information about the user or their preferences. This can be stored in a local JSON file, read at the beginning of a session, and written out after each turn [02:26:00] (a minimal sketch follows this list).
- Advanced Memory (RAG): For more sophisticated memory, one can implement smart querying and retrieval augmented generation (RAG) workflows [02:08:00]. Instead of loading all memory, use semantic similarity or search to retrieve only relevant information [02:12:00].
- Ensuring Consistency: To enforce consistency and resolve contradictions in stored memories, one approach is to perform a retrieval for similar memories before storing new information [03:13:00]. An explicit check with the model can determine if the new memory updates or contradicts existing ones [03:21:00]. Using timestamps and creating explicit chains of updates can help track memory evolution and allow for presenting the latest or full history [03:31:00].
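A minimal sketch of the basic JSON-file approach; the file name and the `remember` helper are hypothetical:

```python
import json
from pathlib import Path

MEMORY_FILE = Path("memory.json")  # hypothetical location

def load_memory() -> list[str]:
    """Read stored facts at the start of a session."""
    return json.loads(MEMORY_FILE.read_text()) if MEMORY_FILE.exists() else []

def save_memory(memories: list[str]) -> None:
    """Write memory back out after each turn."""
    MEMORY_FILE.write_text(json.dumps(memories, indent=2))

# Exposed to the model as a tool: append a fact about the user.
def remember(fact: str) -> str:
    memories = load_memory()
    memories.append(fact)
    save_memory(memories)
    return f"Stored: {fact}"
```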
Delegation and Multi-Agent Patterns
Delegation involves having one agent or function pass a task to another, more specialized agent or function.
- Nested Calls: One function can directly call another AI model for a harder task (e.g., delegating to a “smarter model”) [03:59:00].
- Handoffs: This involves completely swapping the conversation and tools to a different agent, which is essentially replacing the system prompt and tools [03:27:00] (a sketch follows this list).
- Manager Tasks: More asynchronous forms of delegation can involve a manager agent overseeing and assigning tasks [03:43:00].
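A hypothetical sketch of a handoff, treating an agent as nothing more than a system prompt plus a tool list:

```python
from dataclasses import dataclass, field

@dataclass
class Agent:
    name: str
    instructions: str          # becomes the system prompt
    tools: list = field(default_factory=list)

triage_agent = Agent("triage", "Route the user to the right specialist.")
refund_agent = Agent("refunds", "You handle refund requests end to end.")

# Exposed to the triage agent as a "transfer" tool; returning an Agent
# is the signal the loop uses to perform a handoff.
def transfer_to_refunds() -> Agent:
    return refund_agent

def apply_handoff(messages: list, new_agent: Agent) -> list:
    """Swap the system prompt and tool set, then continue the same loop."""
    messages[0] = {"role": "system", "content": new_agent.instructions}
    return new_agent.tools
```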
Routing Patterns for Many Functions
When dealing with dozens or hundreds of functions, routing and efficient tool calling become crucial [01:52:00].
- Multiple Agents: Divide responsibilities by creating multiple agents, each with a focused set of related functions (e.g., an “email agent” and a “calendar agent”) [01:07:53]. A primary “triage” agent can then use special “transfer” functions to hand off the conversation to the appropriate specialized agent [01:12:57]. This is a common primary use case for agents and handoffs [01:15:20].
- Fine-tuning: For highly latency-sensitive cases with many functions (e.g., 120 functions with GPT-3.5), fine-tuning smaller models can improve performance [01:08:30].
- Dynamic Function Loading: Based on user input or conversation context, dynamically load only the most relevant functions into context. This can be achieved with embeddings or a two-step function call, where one function loads more functions, essentially acting as a handoff [01:08:51] (see the sketch after this list).
- Rule of Thumb: As a general guideline, avoid giving a single agent more than 10-20 functions without extensive prompting or dedicated evaluations; fine-tuning can push this limit higher [01:16:17].
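A sketch of dynamic function loading via embeddings; the `all_tools` catalog and the top-k cutoff are assumptions:

```python
import numpy as np
from openai import OpenAI

client = OpenAI()
all_tools: list[dict] = []  # assumed: the full catalog of tool schemas, built elsewhere

def embed(text: str) -> np.ndarray:
    resp = client.embeddings.create(model="text-embedding-3-small", input=text)
    return np.array(resp.data[0].embedding)

# Precompute one embedding per tool, keyed off its description.
tool_vectors = [(tool, embed(tool["function"]["description"])) for tool in all_tools]

def relevant_tools(user_input: str, k: int = 5) -> list[dict]:
    """Return only the k tools most semantically similar to the user's message."""
    query = embed(user_input)
    def cosine(vec: np.ndarray) -> float:
        return float(np.dot(query, vec) / (np.linalg.norm(query) * np.linalg.norm(vec)))
    ranked = sorted(tool_vectors, key=lambda tv: cosine(tv[1]), reverse=True)
    return [tool for tool, _ in ranked[:k]]
```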
Asynchronous Operations
Asynchronous operations are essential for maintaining responsiveness, especially when dealing with network calls or potentially long-running tasks.
- Necessity: In a blocking loop, long-running tasks (like multiple API calls) will cause the entire interaction to freeze [03:52:00]. Asynchronous handling allows other operations to continue while a task runs in the background [04:17:00].
- Implementation: Using `asyncio` in Python allows for non-blocking operations. Functions can `await` results while the overall system continues processing other inputs [04:22:00].
- Delegated Tasks: A pattern involves creating a task with a unique ID, returning a “response pending” status, and then having a separate function (`check_tasks`) to query the status of running tasks [05:27:00]. This enables the model to continue interacting with the user while tasks complete in the background [05:59:00] (a sketch follows this list).
- Real-time API: OpenAI’s Realtime API inherently supports asynchronous functions. The model can call a function, get no immediate response, and continue the conversation until the function’s response is ready, at which point it’s seamlessly injected [01:39:50].
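A sketch of the delegated-task pattern, assuming the agent loop itself runs under `asyncio` so background work can be scheduled; `start_task` and `slow_work` are hypothetical names:

```python
import asyncio
import uuid

# Registry of background work; assumes the agent loop itself runs under asyncio.
background_tasks: dict[str, asyncio.Task] = {}

async def slow_work(query: str) -> str:
    await asyncio.sleep(10)  # stand-in for a long-running API call
    return f"Result for {query!r}"

# Exposed to the model as a tool: start work and return immediately.
def start_task(query: str) -> str:
    task_id = uuid.uuid4().hex[:8]
    background_tasks[task_id] = asyncio.create_task(slow_work(query))
    return f"Task {task_id} started; response pending."

# Exposed to the model as a tool: poll running tasks for results.
def check_tasks() -> str:
    report = []
    for task_id, task in background_tasks.items():
        status = task.result() if task.done() else "still running"
        report.append(f"{task_id}: {status}")
    return "\n".join(report) or "No tasks."
```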
Self-Modifying Agents (Generating Tools)
A powerful and experimental concept is an agent that can write its own functions [01:15:30].
- Mechanism: A function can be created that takes a string representation of a Python function’s implementation. Using Python’s `exec` (with caution, due to the security risks) can interpret this string and add the newly defined function to the agent’s available tools [01:19:00] (a sketch follows this list).
- Example: An agent could be prompted to “make yourself a little calculator” and then proceed to define and use a `calculate` function based on its own code generation [01:25:59]. This demonstrates how dynamic and extensible such agents can be.
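A minimal sketch of the mechanism; `add_tool` and the registry are hypothetical, and `exec` on model-generated code should be sandboxed in practice:

```python
from typing import Callable

tool_impls: dict[str, Callable] = {}  # hypothetical registry used by the agent loop

def add_tool(name: str, implementation: str) -> str:
    """implementation is the source code of a Python function named `name`.
    Caution: exec runs arbitrary model-generated code; sandbox it in practice."""
    namespace: dict = {}
    exec(implementation, namespace)
    tool_impls[name] = namespace[name]
    return f"Tool {name!r} registered."

# E.g., asked to "make yourself a little calculator", the model might call:
add_tool("calculate", "def calculate(expression: str):\n    return eval(expression)")
print(tool_impls["calculate"]("2 + 2"))  # -> 4
```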
Real-time API Specifics
When building real-time AI interactions (e.g., voice assistants), specific function calling techniques can enhance user experience:
- “Stay Silent” Function: To prevent the model from interrupting a user who is merely pausing, a `stay_silent` function can be used. Triggered by voice activity detection (VAD), it allows the model to verify that the user is truly done talking before responding [01:27:12]. The model can be prompted to call this function when the user “is not quite done talking” [01:28:09] (a hypothetical schema follows this list).
- XML Tag Guidance: Although not strictly function calling, models can be guided to speak in specific ways (e.g., controlling tone or pacing) by embedding instructions within XML tags in the generated script [01:29:12]. This is a behavioral consequence of training rather than explicit function execution [01:30:27].
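A hypothetical schema for such a tool (the flat Realtime-API tool format here is an assumption):

```python
# Hypothetical tool schema; the model is prompted to call it instead of
# speaking when VAD fires but the user seems to be mid-thought.
stay_silent_tool = {
    "type": "function",
    "name": "stay_silent",
    "description": (
        "Call this instead of responding when the user has paused but "
        "is not quite done talking. Produces no audible output."
    ),
    "parameters": {"type": "object", "properties": {}},
}
```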
User Feedback and Iteration
Interactive feedback during development allows for real-time adjustments and exploration of alternative implementations [00:51:00]. Prototyping tools like Swarm or Pydantic AI can be useful, but for granular control and lightweight implementations, developers often write their own loops [01:06:14].