From: aidotengineer

Realtime AI-driven experiences, such as fraud detection and asynchronous chatbot interactions, are enabled by robust asynchronous function handling, particularly within the context of the OpenAI Realtime API. This allows models to perform actions in the background without interrupting the user’s conversational flow [00:50:02].

Evolution of AI Model Interaction

The ability of large language models (LLMs) to interact with external tools and manage asynchronous operations has evolved significantly [01:16:05]:

  • Text Completion: Initially, models like GPT-1, GPT-2, and GPT-3 were base models that simply continued input text [01:53:00]. Getting them to follow instructions was difficult, often requiring “few-shot” examples within the prompt [02:21:00].
  • Instruction Following: With InstructGPT, models could be given instructions and would carry them out [02:52:00]. This introduced roles like “users” and “assistants” through post-training [03:02:00].
  • Tool/Function Calling: Eventually, models were equipped with the ability to use external tools to interact with outside systems [03:12:00]. Early examples include WebGPT (2021), a GPT-3 variant trained for web search, which generated actions that were then parsed and fed back into the model’s context [06:44:00]. Meta also developed methods to teach models to use arbitrary tools, such as calculators or translation services, by analyzing log probabilities to decide where to insert function calls [08:00:00]. OpenAI launched general function calling in June 2023, with models pre-trained to use a specific tool syntax [09:23:00].

Function Calling Fundamentals

Function calling serves two main purposes: fetching data (e.g., from APIs, retrieval, memory) and taking action (e.g., writing to APIs, managing application state, performing multi-step workflow actions) [10:22:00].

The process of function calling involves the following steps, sketched in code after the list:

  1. Model receives tools and user input. You define which functions the model can use [11:03:00].
  2. Model outputs intent. The model suggests which function to call and with what arguments [11:12:00].
  3. Host system executes function. The host system (your code) is responsible for parsing the model’s intent, executing the actual function code, and handling the result [11:15:00]. The model itself does not execute the function [11:17:00].
  4. Result returned to model. The result is provided back to the model for its next generation [11:29:00].
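
As an illustration, here is a minimal version of this loop, assuming the OpenAI Python SDK’s Chat Completions interface; the get_weather tool and its implementation are hypothetical stand-ins:

```python
import json
from openai import OpenAI

client = OpenAI()

# Hypothetical local implementation; the model never runs this itself.
def get_weather(city: str) -> str:
    return json.dumps({"city": city, "temp_c": 21})

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

messages = [{"role": "user", "content": "What is the weather in Paris?"}]

# Steps 1-2: the model receives tools + input and outputs its intent.
response = client.chat.completions.create(
    model="gpt-4o", messages=messages, tools=tools
)
call = response.choices[0].message.tool_calls[0]

# Step 3: the host system parses the intent and executes the real function.
result = get_weather(**json.loads(call.function.arguments))

# Step 4: the result goes back to the model for its next generation.
messages.append(response.choices[0].message)
messages.append({"role": "tool", "tool_call_id": call.id, "content": result})
final = client.chat.completions.create(model="gpt-4o", messages=messages)
print(final.choices[0].message.content)
```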

Best Practices for Functions

  • Clear Descriptions: Write clear descriptions for functions and their parameters [11:51:00].
  • Software Engineering Principles: Apply best practices:
    • Functions should be intuitive [12:31:00].
    • Follow the principle of least astonishment: functions should behave the way their names and descriptions suggest [12:33:00].
    • Use enums and object structures to prevent the model from making invalid calls (see the schema sketch below) [12:48:00].
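
For example, constraining a parameter with an enum in the tool’s JSON Schema means the model cannot invent an unsupported value; the set_thermostat function here is hypothetical:

```python
# A tool definition whose "mode" argument is restricted to three values.
tool = {
    "type": "function",
    "function": {
        "name": "set_thermostat",  # hypothetical function
        "description": "Set the thermostat to a preset mode.",
        "parameters": {
            "type": "object",
            "properties": {
                "mode": {"type": "string", "enum": ["off", "heat", "cool"]},
            },
            "required": ["mode"],
        },
    },
}
```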

Asynchronous Function Handling

A common challenge in building AI applications is managing operations that take time, such as API calls or complex computations [03:40:00]. If these operations are synchronous, they block the model’s interaction, leading to a poor user experience [03:49:00].

The Problem with Blocking Operations

In a synchronous loop, if the model needs to call multiple functions that each take a significant amount of time (e.g., 10 seconds), the total waiting time for the user will be the sum of those times [00:52:37].

Introducing Asynchrony with asyncio

Python’s asyncio library enables concurrent execution and is particularly effective for I/O-bound tasks like network calls [00:53:37]. By using asyncio.sleep instead of the blocking time.sleep, multiple tasks can run concurrently, cutting the total wait to roughly the duration of the longest task rather than the sum of all of them [00:53:01].
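
A minimal sketch of this effect, simulating three slow network calls with asyncio.sleep:

```python
import asyncio
import time

# Hypothetical stand-in for a slow network call.
async def slow_call(name: str, seconds: int) -> str:
    await asyncio.sleep(seconds)  # non-blocking: yields to the event loop
    return f"{name} done"

async def main() -> None:
    start = time.perf_counter()
    # Three 10-second "calls" run concurrently: total wait is ~10s, not ~30s.
    results = await asyncio.gather(
        slow_call("a", 10), slow_call("b", 10), slow_call("c", 10)
    )
    print(results, f"elapsed: {time.perf_counter() - start:.1f}s")

asyncio.run(main())
```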

Managing Background Tasks: The “Task” Pattern

Even with asyncio, the model might still wait for all parallel tasks to complete before responding, which is undesirable for interactive conversations [00:54:06]. A more advanced pattern, sketched in code after this list, involves:

  1. create_task function: The model can call a create_task function, which spins off a long-running operation (e.g., a complex model call) in the background and immediately returns a unique task ID [00:54:53]. This allows the conversation to continue [00:57:59].
  2. check_task function: The model or user can later call a check_task function with the task ID to inquire about the status and retrieve the result once the background operation is complete [00:55:35]. This decouples the execution of the task from the immediate conversational flow [00:58:05].
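
A bare-bones sketch of the two tools, assuming they are invoked from within the application’s running event loop; long_running_job is a hypothetical stand-in for the slow operation:

```python
import asyncio
import uuid

tasks: dict[str, asyncio.Task] = {}  # task_id -> running background task

# Hypothetical stand-in for a slow operation (e.g., a complex model call).
async def long_running_job(prompt: str) -> str:
    await asyncio.sleep(30)
    return f"result for: {prompt}"

def create_task(prompt: str) -> str:
    """Tool exposed to the model: start the work and return immediately.
    Must be called while the event loop is running."""
    task_id = uuid.uuid4().hex[:8]
    tasks[task_id] = asyncio.create_task(long_running_job(prompt))
    return task_id  # the conversation can continue right away

def check_task(task_id: str) -> str:
    """Tool exposed to the model: poll for status and retrieve the result."""
    task = tasks.get(task_id)
    if task is None:
        return "unknown task id"
    if not task.done():
        return "still running"
    return task.result()
```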

This pattern forms the basis for full-stack AI engineering in serverless environments and managing long-running workflows effectively [00:59:07].

Realtime API’s Native Asynchronous Handling

The OpenAI Realtime API takes asynchronous function handling a step further by supporting it natively [01:39:55].

  • Non-Blocking Calls: When the model calls a function within the Realtime API, it does not halt the conversation [01:40:02]. The user can continue talking, and the model will continue to respond, even if the function’s result is not yet available [01:40:05].
  • Contextual Understanding: This capability is crucial because, unlike traditional chat conversations, the flow of a real-time interaction cannot be arbitrarily paused. The Realtime API models were specifically trained to handle these asynchronous functions gracefully [01:40:11]. A sketch of this flow follows the list.
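
The general shape of this flow over the Realtime API’s WebSocket events looks roughly like the following. This is an approximation only: connection handling is omitted, ws is an already-open WebSocket, and run_lookup is a hypothetical slow function.

```python
import asyncio
import json

async def run_lookup(args: dict) -> dict:
    """Hypothetical stand-in for slow external work."""
    await asyncio.sleep(10)
    return {"status": "ok", "args": args}

async def handle_event(ws, event: dict) -> None:
    # When the model finishes emitting a function call, start the work in
    # the background; conversation on the socket keeps flowing meanwhile.
    if event.get("type") == "response.function_call_arguments.done":
        asyncio.create_task(
            finish_call(ws, event["call_id"], event["arguments"])
        )

async def finish_call(ws, call_id: str, arguments: str) -> None:
    result = await run_lookup(json.loads(arguments))
    # Whenever the result is ready, hand it back and request a new response.
    await ws.send(json.dumps({
        "type": "conversation.item.create",
        "item": {"type": "function_call_output", "call_id": call_id,
                 "output": json.dumps(result)},
    }))
    await ws.send(json.dumps({"type": "response.create"}))
```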

Additionally, the Realtime API can be influenced by specific “tricks”:

  • stay_silent Function: To control the model’s speaking turns, a stay_silent function can be provided. This allows the model to decide whether to continue speaking or wait, based on its interpretation of the user’s speech, rather than relying solely on Voice Activity Detection (VAD) [01:27:44].
  • XML Tag Guidance: Although not strictly function calling, the Realtime API can sometimes interpret and follow instructions embedded in XML tags within a script, allowing for specific speech patterns (e.g., pacing, tone) [01:29:12].

Dynamic Tool Creation

An experimental and inherently unsafe capability demonstrated is letting an agent dynamically write and use its own functions at runtime [01:17:15]. By providing a function that takes a Python string containing a function’s implementation and executing it with exec (with extreme caution, given the security risks), the agent can effectively “bootstrap” new capabilities [01:19:40], allowing for self-extending AI systems.
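
A deliberately minimal sketch of this bootstrap; names are hypothetical, and untrusted model output should never be run like this outside a sandbox:

```python
from typing import Callable

registry: dict[str, Callable] = {}  # tools the agent has written for itself

def add_tool(name: str, source: str) -> str:
    """Tool exposed to the model: define a new function at runtime.
    DANGER: exec runs arbitrary model-written code; sandbox it."""
    namespace: dict = {}
    exec(source, namespace)
    registry[name] = namespace[name]
    return f"registered {name}"

# Example: the model "writes" an add function, then the host can call it.
add_tool("add", "def add(a, b):\n    return a + b")
print(registry["add"](2, 3))  # -> 5
```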

Memory and State Management

Asynchronous functions are vital for managing memory and state in AI applications. A simple form of memory can be a list of past interactions, but for more sophisticated use cases:

  • Storing Memory: Functions can be used to add information to memory [02:56:00]. This memory can be persisted (e.g., in a local JSON file) and loaded at the beginning of a session (see the sketch after this list) [02:56:00].
  • Retrieval Augmented Generation (RAG): Instead of loading all memory, more advanced systems can perform “smart querying” using retrieval techniques (like semantic similarity or search) to load only the most relevant memories for the current context [03:09:00].
  • Consistency: To enforce consistency in stored memories, one approach is to perform a retrieval for similar memories before storing a new one and then explicitly check for contradictions or updates using the model. Timestamps and explicit “chains of updates” can help manage evolving information [03:32:00].
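
A minimal sketch of the simplest version, persisting timestamped memories to a local JSON file; the file name and function names are hypothetical:

```python
import json
from datetime import datetime, timezone
from pathlib import Path

MEMORY_FILE = Path("memory.json")  # hypothetical local store

def load_memories() -> list[dict]:
    """Load all persisted memories at the start of a session."""
    if MEMORY_FILE.exists():
        return json.loads(MEMORY_FILE.read_text())
    return []

def add_memory(text: str) -> None:
    """Tool exposed to the model: append a timestamped memory, so later
    entries can supersede earlier ones when checking for contradictions."""
    memories = load_memories()
    memories.append({
        "text": text,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    })
    MEMORY_FILE.write_text(json.dumps(memories, indent=2))
```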

Multi-Agent Systems and Routing

For applications involving dozens or hundreds of functions, multi-agent patterns become crucial for efficient function calling and routing [01:07:53].

  • Responsibility Split: Functions can be grouped and assigned to different agents, each specializing in a set of related tasks (e.g., an “email agent” with email functions, a “calendar agent” with calendar functions) [01:11:16].
  • Handoffs: A primary “triage agent” can then use “transfer functions” to hand off the conversation and context to the appropriate specialized agent [01:13:01]. This can create a seamless experience where the initial agent routes the request and the specialized agent immediately performs the action (see the sketch after this list) [01:15:07].
  • Dynamic Function Loading: Alternatively, functions can be dynamically loaded into memory or context based on the user’s input or the ongoing conversation, potentially using embeddings or a two-step function call process [01:08:51].
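
One way to sketch the handoff mechanics; all agent and tool names here are hypothetical:

```python
# Each agent pairs its own instructions with a small, focused tool set.
AGENTS = {
    "email_agent": {
        "instructions": "You handle email tasks.",
        "tools": ["send_email", "search_inbox"],
    },
    "calendar_agent": {
        "instructions": "You handle calendar tasks.",
        "tools": ["create_event", "list_events"],
    },
    "triage_agent": {
        "instructions": "Route the user to the right specialist.",
        # Transfer functions the model calls to hand off the conversation.
        "tools": ["transfer_to_email_agent", "transfer_to_calendar_agent"],
    },
}

def handle_tool_call(active_agent: str, tool_name: str) -> str:
    """If a transfer function is called, swap the active agent (and thus
    the instructions and tool set used on the next model turn)."""
    if tool_name.startswith("transfer_to_"):
        return tool_name.removeprefix("transfer_to_")
    return active_agent  # otherwise, execute the tool as usual (omitted)
```

The conversation history carries over on a handoff; only the instructions and tool set change, which is what makes the routing feel seamless.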

While there are no hard limits on the number of functions a model can handle in one iteration, a general rule of thumb for reliable performance without extensive prompting is around 10-20 functions [01:16:17]. Fine-tuning can extend this significantly, with successful cases reported for over 100 functions [01:16:36].