From: aidotengineer
Function calling in large language models has evolved significantly, moving from simple text completion to enabling complex interactions with external systems and advanced agentic behaviors [01:19:00]. This capability allows models to generate structured data that represents actions, which can then be parsed and executed by external code [07:06:00].
Early Language Models: Text Completion [01:52:00]
The earliest generations of language models, such as the original GPT, GPT-2, and GPT-3, were primarily “base models” designed for text completion [01:54:00]. Users would provide input text, and the model would simply continue it [02:01:00]. While these models were impressive at generating human-like language, getting them to follow specific instructions was challenging [02:12:00]. For example, setting up a chatbot was non-trivial, often requiring “few-shot” prompting, where users provided example questions and answers to guide the model [02:21:00].
Evolution to Instruction Following and Roles
- InstructGPT: OpenAI introduced “instruction following” with InstructGPT, allowing models to directly perform what users asked, rather than just completing text [02:51:00].
- User and Assistant Roles: Later, the concept of distinct “user” and “assistant” roles was introduced, primarily through post-training, giving models specific personas in a conversation [03:02:00].
Early Integration of External Tools
The ability to interact with external state via additional tools was a significant leap [03:14:00].
- WebGPT (2021): One of the first instances of function calling at OpenAI came with WebGPT, a version of GPT-3 trained to use a specific set of functions for web search [03:38:00]. This marked a shift from merely generating text to generating actions that were then parsed and re-introduced into the model’s context [07:06:00]. WebGPT was trained by having users complete tasks with commands in an interface, teaching the model to imitate that behavior and produce preferred responses [07:21:00]. However, this approach was very specific to the tools it was trained on [07:54:00].
- Meta’s Toolformer (2023): Meta introduced a method for teaching models how to use any tools [08:00:00]. The “Toolformer” paper used a clever technique involving log-probabilities to determine where a function call would best reduce the perplexity of a sentence, thereby improving the model’s ability to integrate tool usage effectively [08:21:00]. This approach required minimal human-labeled examples [09:00:00].
General Function Calling by OpenAI (June 2023)
In June 2023, OpenAI launched general function calling [09:23:00]. The models were post-trained to use tools, meaning developers no longer needed to provide many examples; the models could call functions based on a defined syntax [09:28:00]. This generalized capability made function calling a powerful foundation for various AI applications [09:50:00].
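As a point of reference, here is a minimal sketch of what that defined syntax looks like with the current OpenAI Python SDK; the get_weather function and its JSON schema are illustrative examples, not taken from the talk.

```python
from openai import OpenAI

client = OpenAI()

# One function, described with a JSON Schema. The model never executes it;
# it only emits a structured call that our own code parses and runs.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Return the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name, e.g. 'Paris'"}
            },
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,
)

tool_calls = response.choices[0].message.tool_calls
if tool_calls:
    # e.g. get_weather {"city": "Paris"}
    print(tool_calls[0].function.name, tool_calls[0].function.arguments)
```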
Modern Concepts and Practical Applications
Today, function calling serves two main purposes: fetching data (e.g., reading APIs, retrieval, memory) and taking action (e.g., writing via APIs, managing application state, workflow actions) [10:22:00].
Function vs. Tool
Within OpenAI’s API, “functions” are the raw function calling interfaces defined by the developer, who is responsible for executing the code [14:06:00]. “Tools” are a superset of functions, encompassing functions as well as hosted solutions such as the code interpreter or file search [14:16:00].
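To make the distinction concrete, here is an illustrative tools array in the style of OpenAI’s Assistants API, mixing hosted tools with a developer-defined function; lookup_order is a hypothetical example, not from the talk.

```python
# Hosted tools are executed on OpenAI's side; the "function" entry is a
# raw function that our own code must run when the model calls it.
tools = [
    {"type": "code_interpreter"},    # hosted tool
    {"type": "file_search"},         # hosted tool
    {
        "type": "function",          # raw function: we execute it ourselves
        "function": {
            "name": "lookup_order",  # hypothetical example
            "description": "Look up an order by ID in our own database.",
            "parameters": {
                "type": "object",
                "properties": {"order_id": {"type": "string"}},
                "required": ["order_id"],
            },
        },
    },
]
```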
Agents and Loops
Modern AI agents are often conceptualized as “loops” in which the model iteratively calls functions, processes the results, and continues the conversation or task [20:20:00]; a minimal loop is sketched after this list. This includes complex behaviors like:
- Memory: Implementing basic memory can be as simple as functions to add and retrieve text from a list, which can then be persisted (e.g., to a JSON file) [26:27:00]; a memory sketch follows this list. More advanced memory systems might involve semantic similarity and retrieval [32:07:00].
- Delegation and Asynchrony: Agents can delegate tasks to other models or processes, enabling asynchronous operations [34:07:00]. This allows the primary interaction to continue while a delegated task runs in the background, with the results being integrated once available [39:53:00].
- Routing and Handoffs: For applications with dozens or hundreds of functions, multi-agent patterns become useful [10:07:00]. A “triage” agent can route tasks to specialized agents, each with a focused set of functions [11:10:00]. This is often achieved through “handoffs,” where the conversation context and tools are swapped to a different agent [34:27:00]; a handoff sketch also follows this list.
- Self-Modifying Agents: Advanced applications can involve agents capable of generating and adding new functions to their own toolset at runtime [17:15:00]. While powerful, this capability carries significant security risks due to the use of functions like exec() [26:36:00]; see the final sketch after this list.
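A minimal sketch of the agent loop described above, assuming the Chat Completions API and a dictionary mapping function names to plain Python callables; run_agent and available_functions are illustrative names, not from the talk.

```python
import json
from openai import OpenAI

client = OpenAI()

def run_agent(messages, tools, available_functions, model="gpt-4o"):
    """Naive agent loop: call the model, run any requested functions,
    feed the results back in, and repeat until the model replies in text."""
    while True:
        response = client.chat.completions.create(
            model=model, messages=messages, tools=tools
        )
        message = response.choices[0].message
        messages.append(message)

        # No tool calls means the model produced a final text answer.
        if not message.tool_calls:
            return message.content

        # Execute each requested function and append its result so the
        # model sees it on the next iteration.
        for call in message.tool_calls:
            fn = available_functions[call.function.name]
            result = fn(**json.loads(call.function.arguments))
            messages.append({
                "role": "tool",
                "tool_call_id": call.id,
                "content": json.dumps(result),
            })
```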
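A sketch of the basic memory idea, assuming two hypothetical functions, add_memory and get_memories, that the model is given as tools, with the list persisted to a JSON file.

```python
import json
from pathlib import Path

MEMORY_FILE = Path("memory.json")  # hypothetical location for persisted memory

def _load() -> list[str]:
    return json.loads(MEMORY_FILE.read_text()) if MEMORY_FILE.exists() else []

def add_memory(text: str) -> str:
    """Store a snippet of text and persist the whole list to disk."""
    memories = _load()
    memories.append(text)
    MEMORY_FILE.write_text(json.dumps(memories, indent=2))
    return "stored"

def get_memories() -> list[str]:
    """Return every stored snippet so it can be placed back into context."""
    return _load()
```

Exposed to the model as two tools, this already works as a crude memory; a more advanced version would rank snippets by semantic similarity rather than returning all of them.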
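A rough sketch of a handoff, assuming each agent is just a system prompt plus a focused tool list; the agent names and tool lists are placeholders, not from the talk.

```python
# Placeholder tool lists; in practice each would hold a handful of
# function schemas relevant to that specialist.
billing_tools, shipping_tools = [], []

AGENTS = {
    "billing":  {"instructions": "You handle billing questions.",  "tools": billing_tools},
    "shipping": {"instructions": "You handle shipping questions.", "tools": shipping_tools},
}

def handoff(messages, agent_name):
    """Swap the system prompt and tool set while keeping the conversation."""
    agent = AGENTS[agent_name]
    messages = [{"role": "system", "content": agent["instructions"]}] + [
        m for m in messages if m.get("role") != "system"
    ]
    return messages, agent["tools"]
```

A triage agent would typically expose handoff itself as a callable function, so the model decides when to route the conversation to a specialist.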
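Finally, a deliberately small sketch of runtime tool creation with exec(), to make the security concern concrete; add_function is a hypothetical helper, not the speaker’s implementation.

```python
available_functions = {}  # the same kind of registry the agent loop consults

def add_function(name: str, source: str) -> str:
    """Compile model-supplied Python source and register it as a new tool."""
    namespace = {}
    exec(source, namespace)  # DANGER: runs arbitrary generated code
    available_functions[name] = namespace[name]
    return f"registered {name}"

# Example of what a model might send back:
add_function("add", "def add(a, b):\n    return a + b")
print(available_functions["add"](2, 3))  # -> 5
```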
The evolution of function calling underscores a fundamental shift in how language models interact with the world, transforming them from mere text generators into versatile tools capable of dynamic action and complex workflows [09:50:00].