From: aidotengineer
This article details the hands-on, dynamic approach to coding and debugging adopted in a specific AI workshop, emphasizing real-time implementation and problem-solving.
Workshop Structure and Philosophy
The workshop aimed to be highly dynamic and interactive, encouraging participants to interrupt with questions at any point via Slack or by unmuting themselves [00:00:40]. The core of the session involved significant live coding from scratch, with a recognition that debugging would likely be a part of the process [00:00:59].
A key philosophy highlighted was to approach concepts from “first principles” and reduce perceived complexity, suggesting that many advanced AI concepts are fundamentally about function calling [00:26:38].
Live Coding Environment
The presenter frequently used the Cursor IDE, noting its utility for auto-completion and code generation [00:28:46], [00:36:09]. The Swarm framework was also used for its convenient looping tools and agent implementations, although the presenter noted that the core agent logic can be implemented in a small amount of code (around 70 lines) [00:23:54], [01:06:40].
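To give a sense of how small that core loop can be, here is a minimal sketch of an agent loop, assuming the OpenAI Python SDK’s chat completions API with tools; the run_agent helper and the tools/tool_impls structures are illustrative, not the workshop’s exact code.

```python
import json
from openai import OpenAI  # assumes the OpenAI Python SDK (openai >= 1.x)

client = OpenAI()

def run_agent(messages, tools, tool_impls, model="gpt-4o"):
    """Minimal agent loop: call the model, execute any requested tools,
    feed the results back, and repeat until the model answers in plain text."""
    while True:
        response = client.chat.completions.create(
            model=model, messages=messages, tools=tools
        )
        message = response.choices[0].message
        messages.append(message)
        if not message.tool_calls:
            return message.content  # no tool requested: final answer
        for call in message.tool_calls:
            args = json.loads(call.function.arguments)
            # The model only signals intent; we execute the function ourselves.
            result = tool_impls[call.function.name](**args)
            messages.append({
                "role": "tool",
                "tool_call_id": call.id,
                "content": json.dumps(result),
            })
```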
Core Coding Demonstrations
Crash Course on Function Calling
The workshop provided a rapid overview of function calling, noting its two main purposes: fetching data (e.g., reading APIs, retrieval) and taking action (e.g., writing APIs, managing application state, workflow actions) [01:20:00], [01:20:19]. A key distinction was made that the model tells you its intent to use a function but does not execute it; the developer is responsible for parsing, executing, and providing the result back to the model [01:14:09], [01:15:20].
Best practices for writing functions for AI models include:
- Writing clear function descriptions and parameter purposes [01:17:50].
- Applying software engineering best practices, ensuring functions are intuitive and follow the principle of least privilege [01:19:53].
- Using enums and object structures to prevent models from making invalid calls [01:20:02].
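As an illustration of those practices, a tool definition in the OpenAI tools JSON schema format might look like the sketch below; the set_thermostat function and its parameters are invented for the example.

```python
# Illustrative tool schema: a clear description, documented parameters,
# and an enum so the model cannot request an invalid mode.
set_thermostat_tool = {
    "type": "function",
    "function": {
        "name": "set_thermostat",
        "description": "Set the target temperature and mode of the home thermostat.",
        "parameters": {
            "type": "object",
            "properties": {
                "temperature_f": {
                    "type": "integer",
                    "description": "Target temperature in degrees Fahrenheit.",
                },
                "mode": {
                    "type": "string",
                    "enum": ["heat", "cool", "off"],
                    "description": "Operating mode; the enum keeps calls valid.",
                },
            },
            "required": ["temperature_f", "mode"],
        },
    },
}
```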
Implementing Memory
A very basic form of memory was implemented using a simple Python list to store factual information about the user. This memory was persisted by writing it to and reading it from a local JSON file [00:26:27], [00:29:42]. The demonstration showed that an agent could retrieve this stored memory in subsequent interactions [00:31:39].
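A minimal sketch of that pattern might look like the following; the file name and function names are illustrative rather than the workshop’s exact code.

```python
import json
from pathlib import Path

MEMORY_FILE = Path("memory.json")  # illustrative local file

def load_memory() -> list[str]:
    """Read stored facts about the user, or start with an empty list."""
    if MEMORY_FILE.exists():
        return json.loads(MEMORY_FILE.read_text())
    return []

def save_memory(memories: list[str]) -> None:
    """Persist the in-memory list back to the JSON file."""
    MEMORY_FILE.write_text(json.dumps(memories, indent=2))

def add_memory(fact: str) -> str:
    """Tool the agent can call to remember a fact about the user."""
    memories = load_memory()
    memories.append(fact)
    save_memory(memories)
    return f"Stored: {fact}"
```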
For enforcing consistency in stored memories, one suggestion was to perform retrieval of semantically similar memories before storing new ones, then use a model to explicitly check for updates or contradictions. Storing timestamps and creating explicit “chains of updates” (like a linked list of changes) was also proposed to manage evolving information [01:32:04].
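One hedged sketch of that retrieval-before-store idea, assuming OpenAI embeddings for similarity and a second model call to flag contradictions; the threshold, model names, and prompt wording are arbitrary choices for illustration.

```python
from openai import OpenAI

client = OpenAI()

def embed(text: str) -> list[float]:
    return client.embeddings.create(
        model="text-embedding-3-small", input=text
    ).data[0].embedding

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / ((sum(x * x for x in a) ** 0.5) * (sum(y * y for y in b) ** 0.5))

def check_before_store(new_fact: str, memories: list[str], threshold: float = 0.8) -> str:
    """Retrieve semantically similar memories, then ask a model whether the
    new fact updates or contradicts any of them before storing it."""
    new_vec = embed(new_fact)
    similar = [m for m in memories if cosine(new_vec, embed(m)) > threshold]
    if not similar:
        return "new"  # nothing related stored yet; safe to append
    verdict = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{
            "role": "user",
            "content": f"New fact: {new_fact}\nExisting memories: {similar}\n"
                       "Does the new fact update or contradict an existing memory? "
                       "Answer with exactly one word: update, contradiction, or new.",
        }],
    )
    return verdict.choices[0].message.content.strip().lower()
```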
Agent Delegation and Asynchronous Operations
The workshop explored different forms of agent delegation:
- Handoffs: Fully swapping a conversation over to a different agent by replacing the system prompt and tool set [00:34:27]. This is seen as a primary use case for agents and handoffs, functioning as a “glorified triage” among multiple functions [01:15:22] (a sketch follows this list).
- Nested calls: The simplest form of delegation, often overlooked [00:34:37].
- Manager tasks: More complex, involving asynchronous operations [00:34:43].
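A minimal sketch of a handoff, under the assumption (as in Swarm-style frameworks) that an agent is little more than a system prompt plus a tool list; the triage and support agents here are invented for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class Agent:
    name: str
    instructions: str                 # becomes the system prompt
    tools: list = field(default_factory=list)

support_agent = Agent(
    name="Support",
    instructions="You handle refund and billing questions.",
    tools=[],                         # support-specific tools would go here
)

def transfer_to_support() -> Agent:
    """Handoff tool: returning an Agent tells the outer loop to swap the
    active system prompt and tool set to the new agent."""
    return support_agent

triage_agent = Agent(
    name="Triage",
    instructions="Route the user to the right specialist.",
    tools=[transfer_to_support],      # the triage agent's only job is routing
)
```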
A simple delegation to a “smarter model” was demonstrated by having one model call another directly via API [00:35:00]. The challenge of waiting on synchronous responses motivated asynchronous calls: using asyncio.sleep to emulate non-blocking network calls, it was shown that multiple function calls could run in parallel, significantly reducing perceived latency [00:52:02], [00:53:35].
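That parallelism can be reproduced with a few lines of asyncio; the fake_api_call function below stands in for a real network-bound tool.

```python
import asyncio

async def fake_api_call(name: str, seconds: float) -> str:
    """Emulate a non-blocking network call with asyncio.sleep."""
    await asyncio.sleep(seconds)
    return f"{name} finished after {seconds}s"

async def main():
    # Run several "function calls" concurrently; total wall time is roughly
    # the slowest call (about 3s), not the sum of all three.
    results = await asyncio.gather(
        fake_api_call("weather", 2),
        fake_api_call("news", 3),
        fake_api_call("stocks", 1),
    )
    print(results)

asyncio.run(main())
```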
To address the issue of waiting for parallel tasks to complete before interacting with the model again, a “task” system was introduced. This allowed the main interaction loop to continue while tasks ran in the background: a create_task function would start a background process and return a task ID, and a check_tasks function would be called later to retrieve the results [00:54:27], [00:55:32]. This enabled the user to keep chatting with the model while background tasks were processed [01:00:40].
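A rough sketch of how such create_task and check_tasks tools could be wired up with asyncio; the implementation details here are a guess at the pattern, not the workshop’s exact code, and assume the chat loop itself runs inside an event loop.

```python
import asyncio
import uuid

tasks: dict[str, asyncio.Task] = {}   # task ID -> running background task

async def slow_job(seconds: float) -> str:
    await asyncio.sleep(seconds)      # stand-in for a slow network call
    return f"done after {seconds}s"

def create_task(seconds: float) -> str:
    """Tool: start a background job and immediately return its ID,
    so the main chat loop can keep going."""
    task_id = str(uuid.uuid4())[:8]
    tasks[task_id] = asyncio.create_task(slow_job(seconds))
    return task_id

def check_tasks() -> dict[str, str]:
    """Tool: report which tasks have finished and collect their results."""
    return {
        task_id: (task.result() if task.done() else "still running")
        for task_id, task in tasks.items()
    }
```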
Self-Tool Writing Agents
Perhaps the most “fun” and “cool” part of the live coding was demonstrating an agent’s ability to write its own tools/functions dynamically [01:17:11], [01:25:55]. This was achieved by having the model generate Python code for a function, using exec to interpret and load it into the current environment, and then adding it to the agent’s available tools [01:21:31], [01:25:42]. While noted as “super dangerous code,” it successfully demonstrated an agent creating and using its own custom calculator function [01:26:03].
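A stripped-down sketch of the pattern follows, with a hard-coded generated_code string standing in for what the model would actually produce; as in the demo, this should be treated as unsafe outside a sandbox.

```python
# WARNING: exec of model-generated code is dangerous; sandbox it in real use.
generated_code = '''
def calculator(expression: str):
    """Evaluate a simple arithmetic expression."""
    return eval(expression)  # model-written body; acceptable only for a demo
'''

namespace: dict = {}
exec(generated_code, namespace)        # interpret and load the new function
new_tool = namespace["calculator"]     # pull the callable out of the namespace

# Register it so the agent can call it on later turns (tool_impls mirrors the
# name-to-callable mapping idea from the loop sketch earlier).
tool_impls = {"calculator": new_tool}
print(new_tool("2 * (3 + 4)"))         # -> 14
```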
Debugging in Practice
The workshop explicitly included debugging, which was an expected part of the live coding process [00:01:03]. Instances of debugging included:
- Resolving audio issues with the Zoom setup [00:04:47].
- Troubleshooting code where functions were not correctly provided to the model or were hallucinated [00:30:50], [00:31:00].
- Identifying and fixing issues with printing function call results in the console [00:39:29].
- Correcting the use of eval to exec for dynamic code execution [01:21:34].
- Attempting to debug real-time API demos, which proved challenging in a live setting [01:38:28], [01:39:08].
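The reason for that particular fix is that eval only evaluates expressions, while a generated function definition is a statement; a short comparison makes the difference clear.

```python
print(eval("1 + 2"))          # eval handles expressions: prints 3
# eval("def f(): return 3")   # a def is a statement, so eval raises SyntaxError
exec("def f(): return 3")     # exec handles statements and defines f here
print(f())                    # prints 3
```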
Advanced Considerations for Functions
Scaling Functions
When dealing with dozens or hundreds of functions, several techniques were suggested:
- Multiple agents: Splitting responsibilities and function groupings among different agents, invoking the correct one as needed [01:07:53].
- Fine-tuning: It’s possible to fine-tune smaller models with many functions (e.g., 120 functions with GPT-3.5 in a latency-sensitive project) [01:08:30].
- Dynamic Function Loading: Based on user input or conversation context, only the most relevant functions are loaded into memory [01:08:51]. This can be done with embeddings or a two-step function call, where one function loads more functions (similar to a handoff) [01:09:06]; a sketch of the two-step variant follows below.
- Router Patterns: Using multiple agents and handing off to one based on the query is an effective routing strategy [01:10:26].
A general rule of thumb suggested for models without extensive prompting or fine-tuning is to keep the number of tools/functions around 10 to 20 [01:16:20].
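A sketch of the two-step loading idea mentioned above, where the only function initially exposed is one that loads a topical group of functions; the group names and contents are invented for illustration.

```python
# The model starts with a single "loader" tool and pulls in more as needed,
# similar in spirit to a handoff.
TOOL_GROUPS = {
    "billing": ["create_invoice", "refund_payment"],
    "calendar": ["create_event", "list_events"],
    "email": ["send_email", "search_inbox"],
}

active_tools: list[str] = ["load_tools"]   # only the loader is exposed at first

def load_tools(group: str) -> str:
    """Tool the model can call to bring a relevant group of functions
    into context for the rest of the conversation."""
    if group not in TOOL_GROUPS:
        return f"Unknown group; available groups: {list(TOOL_GROUPS)}"
    active_tools.extend(TOOL_GROUPS[group])
    return f"Loaded tools: {TOOL_GROUPS[group]}"
```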
Function Calling in Vision Models
Regarding vision models, it was noted that while exposing function calls mid-generation is technically possible with post-training, the current GPT-4 API does not allow functions to be called within the model’s “thought” process or internal chain of thought; function calls currently happen only at the very end of the model’s generation [01:09:36].