From: aidotengineer
AI agents are attracting significant interest across product development, industry, and academia [00:00:31]. Rudimentary agents are already widely used and successful: tools like ChatGPT and Claude function as basic agents, with input/output filters and task-execution capabilities [00:01:17]. More ambitious visions for AI agents, however, are far from being realized [00:01:59]. A key challenge blocking widespread deployment is understanding and managing the costs these agents incur [00:02:28].
The Unaccounted Cost of AI Agents
Unlike traditional language-model evaluations, whose cost is roughly bounded by context-window length, AI agents can take open-ended actions in the real world, so there is no inherent ceiling on their potential cost [00:08:23]. Cost therefore needs to be a primary consideration in every evaluation of AI agents [00:08:37]: without reporting cost alongside accuracy or performance metrics, it is difficult to judge how well an agent truly operates [00:08:46].
The Holistic Agent Leaderboard (HAL), developed at Princeton, aims to address these issues by evaluating agents on multiple dimensions, including both cost and accuracy [00:09:52]. For instance, in a comparison on reproducibility tasks, Claude 3.5 scored similarly to OpenAI's o1 in performance but cost significantly less ($664) [00:10:10]. For AI engineers, a model that performs comparably while costing ten times less is the obvious choice [00:10:31].
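The cost-plus-accuracy selection logic described above can be sketched as a simple Pareto comparison over evaluation results. This is an illustrative sketch, not HAL's actual methodology; the agent names and numbers below are hypothetical placeholders.

```python
from dataclasses import dataclass

@dataclass
class AgentResult:
    name: str
    accuracy: float   # fraction of benchmark tasks solved
    cost_usd: float   # total cost of running the evaluation

def pareto_frontier(results):
    """Keep only agents that no other agent beats on both accuracy and cost."""
    frontier = []
    for r in results:
        dominated = any(
            o.accuracy >= r.accuracy and o.cost_usd <= r.cost_usd
            and (o.accuracy > r.accuracy or o.cost_usd < r.cost_usd)
            for o in results
        )
        if not dominated:
            frontier.append(r)
    return frontier

# Hypothetical results for illustration only (not HAL's real figures).
results = [
    AgentResult("agent-a", 0.62, 664.0),
    AgentResult("agent-b", 0.63, 6400.0),
    AgentResult("agent-c", 0.40, 900.0),
]
for r in pareto_frontier(results):
    print(f"{r.name}: {r.accuracy:.0%} at ${r.cost_usd:,.0f}")
```

Here `agent-c` is dropped because `agent-a` is both more accurate and cheaper; the remaining two agents expose the real engineering trade-off between a small accuracy gain and a large cost increase.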
The Jevons Paradox and AI Agent Pricing
While the cost of running LLMs has dropped drastically—by over two orders of magnitude from Text-Davinci-003 in 2022 to GPT-4o mini today [00:10:57]—scaling applications built with these models remains costly [00:11:19]. For AI engineers, the potential for prototypes to quickly accumulate thousands of dollars in costs acts as a barrier to iteration and release [00:11:27].
It is predicted that even as inference costs for LLM calls continue to decline, the overall cost of running agents will increase, due to the Jevons Paradox [00:11:47]. This economic observation holds that as a resource becomes more efficient to use, total consumption of that resource rises [00:11:51]. Historically, this pattern appeared with coal mining (cheaper extraction led to greater usage) and with ATMs (easier automation of teller work led to more bank branches, and ultimately more tellers) [00:12:01]. Similarly, as the cost of language models drops, their usage will likely increase, driving up overall expenditure [00:12:26]. Accounting for cost in agent evaluations will therefore remain crucial for the foreseeable future [00:12:30].
Conclusion: A Reliability Engineering Mindset
The primary challenge for AI engineers is to develop software optimizations and abstractions that manage the inherently stochastic nature of components like LLMs [00:17:31]. This requires a shift in mindset: viewing AI engineering less as software or machine learning engineering and more as reliability engineering [00:17:54]. Just as early computing engineers focused on fixing reliability issues in machines like ENIAC to make them usable at all [00:18:41], today's AI engineers must prioritize the reliability issues that affect every agent built on stochastic models [00:19:09]. This is what will make the next wave of computing as dependable as possible for end users [00:19:27].
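One concrete reliability-engineering pattern for taming a stochastic component is to wrap each model call in a retry loop with an explicit output check. This is a minimal sketch of that idea; the `call_model` stub, the validator, and the retry count are all assumptions for illustration, not a prescribed design.

```python
import random

def call_model(prompt: str) -> str:
    """Stand-in for a stochastic LLM call; replace with a real client."""
    return random.choice(["VALID: summary text", "garbled output"])

def reliable_call(prompt: str, validate, max_retries: int = 3) -> str:
    """Retry a stochastic call until its output passes a validation check."""
    last = ""
    for _ in range(max_retries):
        last = call_model(prompt)
        if validate(last):
            return last
    # Surface the failure instead of silently passing bad output downstream.
    raise RuntimeError(f"no valid output after {max_retries} attempts: {last!r}")

# Usage: the validator encodes what a 'good' answer looks like for this task.
try:
    out = reliable_call("summarize the report", validate=lambda s: s.startswith("VALID"))
    print(out)
except RuntimeError as e:
    print(f"fallback path: {e}")
```

The key design choice is that the wrapper treats validation failure as an expected event with a defined fallback, rather than an exception to the happy path, which is the reliability-engineering framing the talk argues for.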