From: aidotengineer

Developing agentic systems today comes with a “prompt tax” [00:00:05]: the hidden costs and challenges of shipping products at the frontier of AI, where advancements are rapid and the underlying models are constantly evolving [00:00:11].

The Pain of Progress

The pace of AI model development is breathtaking, with frequent releases from labs such as Anthropic, Google (Gemini), and OpenAI [00:00:41]. While these advances offer incredible opportunities to bolt new functionality onto applications [00:01:21], they also come with unintended consequences, as these probabilistic systems can behave unexpectedly [00:01:28]. This creates a constant tension between the opportunities new AI models present and the risks of introducing regressions or unforeseen issues into products [00:01:38].

What is Prompt Tax?

Prompt tax is a hidden cost incurred when integrating new AI model functionalities into existing applications [00:01:01]. Unlike technical debt, which is often a trade-off for shipping quickly, prompt tax arises from the desire to upgrade to new models that unlock new capabilities [00:11:00]. The core challenge is the inherent uncertainty: you don’t know exactly what will improve or what will break when a new model is implemented [00:11:10].

As the number of domain-specific prompts in an agentic system grows, the “prompt tax” increases [00:09:30]. When a new AI model is released, teams must:

  1. Experiment rigorously: Determine if the new model unlocks envisioned features or inspires new ideas [00:09:41].
  2. Assess prompt migration effort: Understand the “prompt tax” required to adapt existing prompts to the new model [00:10:04].
  3. Address inherent fear: Mitigate the anxiety caused by unknown unknowns when shipping a new AI model [00:10:10].

Case Study: Orbital’s Agentic System

Orbital, a company focused on automating real estate due diligence, provides a practical example of navigating the prompt tax [00:01:56]. Its agentic software, Orbital Co-Pilot, reads legal documents and compiles information, work that lawyers previously performed manually [00:02:29].

Orbital’s Journey and AI Model Evolution: Orbital has evolved its AI models significantly, starting with GPT-3.5 and moving through various versions of GPT-4, including GPT-4 32K, GPT-4 Turbo, and GPT-4o, and later adopting “System 2” reasoning models such as o1-preview and o4-mini [00:06:27].

Strategic Decisions:

  1. Optimizing for Prompting over Fine-tuning: Orbital prioritized prompting to maximize development speed. This allowed them to quickly incorporate user feedback by adjusting prompts in real-time, especially crucial for finding product-market fit [00:07:03].
  2. Heavy Reliance on Domain Experts: Real estate lawyers with decades of experience write many of the domain-specific prompts, effectively teaching the AI system their expertise [00:07:34].
  3. “Vibes over Evals”: While rigorous evaluation systems are ideal, Orbital has largely relied on subjective human feedback from domain experts to test the system before release; under this approach the product has seen significant growth in tokens processed, revenue, and user satisfaction [00:08:04]. It does, however, face scalability challenges as the product’s surface area grows [00:21:06].

Prompt Categories: Orbital’s prompts fall into two main areas:

  • Agentic Prompts: Owned by AI engineers, these are system prompts that help the model choose and use tools [00:08:56].
  • Domain-Specific Prompts: Used by real estate lawyers to teach the system expertise in the real estate domain [00:09:09]. The number of these prompts has grown from near zero to over 1,000 [00:09:21].
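To make the two categories concrete, here is a minimal sketch of how such a two-tier prompt registry might be structured; the class names and example entries are illustrative assumptions, not Orbital’s actual code:

```python
from dataclasses import dataclass
from enum import Enum


class PromptKind(Enum):
    AGENTIC = "agentic"          # owned by AI engineers; tool choice and use
    DOMAIN = "domain_specific"   # written by real estate lawyers


@dataclass(frozen=True)
class Prompt:
    name: str
    kind: PromptKind
    owner: str   # who re-validates this prompt when the model changes
    text: str


# With 1,000+ domain-specific prompts, every model upgrade means
# re-checking entries like these -- the prompt tax grows with the list.
REGISTRY = [
    Prompt("tool_router", PromptKind.AGENTIC, "ai-engineering",
           "You are an agent. Choose one of the available tools..."),
    Prompt("lease_term_extraction", PromptKind.DOMAIN, "legal-team",
           "From the lease, extract the term start and end dates..."),
]

domain_prompts = [p for p in REGISTRY if p.kind is PromptKind.DOMAIN]
```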

Battle-Tested Tactics for Managing Prompt Tax

Migrating Between System Models

When migrating from “System 1” models (e.g., GPT-4o) to “System 2” reasoning models (e.g., o1-preview):

  • Specify “What to Do,” Not “How to Do It”: System 2 models do not require specific instructions on how to perform a task; just define the objective [00:12:12].
  • Leaner Prompts: Remove repetitive instructions previously needed for System 1 models [00:12:26].
  • Unblock the Model: Avoid too many constraints with System 2 models. Give them a clear objective and allow them time to reason and plan [00:12:40].
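As a hedged illustration of these three tactics, here is an invented before/after pair (not Orbital’s actual prompts) showing a System 1 prompt slimmed down for a System 2 model:

```python
# System 1 style (e.g., a GPT-4-class model): step-by-step "how",
# with repeated guardrails to keep the model on track.
SYSTEM1_PROMPT = """\
You are reviewing a commercial lease.
Step 1: Locate the rent review clause.
Step 2: Quote the clause verbatim.
Step 3: Summarize it in two sentences.
Remember: always quote before summarizing.
Remember: never skip the verbatim quote.
"""

# System 2 style (e.g., a reasoning model): a clear objective ("what"),
# with the repetitive scaffolding removed so the model can plan itself.
SYSTEM2_PROMPT = """\
Objective: report the rent review provisions of this commercial lease,
quoting the relevant clause and summarizing it for a lawyer.
"""
```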

Leveraging Model Strengths

  • Thought Tokens for Debugging and Explainability: While System 2 models are often preferred, System 1 models can be cheaper and faster [00:13:07]. Meanwhile, the thought tokens produced by System 2 models can provide valuable insight for debugging and for explaining complex legal matters to users [00:13:13].
  • Using System 2 Models for Prompt Migration: Newer models can help migrate older domain-specific prompts to their own format, significantly reducing manual effort [00:15:45].
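One way to put that second point into practice is a meta-prompt that asks the newer model to rewrite each legacy prompt. A minimal sketch using the OpenAI Python client follows; the model name and the meta-prompt wording are assumptions, not the talk’s actual setup:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

META_PROMPT = (
    "Rewrite the following prompt for a reasoning ('System 2') model: "
    "state the objective rather than step-by-step instructions, remove "
    "repetitive scaffolding, and preserve every domain requirement."
)


def migrate_prompt(legacy_prompt: str, model: str = "o1-preview") -> str:
    """Ask a newer model to rewrite a legacy domain-specific prompt."""
    # The model name is an assumption; use whichever reasoning model you target.
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user",
                   "content": f"{META_PROMPT}\n\n---\n{legacy_prompt}"}],
    )
    return response.choices[0].message.content
```

A domain expert would still review each migrated prompt, but the mechanical rewriting effort drops substantially.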

Deployment and Risk Mitigation

  • Feature Flags for AI Model Upgrades: As in traditional software development, progressively rolling out new AI model upgrades behind feature flags can mitigate risk (see the sketch after this list) [00:13:46].
  • Overcoming Change-Aversion Bias: Users often feel more anxiety toward a new system, even a superior one, simply because they are familiar with the old one [00:14:00]. Announcing a change can also heighten users’ awareness of issues, sometimes outweighing the positives [00:14:30].
  • “Betting on the Model”: Teams should anticipate future AI model capabilities (smarter, cheaper, faster) and design features that will improve as models evolve. This prevents stagnation and allows products to grow with the models [00:14:56].
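A minimal sketch of the feature-flag tactic referenced above: deterministic percentage bucketing by user id, so a fixed slice of users always sees the new model (the model names and percentage are placeholders, not a documented rollout):

```python
import hashlib

ROLLOUT_PERCENT = 10  # start small; ramp up as feedback stays healthy


def model_for(user_id: str,
              old: str = "gpt-4-turbo",   # placeholder model names
              new: str = "o1-preview") -> str:
    """Bucket a user deterministically so they always see the same model."""
    bucket = int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % 100
    return new if bucket < ROLLOUT_PERCENT else old


print(model_for("user-123"))  # stable per user across requests
```

Raising ROLLOUT_PERCENT (or flipping it back to 0) changes who gets the new model without a redeploy, which is what makes the upgrade reversible.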

Managing Uncertainty and Feedback

  • Brave Shipping: Given the probabilistic nature of AI models and the uncertainty of new capabilities, a team must be willing to “ship and then deal with the consequences” [00:16:12]. Mitigate risks, but overcome the inherent anxiety to get new models to users [00:16:56].
  • Strong, Fast Feedback Loops: Implement systems that allow users to provide feedback directly (e.g., thumbs up/down) [00:17:10]. This feedback should be routed to AI engineers and domain experts quickly, enabling prompt changes and deployment to production within minutes or hours, rather than days or weeks [00:17:22].
  • Progressive Delivery (Upgrade Now, Fix on the Fly): Roll out new models incrementally: first internally, then to a limited user group, then gradually to everyone [00:22:30]. Recalibrate based on feedback, expanding the rollout until negative feedback becomes minimal [00:22:50].
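A toy sketch of this “upgrade now, fix on the fly” loop, combining the thumbs up/down signal with the rollout ramp; the stages and threshold are illustrative assumptions:

```python
STAGES = [1, 5, 25, 50, 100]   # percent of traffic on the new model
MAX_NEGATIVE_RATE = 0.05       # hold the ramp above 5% thumbs-down


def next_stage(current_percent: int, thumbs_up: int, thumbs_down: int) -> int:
    """Advance the rollout one stage if feedback looks healthy, else hold."""
    total = thumbs_up + thumbs_down
    negative_rate = thumbs_down / total if total else 0.0
    if negative_rate > MAX_NEGATIVE_RATE:
        return current_percent  # hold: fix prompts, redeploy, re-measure
    later = [s for s in STAGES if s > current_percent]
    return later[0] if later else current_percent


# At 5% traffic with 190 up / 10 down (5% negative), the ramp advances to 25%.
print(next_stage(5, thumbs_up=190, thumbs_down=10))
```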

The Future of AI Engineering

Demis Hassabis, CEO of Google DeepMind, emphasizes the unique challenge of the AI space: the underlying tech stack is evolving incredibly fast, unlike past revolutionary technologies that stabilized sooner [00:18:17]. This means product designers and managers need a deep technical understanding to anticipate where the technology will be in a year [00:19:09].

This presents a significant opportunity for “product AI engineers” – individuals or teams who understand customer problems and can connect the technical capabilities of models with real user needs [00:19:40]. Their ability to turn new model features into product solutions is incredibly promising for the future of the AI engineering community [00:20:00].

Scaling Confidence

While “vibes” (subjective evaluation) have worked for Orbital thus far, the question remains whether this approach will scale as the product’s surface area increases [00:21:06]. Evaluation (eval) systems are often seen as the answer, potentially alleviating these challenges [00:21:32]. However, for complex domains like real estate law, evaluating correctness, style, conciseness, and citation accuracy across numerous prompts and edge cases can be prohibitively expensive, slow, and potentially impossible to keep pace with product velocity [00:21:49].
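For scale, here is a sketch of the kind of eval harness the passage describes, with one programmatic check per dimension. The checks are deliberately naive stand-ins; real graders for legal correctness or citation accuracy are precisely the expensive part the passage warns about:

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class EvalCase:
    prompt_name: str
    document: str   # e.g., a lease excerpt
    expected: str   # reference answer written by a domain expert


# One check per quality dimension mentioned in the passage.
CHECKS: dict[str, Callable[[str, EvalCase], bool]] = {
    "correctness": lambda out, case: case.expected.lower() in out.lower(),
    "conciseness": lambda out, case: len(out.split()) < 150,
}


def run_evals(generate: Callable[[EvalCase], str],
              cases: list[EvalCase]) -> dict[str, float]:
    """Return the pass rate per dimension across all cases."""
    scores = {name: 0 for name in CHECKS}
    for case in cases:
        output = generate(case)
        for name, check in CHECKS.items():
            scores[name] += check(output, case)
    return {name: hits / len(cases) for name, hits in scores.items()}
```

Even this toy version hints at the cost: every new domain-specific prompt needs its own expert-written cases, and every model upgrade means re-running and re-interpreting the whole suite.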

To stay at the edge of the AI frontier and maximize opportunities, the emphasis is on shipping now [00:23:39]. The feared downsides may not materialize, and where they do, the progressive delivery approach of incremental rollouts and quick fixes can mitigate them, ensuring that the benefits of new AI models are realized [00:24:10].