From: aidotengineer

Traditional infrastructure paradigms, largely shaped by Web 2.0 principles, are proving insufficient for the unique demands of modern AI applications, particularly those leveraging AI agents and large language models (LLMs) [00:01:56]. The shift in application characteristics necessitates a fundamental rethink of how computing resources are architected and managed.

The Evolution of AI Engineering Needs

The journey of an AI engineer often begins with simple prompts and tool calls, enough to prototype a concept [00:00:31]. However, as AI applications mature, engineers inevitably transition to building complex workflows [00:00:42]. This involves making non-deterministic AI code as deterministic as possible, often through chaining specific prompts, careful evaluation, and precise context control [00:00:44].
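
To make this concrete, here is a minimal sketch of such a chained workflow, assuming the OpenAI Node SDK; the model name, prompts, and step boundaries are illustrative rather than taken from the talk.

```typescript
// Sketch: a two-step workflow that chains prompts so each step has a
// narrow, well-specified job, instead of one open-ended prompt.
import OpenAI from "openai";

const openai = new OpenAI(); // assumes OPENAI_API_KEY is set in the environment

async function complete(prompt: string): Promise<string> {
  const res = await openai.chat.completions.create({
    model: "gpt-4o-mini", // illustrative model choice
    messages: [{ role: "user", content: prompt }],
  });
  return res.choices[0].message.content ?? "";
}

// Step 1: extract structured facts from raw input (tight, checkable output).
// Step 2: generate the final answer from those facts only (controlled context).
export async function summarizeReport(rawText: string): Promise<string> {
  const facts = await complete(
    `List the key facts in the following text as bullet points:\n\n${rawText}`
  );
  return complete(
    `Using only these facts, write a three-sentence summary:\n\n${facts}`
  );
}
```

Each step can be evaluated and tuned in isolation, which is what makes the overall pipeline behave more deterministically than a single large prompt.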

The most significant challenge quickly becomes “getting the right context into the prompts” [00:01:08]. This requires extensive data ingestion and LLM processing, forcing AI engineers to effectively become data engineers [00:01:05]. Workflows that initially run for seconds can extend to minutes or even hours [00:00:59].

Limitations of Traditional Infrastructure for AI

The core assumptions underpinning Web 2.0 infrastructure are challenged by AI applications:

  • Latency Expectations: In Web 2.0, a typical web service responds in tens of milliseconds [00:01:36]. In contrast, AI applications often have a worst-case (tail) latency of “a couple of seconds at best,” even with fast models or prompt caches [00:01:42]. A two-second response time would trigger an alert in a traditional setup [00:01:51].
  • Reliability and Outages: Building reliable AI applications is difficult due to the “shoddy foundations” of current LLM providers [00:02:06]. Major providers like OpenAI and Gemini experience frequent outages, sometimes simultaneously, rendering failover strategies ineffective [00:02:48].
  • Rate Limits and Bursty Traffic: AI applications, especially during batch processing or new-customer onboarding, exhibit extremely bursty traffic patterns [00:02:56]. Contending with rate limits and obtaining higher service tiers often requires significant financial investment [00:03:05]. A common client-side mitigation is sketched after this list.
  • Serverless Provider Limitations: Existing serverless providers are not well suited to long-running workflows; they often time out after 5 minutes, limit outgoing HTTP requests, and lack native streaming support [00:04:04].
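
One common client-side mitigation for rate limits and transient outages is exponential backoff combined with a fallback provider. The sketch below is illustrative: `callPrimary` and `callFallback` are hypothetical wrappers around two model providers, and, as noted above, fallback cannot help when providers fail simultaneously.

```typescript
// Sketch: exponential backoff plus provider fallback for 429s and outages.

async function withBackoff<T>(fn: () => Promise<T>, maxAttempts = 5): Promise<T> {
  let delayMs = 500;
  for (let attempt = 1; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt >= maxAttempts) throw err;
      await new Promise((resolve) => setTimeout(resolve, delayMs));
      delayMs *= 2; // double the wait between retries
    }
  }
}

export async function generate(prompt: string): Promise<string> {
  try {
    return await withBackoff(() => callPrimary(prompt));
  } catch {
    // Primary provider is rate limited or down; try the fallback provider.
    return await withBackoff(() => callFallback(prompt));
  }
}

// Hypothetical provider calls; replace with real SDK calls.
declare function callPrimary(prompt: string): Promise<string>;
declare function callFallback(prompt: string): Promise<string>;
```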

Addressing the Challenges: Key Design Principles

To overcome these infrastructure limitations, a tailored approach is essential. This often involves building custom platforms or leveraging specialized tools designed for the unique demands of AI.

Architectural Considerations for Long-Running AI Workflows

Key architectural points for AI network infrastructure and long-running AI workflows include:

  • Separation of API and Compute Layers: Decoupling the API layer from the compute layer allows each to scale independently and lets users “bring their own compute layer” [00:10:58].
  • Resumability with Redis Streams: All workflow output, for both synchronous and asynchronous runs, is written to a Redis stream [00:11:22]. The API layer reads solely from this stream, never directly from the compute layer [00:11:29] (a minimal sketch follows this list). This design ensures:
    • Resumability: Users can refresh pages or navigate away without losing progress [00:04:34]. The system replays the full history of status messages and output upon reconnection [00:11:39].
    • Robustness: Intermediate errors are handled transparently, so users never have to restart lengthy processes from scratch [00:11:41].
  • Infrastructure-Aware Component Model: A framework that is deeply aware of the underlying infrastructure can provide features like resumable streams for intermediate status and final output [00:07:32]. This “anti-framework” approach, inspired by React, is built around reusable, idempotent, and independently testable “components” and “workflows” [00:07:50].
  • Built-in Reliability Features: Components can be configured with built-in retries (e.g., exponential retry policies) and caching [00:09:44]. Each component also acts as a retry and error boundary [00:08:57] (also sketched after this list).
  • Enhanced Debugging: Automated tracing across workflows and nested components records token usage, OpenAI calls, and messages, simplifying debugging [00:09:29].
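
A minimal sketch of the Redis-stream pattern, assuming the node-redis (v4) client; the stream key layout and field names are illustrative, not taken from any particular implementation. The compute layer appends workflow events, and the API layer replays and tails them from whatever position the client last saw.

```typescript
// Sketch: resumable output via a Redis stream.
import { createClient } from "redis";

const redis = createClient(); // assumes a reachable Redis instance
await redis.connect();

// Compute layer: append each status update or output chunk to the workflow's stream.
export async function publishEvent(workflowId: string, type: string, data: string) {
  await redis.xAdd(`workflow:${workflowId}`, "*", { type, data });
}

// API layer: replay everything after `lastSeenId` ("0" means the full history),
// so a client that reconnects gets the complete picture before live tailing.
export async function readEvents(workflowId: string, lastSeenId = "0") {
  const result = await redis.xRead(
    { key: `workflow:${workflowId}`, id: lastSeenId },
    { BLOCK: 5000, COUNT: 100 } // wait up to 5s for new entries
  );
  // node-redis returns null when nothing arrives before the block timeout.
  return result?.[0]?.messages ?? [];
}
```

Because the API layer only ever reads from the stream, a page refresh simply re-issues `readEvents` with the last ID the browser saw, which is what makes the workflow resumable.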
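
The component model with retries and error boundaries might look roughly like the following. This is not the GenSx API, just a generic sketch of a component wrapper with a configurable exponential retry policy.

```typescript
// Sketch (not the GenSx API): a "component" wrapper that adds a retry policy
// and an error boundary around an idempotent unit of work.

interface RetryPolicy {
  maxAttempts: number;
  baseDelayMs: number; // doubled after each failed attempt
}

function createComponent<In, Out>(
  name: string,
  run: (input: In) => Promise<Out>,
  policy: RetryPolicy = { maxAttempts: 3, baseDelayMs: 1000 }
) {
  return async function (input: In): Promise<Out> {
    let delay = policy.baseDelayMs;
    for (let attempt = 1; ; attempt++) {
      try {
        return await run(input); // each call should be idempotent
      } catch (err) {
        // Error boundary: the failure is contained here, logged with the
        // component name, and retried instead of crashing the whole workflow.
        console.error(`[${name}] attempt ${attempt} failed:`, err);
        if (attempt >= policy.maxAttempts) throw err;
        await new Promise((resolve) => setTimeout(resolve, delay));
        delay *= 2;
      }
    }
  };
}

// Usage: a workflow composes independently testable components.
const fetchPage = createComponent("fetchPage", async (url: string) =>
  (await fetch(url)).text()
);
```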

Product Experience Examples

Thoughtful infrastructure design enables superior user experiences for AI applications:

Onboarding Experience

For applications that must ingest extensive data via LLMs up front, a user can enter a URL and immediately kick off a scraping job in the background [00:05:04]. The application then provides real-time status updates on the ingestion process, letting users start using the product right away rather than waiting several minutes, a delay that increases fall-off [00:05:15]. The pattern is sketched below.
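
In outline, the onboarding endpoint hands the scraping work to the compute layer and returns a handle immediately; `enqueueScrapeJob` below is a hypothetical helper, and the client would then stream status updates by job ID (for example, via the Redis-stream sketch above).

```typescript
// Sketch: kick off ingestion in the background and return immediately,
// so the user can start using the product while scraping runs.
import { randomUUID } from "node:crypto";

// Hypothetical helper that hands the job to the compute layer.
declare function enqueueScrapeJob(jobId: string, url: string): Promise<void>;

export async function startOnboarding(url: string): Promise<{ jobId: string }> {
  const jobId = randomUUID();
  // Enqueue the long-running work; do not await its completion here.
  await enqueueScrapeJob(jobId, url);
  // The client immediately receives a handle it can use to stream status updates.
  return { jobId };
}
```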

Content Generation

In a content generation app, where an agent might take multiple minutes to write a blog post, users can be told upfront that the process will take time [00:05:53]. The UI can display a step-by-step progression (e.g., research, outline, writing sections) [00:05:58]. Crucially, if the user leaves and returns, they should see the same progress and context, preventing frustration from lost work [00:06:08].

Lessons Learned and Future Considerations

When designing infrastructure for agentic workflows:

  • Start Simple, Plan for Long-Running: While initial workflows may be short, plan for a future where agents perform tasks for hours [00:12:13]. AI engineering is trending towards agents that operate independently and communicate on demand [00:12:31].
  • Separate Compute and API Planes: This separation, combined with leaning on Redis streams for resumability, is critical for scaling and user experience [00:12:39].
  • Enable a Seamless User Experience: Design so that users can navigate away without losing progress and so that errors are handled transparently [00:12:44].
  • Deployment and Optimization Care: Workflows that run for extended periods demand careful deployment practices, including worker draining and blue/green deployment patterns [00:12:54]. The devil is in the details when aiming for reliable, long-running systems [00:13:06].

While building this infrastructure from scratch can be “fun,” open-source solutions like GenSx aim to provide these capabilities for developers [00:13:13].