From: aidotengineer

Building AI applications, particularly those involving AI agents, presents a distinct set of challenges for infrastructure and development compared to traditional web applications [00:00:02].

Evolution of AI Development Workflows

Initially, AI engineering often begins with a simple prompt and a few tool calls, which might be sufficient for early-stage development [00:00:31]. However, this quickly evolves into building complex workflows to make non-deterministic code as deterministic as possible [00:00:42]. This involves chaining tailored prompts, careful context control, and extensive evaluation [00:00:49]. The runtime of these workflows can increase from seconds to several minutes [00:00:59].
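To make the chaining idea concrete, here is a minimal sketch of such a two-step chain in TypeScript, assuming the OpenAI Node SDK; the summarization task, model choice, and helper names are illustrative, not from the talk:

```typescript
import OpenAI from "openai";

const client = new OpenAI(); // reads OPENAI_API_KEY from the environment

// Step 1: a narrow, tailored prompt that extracts structure from raw input.
async function extractTopics(document: string): Promise<string> {
  const res = await client.chat.completions.create({
    model: "gpt-4o-mini",
    messages: [
      { role: "system", content: "List the key topics in this document, one per line." },
      { role: "user", content: document },
    ],
  });
  return res.choices[0].message.content ?? "";
}

// Step 2: a second tailored prompt that consumes step 1's output as
// carefully controlled context.
async function summarizeByTopic(document: string, topics: string): Promise<string> {
  const res = await client.chat.completions.create({
    model: "gpt-4o-mini",
    messages: [
      { role: "system", content: "Summarize the document, organized under the given topics." },
      { role: "user", content: `Topics:\n${topics}\n\nDocument:\n${document}` },
    ],
  });
  return res.choices[0].message.content ?? "";
}

// Chaining narrows what each prompt must get right, nudging
// non-deterministic output toward predictable structure.
export async function summarize(document: string): Promise<string> {
  const topics = await extractTopics(document);
  return summarizeByTopic(document, topics);
}
```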

Ultimately, AI engineers often end up doing data engineering, because the hardest problem is getting the right context into the prompt [00:01:05]. That can mean crawling user inboxes, ingesting data from sources like GitHub, and processing large numbers of artifacts with Large Language Models (LLMs) [00:01:13].

Shifting Infrastructure Assumptions

Traditional Web 2.0 infrastructure, designed for rapid API requests and database interactions returning responses in tens of milliseconds, is not well-suited for modern AI applications [00:01:26]. AI applications, especially those using LLMs, often have a response time of several seconds at best, even with fast models or prompt caches [00:01:39]. A multi-second response time that would trigger an alert in previous infrastructure paradigms is now common [00:01:49].

Reliability and Availability Concerns

Building reliable AI applications today is difficult due to what is described as “shoddy foundations” [00:02:06]. Developers frequently encounter issues and outages from core dependencies like OpenAI and Gemini, sometimes simultaneously [00:02:14].

Beyond outages, developers must contend with rate limits, especially given the bursty traffic patterns that arise from tasks like batch processing documents or onboarding new customers [00:02:53]. Achieving higher rate limits often requires significant financial investment [00:03:03].
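A common mitigation is wrapping each LLM call in exponential backoff with jitter. The sketch below is a generic pattern under the assumption that the provider signals rate limiting with HTTP 429; it is not any provider's built-in client behavior:

```typescript
// Generic retry-with-backoff pattern for bursty LLM traffic; retries on
// 429 (rate limited) and 5xx (provider outage), with jitter so parallel
// batch jobs don't retry in lockstep.
async function withBackoff<T>(fn: () => Promise<T>, maxAttempts = 5): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await fn();
    } catch (err: any) {
      const status = err?.status ?? err?.response?.status;
      const retryable = status === 429 || (status >= 500 && status < 600);
      if (!retryable || attempt >= maxAttempts) throw err;
      const delayMs = 2 ** (attempt - 1) * 1000 + Math.random() * 250; // 1s, 2s, 4s...
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
}

// Usage: const result = await withBackoff(() => client.chat.completions.create(...));
```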

Challenges with Long-Running Workflows

As AI workflows extend from minutes to hours, full-stack AI engineers increasingly face challenges typically associated with data engineering [00:03:22].

Existing tools for long-running workflows, such as SQS, Airflow, or Temporal, are often complex and not ideal for full-stack developers [00:03:38]. Current serverless providers are also generally ill-suited for long-running workflows because they:

  • Time out after about 5 minutes [00:04:08].
  • May limit outgoing HTTP requests [00:04:11].
  • Lack native streaming support, requiring it to be bolted on at the application layer [00:04:13].
  • Do not inherently support resumability, which is crucial if a user refreshes a page during a multi-minute process [00:04:31].

Common Product Experience Challenges

Building AI applications for product experiences like onboarding or content generation highlights key implementation challenges:

Onboarding

For user onboarding, where a significant amount of data might be ingested upfront using an LLM, the process can take multiple minutes and involve hundreds of LLM calls [00:05:06]. To prevent user drop-off, this ingestion must run in the background while the user starts using the product immediately, with the application transparently surfacing ingestion status [00:05:25].
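One way to structure this is to kick off ingestion asynchronously and expose a status endpoint the UI can poll. Everything in this sketch (the function names, the in-memory job map) is hypothetical:

```typescript
import { randomUUID } from "node:crypto";

// Ingestion runs in the background while the user starts using the
// product; the UI polls getStatus() to render progress.
type IngestStatus = { processed: number; total: number; done: boolean };

const jobs = new Map<string, IngestStatus>();

declare function processWithLLM(userId: string, doc: string): Promise<void>;

export function startIngestion(userId: string, docs: string[]): string {
  const jobId = randomUUID();
  const status: IngestStatus = { processed: 0, total: docs.length, done: false };
  jobs.set(jobId, status);

  // Fire and forget: potentially hundreds of LLM calls run in the background.
  void (async () => {
    for (const doc of docs) {
      await processWithLLM(userId, doc);
      status.processed++;
    }
    status.done = true;
  })();

  return jobId;
}

export function getStatus(jobId: string): IngestStatus | undefined {
  return jobs.get(jobId);
}
```

In a real deployment the job status would live in durable storage such as Redis rather than process memory, precisely because serverless workers are ephemeral.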

Content Generation

In applications like content generation where an agent performs tasks over several minutes, it’s essential to:

  • Inform the user about the expected duration [00:05:53].
  • Display the step-by-step progress (e.g., research, outlining, writing sections) [00:05:58].
  • Ensure that if a user leaves or navigates away from the page, they don’t lose context or progress, requiring resumability [00:06:08].
  • Stream both intermediate status and the final content to the user [00:06:52] (see the sketch after this list).
  • Minimize user frustration by handling intermediate errors transparently, avoiding the need to restart a 5-minute process from the beginning [00:07:04].
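One way to model this is a single event stream that interleaves status events with content chunks. The sketch below uses an async generator; all names are illustrative:

```typescript
// One async iterator interleaves status events (which step the agent is
// on) with chunks of the final content.
type AgentEvent =
  | { kind: "status"; step: string }    // e.g. "researching", "outlining"
  | { kind: "content"; chunk: string }; // pieces of the final draft

declare function research(topic: string): Promise<string>;
declare function makeOutline(notes: string): Promise<string>;
declare function writeSections(outline: string): AsyncIterable<string>;

export async function* generateArticle(topic: string): AsyncGenerator<AgentEvent> {
  yield { kind: "status", step: "researching" };
  const notes = await research(topic);

  yield { kind: "status", step: "outlining" };
  const outline = await makeOutline(notes);

  yield { kind: "status", step: "writing" };
  for await (const chunk of writeSections(outline)) {
    yield { kind: "content", chunk };
  }
}

// A consumer (for example an SSE handler) forwards both event kinds to the UI:
// for await (const ev of generateArticle("AI infra")) send(ev);
```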

Because of these serverless limitations, even experimenting with workflows that run longer than five minutes often forces deep infrastructure changes, which hinders rapid iteration [00:06:23].

Solutions and Architectural Considerations

To address these challenges, an infrastructure-aware component model can be adopted [00:07:32]. This approach focuses on building blocks and takes inspiration from React’s component model, applying it to the backend [00:07:51].

Key elements of such a solution include:

  • Components: Reusable, idempotent, and independently testable steps [00:08:07]. An example is a simple function that wraps an OpenAI SDK call and provides tooling for retries and tracing [00:08:18] (see the sketch after this list).
  • Workflows: Collections of components that run together [00:08:12]. Workflows offer retry boundaries, error boundaries, and traces at both the workflow and component levels for easy debugging [00:08:57].
  • Automatic REST APIs: Workflows can be automatically exposed as REST APIs supporting synchronous and asynchronous invocation, with APIs to retrieve intermediate and final output streams [00:09:11].
  • Built-in Retries and Caching: Components can be configured with fluent APIs for exponential retry policies and caching [00:09:44].
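A sketch of what this component model could look like in TypeScript follows. The component()/workflow() helpers and the fluent withRetry()/withCache() calls are hypothetical stand-ins for such a framework's surface, not a specific library's actual API:

```typescript
import OpenAI from "openai";

// Hypothetical framework surface, typed here only so the sketch is self-contained.
interface Component<I, O> {
  (input: I): Promise<O>;
  withRetry(opts: { maxAttempts: number; backoff: "exponential" }): Component<I, O>;
  withCache(opts: { ttlSeconds: number }): Component<I, O>;
}
declare function component<I, O>(
  name: string,
  fn: (input: I) => Promise<O>,
): Component<I, O>;
declare function workflow<I, O>(
  name: string,
  fn: (input: I) => Promise<O>,
): (input: I) => Promise<O>;

const client = new OpenAI();

// A component: one reusable, idempotent, independently testable step.
// The wrapper is the hook where retries, caching, and tracing attach.
const summarizeDoc = component("summarizeDoc", async (doc: string) => {
  const res = await client.chat.completions.create({
    model: "gpt-4o-mini",
    messages: [{ role: "user", content: `Summarize:\n${doc}` }],
  });
  return res.choices[0].message.content ?? "";
})
  .withRetry({ maxAttempts: 3, backoff: "exponential" }) // hypothetical fluent API
  .withCache({ ttlSeconds: 3600 });                      // hypothetical fluent API

// A workflow: components composed together, giving retry and error
// boundaries plus traces at both the workflow and component level.
const ingestInbox = workflow("ingestInbox", async (docs: string[]) => {
  return Promise.all(docs.map((d) => summarizeDoc(d)));
});
```

A framework along these lines could then expose ingestInbox automatically as a REST endpoint with synchronous and asynchronous invocation modes, per the third bullet above.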

Architectural Blueprint

A tailored serverless platform for long-running workflows and agentic UIs can be built with a critical separation of API and compute layers [00:10:04]. This separation allows for:

  • Independent Scaling: The API layer can scale independently of the compute layer [00:11:00].
  • Pluggable Compute: The stateless compute layer can be pluggable, allowing users to bring their own compute [00:11:08].
  • Redis Streams for Resumability: All API output flows through Redis streams, for both synchronous and asynchronous workflows [00:11:22]. The API layer reads solely from the Redis stream, never directly from the compute layer [00:11:29]; a sketch follows this list. This design enables UIs to offer:
    • Page Refresh Resilience: Users can refresh the page or navigate away without losing progress [00:11:34].
    • Transparent Error Handling: Errors are handled without work loss [00:11:39].
    • Full History: Access to the complete history of status messages and output [00:11:41].
  • Heartbeats: The sandboxed workflow program emits heartbeats so the platform can detect a crashed or stalled workflow and either restart it automatically or notify the user [00:10:41].
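A minimal sketch of the Redis-streams side of this design, using the node-redis client in an ESM module; the key naming, event shape, and heartbeat interval are illustrative:

```typescript
import { createClient } from "redis";

const redis = createClient({ url: process.env.REDIS_URL });
await redis.connect();

// Compute side: append every status/content event to the run's stream.
export async function emit(runId: string, type: string, data: string) {
  await redis.xAdd(`run:${runId}`, "*", { type, data });
}

// Compute side: periodic heartbeat so the platform can detect stalled runs.
export function startHeartbeat(runId: string, everyMs = 10_000) {
  return setInterval(() => emit(runId, "heartbeat", Date.now().toString()), everyMs);
}

// API side: read from the stream only, never from the compute layer.
// A fresh page load replays from "0"; a reconnecting client passes the
// last event ID it saw, so no progress or history is lost.
export async function readEvents(runId: string, lastId = "0") {
  const res = await redis.xRead([{ key: `run:${runId}`, id: lastId }], {
    COUNT: 100,
  });
  return res?.[0]?.messages ?? []; // each message: { id, message: { type, data } }
}
```

Because the stream retains the full event history, a client that reconnects with lastId = "0" replays everything, which is what makes page-refresh resilience and transparent error recovery cheap.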

Lessons for Building AI Infrastructure

When building infrastructure for agentic workflows, it’s advised to:

  • Start Simple, Plan for Long-Running: Don’t over-engineer from day one, but anticipate future needs for workflows that run for extended periods, as this is the direction of AI agents [00:12:13].
  • Separate Compute and API: Maintain a distinct compute and API plane [00:12:39].
  • Leverage Redis Streams: Utilize Redis streams for resumability and to ensure progress is not lost when users navigate away or connections terminate [00:12:41].
  • Transparent Error Handling: Design systems to handle errors gracefully and transparently for the user [00:12:51].
  • Careful Deployment: Implement robust deployment strategies, such as blue/green deployments and careful worker draining, especially for workflows running for extended durations [00:12:54] (see the sketch below).
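As one illustration of careful worker draining, the sketch below stops accepting new runs on SIGTERM and waits for in-flight workflows to finish before exiting; the pattern is generic, not from the talk:

```typescript
// Generic draining pattern: on SIGTERM (e.g. during a blue/green rollout),
// stop accepting new runs and let in-flight workflows finish before exiting.
let accepting = true;
const inFlight = new Set<Promise<unknown>>();

export function runWorkflow<T>(fn: () => Promise<T>): Promise<T> {
  if (!accepting) throw new Error("draining: not accepting new work");
  const p = fn().finally(() => inFlight.delete(p));
  inFlight.add(p);
  return p;
}

process.on("SIGTERM", async () => {
  accepting = false;                       // route new runs to the replacement worker
  await Promise.allSettled([...inFlight]); // wait for current runs to complete
  process.exit(0);
});
```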

While it is easy to get started with AI applications, getting the underlying infrastructure right for long-running, reliable agentic workflows is considerably more difficult [00:13:06]. For those who prefer not to build this infrastructure themselves, open-source libraries such as GenSX implement many of these patterns [00:13:13].