From: aidotengineer

Developing applications with AI agents introduces significant infrastructure challenges, particularly at the compute layer, fundamentally changing assumptions from previous eras of web development [00:00:02].

The Evolution of AI Engineering Workflows

The journey of an AI engineer often begins with a simple prompt and a few tool calls, which might be sufficient for initial funding [00:00:31]. However, this quickly evolves into building complex workflows [00:00:42]. Initially, engineers aim to make non-deterministic AI code as deterministic as possible by chaining tailored prompts, using evals, and carefully controlling context, extending workflow runtimes from 30 seconds to several minutes [00:00:44].

Eventually, AI engineers become data engineers, because the hardest part is getting the right context into the prompt [00:01:05]. That means crawling user inboxes, ingesting code from GitHub, or handling other artifacts that require heavy LLM processing [00:01:13].

Shifting Infrastructure Assumptions

Traditional Web 2.0 infrastructure was designed for web services that made API requests, interacted with databases, and returned responses in milliseconds [00:01:26]. In such an environment, a request taking a couple of seconds would trigger an alert [00:01:49].

In AI applications, by contrast, even the fast path (quick models, prompt caching) takes a few seconds to respond, a latency that would count as a P1 (high-priority incident) by traditional standards [00:01:39]. The infrastructure built over the last decade is simply not suited to today’s AI applications [00:01:56].

Challenges with Current AI Infrastructure

Building reliable AI applications on current foundations is difficult [00:02:06].

Reliability and Availability

AI applications often suffer from reliability issues with their underlying dependencies [00:02:09]. Services like OpenAI or Gemini can experience outages simultaneously [00:02:31].

Rate Limits

Even without outages, developers must contend with rate limits, especially given the bursty traffic patterns that arise from batch processing documents or onboarding new customers [00:02:52]. Achieving higher rate limits often requires significant financial investment [00:03:03].
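A common mitigation, sketched below rather than taken from the talk, is to wrap each model call in retry logic with exponential backoff and jitter so that bursty batch jobs degrade gracefully when they hit rate limits; `callModel` is a hypothetical stand-in for whatever SDK call the workflow makes.

```typescript
// Sketch: retry a provider call on rate-limit errors with exponential backoff
// plus jitter. Assumes the SDK surfaces the HTTP status on thrown errors.
async function withBackoff<T>(fn: () => Promise<T>, maxAttempts = 5): Promise<T> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err: any) {
      const isRateLimit = err?.status === 429;
      if (!isRateLimit || attempt === maxAttempts - 1) throw err;
      const delayMs = 2 ** attempt * 1000 + Math.random() * 250; // 1s, 2s, 4s, ... plus jitter
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
  throw new Error("unreachable");
}

// Usage: await withBackoff(() => callModel(prompt));
```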

Limitations of Existing Tools for Long-Running Workflows

While tools for long-running workflows and data engineering exist, such as SQS queues, batch processing tools like Airflow, or durable execution engines like Temporal, they are often not ideal for full-stack AI engineers [00:03:31].

Current serverless providers are also poorly suited for long-running AI workflows because most time out after five minutes, some limit outgoing HTTP requests, and none have native streaming support [00:04:01]. Streaming and resumability are crucial for applications that might run for multiple minutes, allowing users to refresh the page without losing context [00:04:18].

Product Experience Challenges

Long-running AI agent workflows present specific challenges for user experience:

  • Onboarding and Data Ingestion: When ingesting data from a user’s URL using an LLM, the initial scrape might take minutes and involve hundreds of LLM calls to extract and enrich content [00:04:51]. Users should be able to use the product immediately, so this ingestion must run in the background with visible status updates to prevent drop-off (a client-side sketch of this pattern follows the list) [00:05:15].
  • Content Generation and Agentic Workflows: For processes like generating a blog post, which can take several minutes, the application needs to keep the user engaged by showing progress steps (e.g., research, outline, section writing) [00:05:43]. Crucially, if a user leaves and returns to the page, they should not lose context or progress [00:06:08].
    • Previously, even workflows of 3-4 minutes created enough friction that teams were reluctant to experiment with longer (5+ minute) workflows, since doing so required deep infrastructure changes [00:06:20].
    • Streaming both final content and intermediate status is vital, along with the ability to resume where left off if a user refreshes the page [00:06:52].
    • Intermediate errors are far more frustrating when it takes minutes to get back to the state before the failure [00:07:04].
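The resumability described above can be illustrated with a small client-side sketch: the client persists the ID of the last status event it rendered and, after a page refresh, re-subscribes from that cursor instead of starting over. The endpoint and event shape here are assumptions for illustration, not the product’s actual API.

```typescript
// Sketch: follow a long-running workflow's status and survive page refreshes.
// The `/api/workflows/:id/status?after=` endpoint and event fields are hypothetical,
// and the endpoint is assumed to long-poll until new events are available.
type StatusEvent = { id: string; step: string; detail?: string };

const renderStep = (step: string, detail?: string) => console.log(step, detail ?? "");

async function followWorkflow(workflowId: string): Promise<void> {
  // Persist the cursor so a refreshed page resumes where it left off.
  let cursor = localStorage.getItem(`wf:${workflowId}:cursor`) ?? "0";

  while (true) {
    const res = await fetch(`/api/workflows/${workflowId}/status?after=${cursor}`);
    const events: StatusEvent[] = await res.json();

    for (const event of events) {
      renderStep(event.step, event.detail); // e.g. "research", "outline", "writing section 2"
      cursor = event.id;
      localStorage.setItem(`wf:${workflowId}:cursor`, cursor);
    }
    if (events.some((e) => e.step === "done")) return;
  }
}
```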

Solutions for Robust AI Agent Infrastructure

A robust solution involves building an infrastructure-aware component model designed for agentic workflows [00:07:32].

Infra-Aware Component Model

This framework is deeply aware of the underlying infrastructure it runs on, and vice-versa, allowing for features like resumable streams for intermediate status and final output [00:07:38]. It’s unopinionated and focuses on reusable building blocks [00:07:50].

  • Components: These are reusable, idempotent, independently testable steps [00:08:07]. An example is a wrapped OpenAI SDK call that provides tooling for retries and tracing while exposing the full OpenAI surface area [00:08:18].
  • Workflows: Collections of components that run together, offering retry and error boundaries at each step, along with comprehensive traces for the workflow and its components [00:08:10]. Workflows can be automatically exposed as REST APIs supporting synchronous, asynchronous, and stream retrieval [00:09:11].
  • Debugging and Retries: Built-in tracing provides detailed information on nested components, tokens, and OpenAI call details [00:09:29]. Components also offer a fluent API for configuring retry policies (e.g., exponential retry) and caching [00:09:43].
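To make the shape of this API concrete, here is a hypothetical sketch of a component wrapping an OpenAI call with a fluent retry and caching configuration. The names (`Component`, `withRetry`, `withCache`) are illustrative stand-ins, not the framework’s actual surface.

```typescript
import OpenAI from "openai";

// Hypothetical component helper, for illustration only. A real implementation
// would wrap `run` with tracing, retries, and caching; here it only records options.
class Component<In, Out> {
  private retries = 0;
  private backoff: "exponential" | "fixed" = "exponential";
  private cacheTtlSeconds = 0;

  constructor(readonly name: string, private run: (input: In) => Promise<Out>) {}

  withRetry(retries: number, backoff: "exponential" | "fixed" = "exponential"): this {
    this.retries = retries;
    this.backoff = backoff;
    return this;
  }

  withCache(cacheTtlSeconds: number): this {
    this.cacheTtlSeconds = cacheTtlSeconds;
    return this;
  }

  invoke(input: In): Promise<Out> {
    return this.run(input);
  }
}

const openai = new OpenAI();

// A component wrapping an OpenAI call while exposing the full SDK surface inside `run`.
const summarize = new Component("summarize", async (text: string) => {
  const response = await openai.chat.completions.create({
    model: "gpt-4o-mini",
    messages: [{ role: "user", content: `Summarize:\n${text}` }],
  });
  return response.choices[0].message.content ?? "";
})
  .withRetry(3, "exponential")
  .withCache(3600);

// Usage: const summary = await summarize.invoke(documentText);
```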

Tailor-Made Serverless Platform

A serverless platform designed for long-running workflows and agentic UIs is crucial [00:10:04].

  • Separation of API and Compute Layers: This allows for independent scaling of the API and compute layers [00:10:14]. The compute layer can be pluggable (e.g., defaulting to ECS but allowing user-provided compute) as it is stateless [00:11:06].
  • Redis Streams for Communication: The API layer invokes the compute layer with a Redis stream ID, and subsequent communication for status and output happens via this stream [00:10:20].
  • Resumability and Monitoring: All API interactions go through the Redis stream for both synchronous and asynchronous workflows, enabling resumability [00:11:23]. The API layer reads solely from the Redis stream, never directly from the compute layer, so UIs can refresh, navigate away, and transparently recover from errors while retaining the full status history and output [00:11:29]. The sandbox program also emits heartbeats, so background processes can detect a crashed or stalled workflow and either restart it automatically or notify the user [00:10:41].
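A minimal sketch of this pattern using ioredis; the stream key scheme and field names are assumptions, not the platform’s actual protocol. The compute layer appends status and output entries to the stream, and the API layer only ever reads from Redis, which is what makes refreshes and reconnects transparent.

```typescript
import Redis from "ioredis";

const redis = new Redis(); // connection details are illustrative

// Compute side: append status updates and output chunks to the invocation's stream
// instead of responding to the API layer directly.
async function emit(streamKey: string, type: "status" | "chunk" | "done", data: string) {
  await redis.xadd(streamKey, "*", "type", type, "data", data);
}

// API side: read entries after a caller-supplied cursor, blocking up to 5s for new ones.
// Because the API layer reads only from Redis, a client can disconnect, refresh,
// and resume from its last-seen entry ID with the full history still available.
async function readFrom(streamKey: string, lastId = "0") {
  const result = await redis.xread("BLOCK", 5000, "STREAMS", streamKey, lastId);
  if (!result) return { entries: [], lastId };
  const [, rawEntries] = result[0]; // [streamKey, entries]
  const entries = rawEntries.map(([id, fields]) => ({ id, fields }));
  return { entries, lastId: entries.at(-1)?.id ?? lastId };
}

// Usage: await emit("workflow:123", "status", "outline complete");
//        const { entries, lastId } = await readFrom("workflow:123", "0");
```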

Lessons for Building Agentic Workflow Infrastructure

  • Start Simple, Plan for Long-Running: Begin with basic workflows, but anticipate and design for a future where agents will run for extended periods (e.g., an hour or more), taking instructions and performing work autonomously [00:12:13].
  • Separate Compute and API: Keep your compute and API planes distinct to allow independent scaling and flexibility [00:12:39].
  • Leverage Redis Streams for Resumability: Rely on Redis streams to manage state and output, making it easy for users to navigate away from a page without losing progress and enabling transparent error handling [00:12:41].
  • Careful Deployment: When workflows run for extended durations (e.g., 60 minutes), implement robust deployment strategies like careful worker draining and blue/green deployments [00:12:54].
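One way to approach that draining step, as a sketch rather than the talk’s specific setup: on SIGTERM (sent during a deploy), the worker stops accepting new workflow invocations and exits only after in-flight ones finish, so a blue/green rollout never kills an hour-long workflow mid-run.

```typescript
// Sketch of graceful worker draining during deploys.
let draining = false;
const inFlight = new Set<Promise<void>>();

process.on("SIGTERM", async () => {
  draining = true; // stop picking up new workflows
  await Promise.allSettled(inFlight); // let running workflows finish
  process.exit(0);
});

async function maybeRunWorkflow(run: () => Promise<void>): Promise<void> {
  if (draining) return; // new work goes to the replacement (green) workers
  const task = run();
  inFlight.add(task);
  try {
    await task;
  } finally {
    inFlight.delete(task);
  }
}
```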

While it’s easy to start building AI agent applications, getting the infrastructure right is challenging and requires attention to detail [00:13:06].

Disclaimer: AI-Generated Diagrams

In the spirit of embracing AI for this talk, all diagrams presented were AI-generated, so they may contain spelling mistakes [00:00:21].