From: aidotengineer

Building artificial intelligence (AI) applications, particularly those involving AI agents and long-running processes, presents significant infrastructure challenges. Traditional web infrastructure is often ill-suited for the unique demands of AI workloads, necessitating new strategies for resilience and efficiency [01:56:00].

The Evolution of AI Engineering Challenges

The journey of an AI engineer often starts with a simple prompt and a few tool calls, which might be sufficient for initial prototypes [00:31:00]. However, as applications mature, the focus shifts to building AI workflows [00:42:00]. This involves making non-deterministic code as deterministic as possible by chaining tailored prompts and carefully controlling context [00:44:00]. Workflows can quickly increase runtime from 30 seconds to several minutes [00:59:00].
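
A minimal sketch of what such a workflow looks like in practice, using the official OpenAI Node SDK (the two-step structure and prompt contents here are illustrative, not from the talk): a narrow classification prompt feeds a tightly scoped drafting prompt, so the only non-determinism is inside each call.

```typescript
import OpenAI from "openai";

const client = new OpenAI(); // reads OPENAI_API_KEY from the environment

// Step 1: a narrow, tailored prompt that classifies the request.
async function classifyRequest(text: string): Promise<string> {
  const res = await client.chat.completions.create({
    model: "gpt-4o-mini",
    messages: [
      { role: "system", content: "Classify the request as 'bug', 'feature', or 'question'. Reply with one word." },
      { role: "user", content: text },
    ],
  });
  return res.choices[0].message.content?.trim() ?? "question";
}

// Step 2: a second prompt whose context is tightly controlled by step 1's output.
async function draftReply(text: string, category: string): Promise<string> {
  const res = await client.chat.completions.create({
    model: "gpt-4o-mini",
    messages: [
      { role: "system", content: `Write a short reply to a ${category} report.` },
      { role: "user", content: text },
    ],
  });
  return res.choices[0].message.content ?? "";
}

// The "workflow": deterministic control flow around non-deterministic calls.
export async function handleRequest(text: string): Promise<string> {
  const category = await classifyRequest(text);
  return draftReply(text, category);
}
```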

Ultimately, many AI engineers find themselves becoming data engineers because the most challenging aspect is providing the correct context to prompts [01:05:00]. This often requires complex data ingestion from various sources, such as user inboxes or GitHub, which demands extensive LLM processing [01:11:00].

Infrastructure Limitations for AI Applications

Assumptions about infrastructure have changed significantly for AI applications compared to Web 2.0 [01:26:00]. Traditional web services expect API requests to return data in tens of milliseconds [01:28:00]. In contrast, AI applications routinely take several seconds to respond, even with fast models or prompt caches [01:39:00]. Infrastructure designed for the past decade of the web is therefore a poor fit for modern AI applications [01:56:00].

Key issues impacting resilience include:

  • Reliability: LLM applications are built on “shoddy foundations,” making it difficult to build reliable apps [02:06:00]. Dependencies frequently experience outages, sometimes for extended periods [02:14:00].
  • Uptime: Even major AI providers like OpenAI and Gemini can experience concurrent outages [02:31:00].
  • Rate Limits: Burst traffic patterns, common when batch processing documents or onboarding new customers, quickly hit rate limits [02:53:00]. High-tier rate limits often require significant financial investment [03:03:00].
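
None of this is specific to one provider; a common client-side mitigation, sketched below under the assumption that the SDK exposes an HTTP status code on errors (the delays and limits are illustrative), is to wrap calls in exponential backoff with jitter so bursts degrade gracefully instead of failing outright.

```typescript
// Retry a call on rate-limit (429) or overload (503) errors with exponential
// backoff plus jitter. The cap, base delay, and retry count are illustrative.
async function withBackoff<T>(fn: () => Promise<T>, maxRetries = 5): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err: any) {
      const rateLimited = err?.status === 429 || err?.status === 503;
      if (!rateLimited || attempt >= maxRetries) throw err;
      const delayMs = Math.min(30_000, 2 ** attempt * 500) + Math.random() * 250;
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
}
```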

The Need for Long-Running Workflows in AI Deployment

As workflows extend to minutes or hours, full-stack AI engineers inadvertently become data engineers [03:22:00].

Existing Solutions and Their Limitations

Existing tools for long-running workflows and data engineering include message queues like SQS, batch-processing tools like Airflow, and durable execution engines like Temporal [03:34:00]. However, these come from a different paradigm (e.g., Java-centric enterprise engineering) and are not ideal for modern TypeScript/full-stack developers [03:50:00].

Serverless providers, while appealing, are not well-suited for these types of workflows [04:01:00]:

  • Timeouts: Most time out after 5 minutes [04:08:00].
  • HTTP Request Limits: Some limit outgoing HTTP requests [04:11:00].
  • Lack of Native Streaming: Streaming support must be bolted on at the application layer, not natively provided by the infrastructure [04:18:00].
  • No Resumability: Without native resumability, if a user refreshes the page or navigates away, the context of a multi-minute process is lost [04:29:00].

Common Product Experiences Requiring Resilience

AI applications often require workflows that run in the background while keeping users engaged:

  • Onboarding Data Ingestion: When a user inputs a URL for data ingestion, an initial LLM call might identify pages to scrape [04:51:00]. The subsequent scraping job can take minutes and involve hundreds of LLM calls [05:04:00]. To prevent user fall-off, ingestion must run in the background, with real-time status updates shown to the user [05:25:00].
  • Content Generation Agents: For tasks like writing a blog post, an agent might run for multiple minutes, performing research, outlining, and writing sections step-by-step [05:43:00]. It’s crucial that if the user leaves the page or navigates away, they don’t lose context, and can resume the process with the full history of status messages and output [06:08:00]. Experiencing intermediate errors after a long wait is far more frustrating without resumability [07:04:00].

Architectural Best Practices for Building AI Agents and Workflows

To address these challenges, specialized infrastructure and frameworks are needed.

Genisys Component Model and Workflows

Genisys developed an open-source library that provides an infrastructure-aware component model [07:29:00]. This framework is unopinionated and focuses on reusable building blocks, taking inspiration from React’s component model but applied to the backend [07:50:00]. The goal is to allow sharing, composition, and reuse of code without excessive abstraction [08:02:00].

  • Components: Reusable, idempotent, independently testable steps [08:07:00]. For example, a simple component might wrap the OpenAI SDK for LLM calls, providing tooling for retries and tracing while exposing the same API surface [08:18:00].
  • Workflows: Collections of components that run together [08:10:00]. Each step in a workflow gains a retry boundary and an error boundary [08:57:00]. Traces are provided at both the workflow and component levels, aiding debugging [09:01:00].
  • REST APIs: Workflows can be automatically turned into REST APIs supporting synchronous and asynchronous invocation, with APIs to retrieve intermediate streams and final output [09:11:00].
  • Built-in Retries and Caching: Components have a fluent API to configure policies like exponential retries and caching [09:44:00].
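
The talk does not show the library's actual API, so the TypeScript sketch below is purely hypothetical — `defineComponent`, `withRetry`, and `withCache` are invented names — but it illustrates the idea: components as idempotent, independently testable steps with a fluent API for retry and caching policies, composed into a workflow with ordinary control flow.

```typescript
// Hypothetical sketch only: these identifiers are not the library's real API.
type StepFn<In, Out> = (input: In) => Promise<Out>;

interface Component<In, Out> {
  run: StepFn<In, Out>;
  withRetry(opts: { maxAttempts: number }): Component<In, Out>; // exponential backoff between attempts
  withCache(opts: { ttlSeconds: number }): Component<In, Out>;
}

function defineComponent<In, Out>(run: StepFn<In, Out>): Component<In, Out> {
  const wrap = (fn: StepFn<In, Out>): Component<In, Out> => ({
    run: fn,
    withRetry: ({ maxAttempts }) =>
      wrap(async (input) => {
        let lastErr: unknown;
        for (let attempt = 0; attempt < maxAttempts; attempt++) {
          try {
            return await fn(input);
          } catch (err) {
            lastErr = err;
            await new Promise((r) => setTimeout(r, 2 ** attempt * 500));
          }
        }
        throw lastErr;
      }),
    withCache: ({ ttlSeconds }) => {
      const cache = new Map<string, { value: Out; expires: number }>();
      return wrap(async (input) => {
        const key = JSON.stringify(input);
        const hit = cache.get(key);
        if (hit && hit.expires > Date.now()) return hit.value;
        const value = await fn(input);
        cache.set(key, { value, expires: Date.now() + ttlSeconds * 1000 });
        return value;
      });
    },
  });
  return wrap(run);
}

// A workflow is components composed with ordinary control flow; each step keeps
// its own retry boundary, which is where per-step tracing would hook in.
const identifyPages = defineComponent(async (url: string) => [`${url}/docs`])
  .withRetry({ maxAttempts: 3 });

const scrapePage = defineComponent(async (page: string) => `contents of ${page}`)
  .withCache({ ttlSeconds: 3600 });

export async function ingestWorkflow(url: string): Promise<string[]> {
  const pages = await identifyPages.run(url);
  return Promise.all(pages.map((p) => scrapePage.run(p)));
}
```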

Tailored Serverless Platform Architecture

Genisys built a serverless platform specifically designed for long-running workflows and agentic UIs [10:04:00].

Key architectural points include:

  • Separation of API and Compute Layers: The API layer and compute layer are completely separate [10:14:00]. The API layer invokes the compute layer and passes a Redis stream ID [10:20:00]. All subsequent communication (status, output) happens via the Redis stream [10:32:00].
    • Benefits: Allows independent scaling of API and compute layers [11:00:00], and enables users to “bring their own compute layer” as the compute is stateless [11:08:00].
  • Redis Streams for Resumability: The API layer only reads from the Redis stream, not directly from the compute layer [11:22:00].
    • Benefits: UIs can let users refresh the page or navigate away and still receive the full history of status messages and output, without losing progress [11:27:00]. The executing sandbox also emits heartbeats, so background processes can monitor workflow health and automatically restart the workflow or notify the user if it crashes [10:41:00].
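
A concrete sketch of this split is below, assuming Node 18+ with the `redis` (node-redis v4) client; the internal compute endpoint, the stream field names, and the heartbeat interval are all illustrative assumptions. The API layer generates a stream ID and hands it to a stateless compute function; the compute function writes status, output, and heartbeats to the stream; and any reader, including a freshly refreshed page, replays the stream from the beginning to resume.

```typescript
import { randomUUID } from "node:crypto";
import { createClient } from "redis";

// --- API layer -------------------------------------------------------------
// Kicks off the compute layer and only ever talks to the Redis stream after
// that, so the compute layer stays stateless and independently scalable.
export async function startWorkflow(input: { url: string }): Promise<string> {
  const streamId = `workflow:${randomUUID()}`;
  await fetch("https://compute.internal/run", { // hypothetical compute endpoint
    method: "POST",
    body: JSON.stringify({ streamId, input }),
  });
  return streamId; // the client resumes later using this id
}

// --- Compute layer ---------------------------------------------------------
// Writes every status update and output chunk to the stream, plus heartbeats
// so a monitor can detect a crashed workflow and restart it or notify the user.
export async function runWorkflow(streamId: string, input: { url: string }) {
  const redis = createClient();
  await redis.connect();
  const heartbeat = setInterval(
    () => redis.xAdd(streamId, "*", { type: "heartbeat", at: `${Date.now()}` }),
    5_000,
  );
  try {
    await redis.xAdd(streamId, "*", { type: "status", message: "scraping pages" });
    // ... long-running LLM calls happen here ...
    await redis.xAdd(streamId, "*", { type: "output", chunk: "first section..." });
    await redis.xAdd(streamId, "*", { type: "done" });
  } finally {
    clearInterval(heartbeat);
    await redis.quit();
  }
}

// --- Resumable read --------------------------------------------------------
// Any reader (including a page the user just refreshed) replays the full
// history by starting from id "0", then blocks for new entries.
export async function* readWorkflow(streamId: string) {
  const redis = createClient();
  await redis.connect();
  let lastId = "0";
  while (true) {
    const res = await redis.xRead({ key: streamId, id: lastId }, { BLOCK: 10_000, COUNT: 50 });
    if (!res) continue; // timed out with no new entries; keep waiting
    for (const entry of res[0].messages) {
      lastId = entry.id;
      yield entry.message;
      if (entry.message.type === "done") return;
    }
  }
}
```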

Lessons for Resilient AI Workflow Infrastructure

When building AI agents and their infrastructure:

  • Start Simple, Plan for Long-Running: Don’t over-engineer on day one for an hour-long workflow, but plan for a future where workflows run for extended periods, as this is the direction of AI agents [12:13:00].
  • Separate Compute and API: Keep your compute and API planes separate for independent scaling and flexibility [12:39:00].
  • Leverage Redis Streams for Resumability: Use Redis streams to enable users to navigate away from the page, not lose progress, and handle errors transparently [12:41:00].
  • Careful Deployment: When workflows run for extended periods, be meticulous about draining workers and implementing blue/green deployment patterns [12:54:00].
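
For that last point, a minimal draining sketch (the worker API and shutdown flow here are illustrative, not from the talk): on SIGTERM, the worker stops accepting new workflows and waits for in-flight ones to finish, which is what makes a blue/green cutover safe when runs take many minutes.

```typescript
// Track in-flight workflows so the worker can drain cleanly during a rollout.
const inFlight = new Set<Promise<void>>();
let draining = false;

export async function accept(workflow: () => Promise<void>): Promise<boolean> {
  if (draining) return false; // router should send new work to the new (green) fleet
  const task = workflow().finally(() => inFlight.delete(task));
  inFlight.add(task);
  return true;
}

process.on("SIGTERM", async () => {
  draining = true;                      // stop accepting new workflows
  await Promise.allSettled(inFlight);   // let multi-minute runs finish
  process.exit(0);
});
```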

While it’s easy to get started with AI agents, achieving the necessary resilience for long-running workflows is complex and requires attention to detail [13:06:00].