From: aidotengineer
AI applications, particularly those involving agents and complex workflows, place new demands on infrastructure that traditional Web 2.0 paradigms and existing serverless providers struggle to meet [02:01:06]. The shift from short, milliseconds-long requests to processes that run for minutes or even hours exposes significant limitations in current serverless offerings [02:22:00].
The Evolution of AI Workflows and Infrastructure Demands
Traditionally, Web 2.0 services involved simple API requests, database interactions, and quick returns within tens of milliseconds [01:27:54]. AI applications operate differently:
- Extended Runtimes: AI applications often have critical-path (P1) runtimes of several seconds at best, and even that only with fast models or prompt caching [01:39:10]. Workflows can stretch from 30 seconds to several minutes [00:59:05].
- Data Engineering Shift: As AI engineers build workflows from non-deterministic code, they increasingly become data engineers, focused on getting the right context into prompts [01:05:00]. This involves extensive LLM processing to ingest data from sources like user inboxes or GitHub [01:13:00].
- Reliability Challenges: LLM applications are built on “shoddy foundations,” making it difficult to build reliable apps due to frequent outages and rate limits from dependencies like OpenAI [02:06:00]. Traffic patterns can be extremely bursty, exacerbating rate-limit issues [02:56:00] (a common mitigation is sketched below).
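The talk does not prescribe a specific fix for bursty traffic, but the usual mitigation is retrying transient failures with exponential backoff. A minimal TypeScript sketch, assuming the OpenAI chat completions REST endpoint; the model name and retry limits are illustrative:

```typescript
// Retry a completion call on 429 (rate limit) and 5xx (outage) responses,
// backing off exponentially with jitter between attempts.
async function completionWithRetry(
  prompt: string,
  maxAttempts = 5,
): Promise<string> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const res = await fetch("https://api.openai.com/v1/chat/completions", {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
      },
      body: JSON.stringify({
        model: "gpt-4o-mini",
        messages: [{ role: "user", content: prompt }],
      }),
    });

    if (res.ok) {
      const data = await res.json();
      return data.choices[0].message.content;
    }

    // Only retry transient failures: rate limits and server errors.
    if (res.status !== 429 && res.status < 500) {
      throw new Error(`Non-retryable error: ${res.status}`);
    }

    // Respect Retry-After when present; otherwise back off exponentially.
    const retryAfter = Number(res.headers.get("retry-after")) * 1000;
    const backoff = retryAfter || 2 ** attempt * 1000 + Math.random() * 250;
    await new Promise((r) => setTimeout(r, backoff));
  }
  throw new Error("Rate-limited after all retry attempts");
}
```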
Key Limitations of Existing Serverless Platforms
Existing serverless providers, while convenient for quick prototypes (like a Next.js chatbot template), are not well-suited for long-running workflows [04:04:00]:
- Timeout Limits: Most serverless functions time out after approximately 5 minutes [04:08:00].
- HTTP Request Restrictions: Some providers limit outgoing HTTP requests [04:11:00].
- Lack of Native Streaming Support: Streaming is typically bolted on at the application layer rather than being a native part of the infrastructure [04:18:00]. This is crucial for applications that run for multiple minutes [04:29:00].
- Absence of Resumability: Current serverless platforms generally do not offer native resumability [04:34:00]. This means if a user refreshes the page or navigates away, the workflow context is lost [06:08:00].
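To make the resumability gap concrete, here is a minimal client-side sketch using Server-Sent Events. This is exactly the kind of application-layer bolt-on the talk criticizes: the browser replays `Last-Event-ID` automatically on network reconnects, but after a full page refresh the last-seen id must be carried explicitly. The endpoint path and storage key are hypothetical:

```typescript
// Resume a workflow's event stream after a page refresh. The server is
// assumed to assign monotonically increasing event ids and to replay
// everything after the id passed in `lastEventId`.
function subscribe(workflowId: string, render: (event: unknown) => void) {
  const key = `wf:${workflowId}:lastEventId`;
  const lastId = localStorage.getItem(key) ?? "0";
  const source = new EventSource(
    `/api/workflows/${workflowId}/events?lastEventId=${lastId}`,
  );
  source.onmessage = (event) => {
    localStorage.setItem(key, event.lastEventId); // checkpoint progress
    render(JSON.parse(event.data));
  };
  return source;
}
```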
Impact on User Experience and Development
These limitations create significant friction for developers and a poor experience for users:
- Development Friction: Full-stack AI engineers avoid experimenting with workflows longer than five minutes, because exceeding serverless provider limits forces deep infrastructure changes [06:26:00].
- User Engagement: For long-running agentic processes, like content generation, users need constant engagement through intermediate status updates and the ability to resume without losing context [06:52:00].
- Onboarding Challenges: For processes like data ingestion, making users wait 10 minutes for completion increases fall-off in the funnel [05:25:00]. Background processing with real-time status updates is necessary [05:29:00] (see the sketch after this list).
- Error Handling: If an intermediate error occurs, users become frustrated if they have to wait 5 minutes to get back to the same point [07:04:00].
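A sketch of the background-processing pattern from the onboarding point above: the API returns a job id immediately and exposes intermediate status, rather than blocking a request for the full ingestion. In-memory state stands in for a real queue and durable store, and the routes and step names are illustrative:

```typescript
import express from "express";
import { randomUUID } from "node:crypto";

type JobStatus = { step: string; pct: number };
const jobs = new Map<string, JobStatus>();

// Stand-in for real ingestion work; updates status as each step completes.
async function runIngestion(jobId: string) {
  const steps = ["fetching", "parsing", "embedding", "done"];
  for (let i = 0; i < steps.length; i++) {
    jobs.set(jobId, { step: steps[i], pct: (i / (steps.length - 1)) * 100 });
    await new Promise((r) => setTimeout(r, 1000));
  }
}

const app = express();

// Kick off ingestion and return immediately with a 202 + job id.
app.post("/api/ingest", (_req, res) => {
  const jobId = randomUUID();
  jobs.set(jobId, { step: "queued", pct: 0 });
  void runIngestion(jobId); // fire and forget; a real system uses a worker
  res.status(202).json({ jobId });
});

// The client polls (or streams) intermediate status updates.
app.get("/api/ingest/:jobId/status", (req, res) => {
  res.json(jobs.get(req.params.jobId) ?? { step: "unknown", pct: 0 });
});

app.listen(3000);
```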
Addressing the Challenges
Traditional tools for long-running workflows and data engineering, such as SQS, Airflow, or Temporal, do exist, but they are largely designed for Java engineers and are not ideal for TypeScript or full-stack engineers [03:31:00]. A robust solution for agentic workflows requires planning for a future that is inherently long-running [12:19:00]. This includes keeping the compute and API layers separate and leveraging technologies like Redis streams for resumability [12:39:00], as sketched below.
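A minimal sketch of that separation, assuming the ioredis client: the compute layer appends each workflow event to a Redis stream, and the API layer replays from whatever entry id the client last acknowledged, so a page refresh or a redeploy picks up exactly where it left off. The key scheme and payload shape are illustrative:

```typescript
import Redis from "ioredis";

const redis = new Redis();

// Compute layer: emit each intermediate step as a durable stream entry.
export async function emitStep(workflowId: string, step: object) {
  await redis.xadd(`wf:${workflowId}`, "*", "data", JSON.stringify(step));
}

// API layer: replay everything after `lastSeenId` ("0" = from the start),
// then block up to 15 s waiting for new entries.
export async function readSteps(workflowId: string, lastSeenId = "0") {
  const result = await redis.xread(
    "BLOCK", 15000, "STREAMS", `wf:${workflowId}`, lastSeenId,
  );
  if (!result) return { events: [], lastSeenId }; // timed out, nothing new
  const [, entries] = result[0];
  return {
    events: entries.map(([, fields]) => JSON.parse(fields[1])),
    lastSeenId: entries[entries.length - 1][0], // client checkpoints this id
  };
}
```

Because the events live in Redis rather than in the serverless function's memory, the API layer can be redeployed or scaled independently of in-flight workflows, which is the point of keeping the two layers separate.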
It's easy to get started with basic prototypes, but building reliable and resilient [[longrunning_agents_and_failure_resilience | long-running AI workflows]] correctly in a serverless environment, especially when considering continuous deployment and worker draining, is very challenging [13:04:00].