From: aidotengineer

Evan Bole, founder and CEO of Genisx, presented on how agents have significantly impacted infrastructure, with a particular focus on the compute layer [00:00:02].

The Evolution of AI Engineering and Infrastructure Demands

The journey for AI engineers often begins with simple prompts and tool calls, which can be sufficient for initial funding rounds [00:00:31]. However, as projects mature, the focus shifts to building complex workflows [00:00:42]. This involves making non-deterministic code as deterministic as possible by chaining tailored prompts and carefully controlling context, extending workflow runtimes from seconds to minutes [00:00:44].

Eventually, AI engineers often transition into data engineers because the most significant challenge becomes providing the correct context to prompts [00:01:05]. This necessitates crawling user inboxes, ingesting data from platforms like GitHub, and processing numerous artifacts, all requiring extensive LLM processing [00:01:13].

Shifting Infrastructure Assumptions

Traditional Web 2.0 infrastructure, designed for simple web services with API requests, database interactions, and responses within tens of milliseconds, is ill-suited for modern AI applications [00:01:25]. In AI applications, a tail latency (P99) of a couple of seconds is considered good, and even then only when a fast model or prompt caching is used [00:01:39].

Reliability and Availability Challenges

Current LLM applications are built on “shoddy foundations,” making reliable app development difficult [00:02:06]. Outages from dependencies like OpenAI and Gemini can occur simultaneously, impacting service availability [00:02:17]. Even without full outages, rate limits pose a significant challenge, especially for bursty traffic patterns seen during batch processing or new customer onboarding [00:02:53]. Achieving higher rate limits often requires substantial financial investment [00:03:03].

The shift from simple prototypes to complex, long-running workflows turns full-stack AI engineers into data engineers [00:03:16].

Existing Solutions and Their Limitations

Long-running workflows have historically been a niche concern, and existing data engineering tools already address them [00:03:34].

However, these tools, often designed by Java engineers, are not preferred by full-stack TypeScript developers [00:03:50].

Existing serverless providers are also not well-suited for long-running workflows due to:

  • Timeouts, typically after 5 minutes [00:04:06]
  • Limitations on outgoing HTTP requests [00:04:11]
  • Lack of native streaming support, requiring it to be bolted on at the application layer [00:04:16]
  • No inherent resumability, which is crucial for applications that run for multiple minutes and where users might refresh the page [00:04:28]

Product Experience Requirements for AI Agents

For agentic workflows, certain product experiences necessitate specific infrastructure capabilities:

  • Onboarding: Users submit a URL, an LLM extracts information, and a scraping job is kicked off in the background, making hundreds of LLM calls to enrich content [00:04:51]. Users need to start using the product immediately while ingestion occurs, and the system must show real-time status updates [00:05:15]. Waiting 10 minutes for ingestion increases user fall-off [00:05:25].
  • Content Generation: For tasks like writing a blog post with an agent, the process can take several minutes, involving research, outlining, and section writing [00:05:43]. It’s critical that if a user leaves or navigates away, they do not lose context or progress [00:06:08]. Running workflows longer than five minutes often requires deep infrastructure changes due to serverless provider limitations [00:06:22].
  • Streaming and Resumability: Both final content and intermediate status need to be streamed [00:06:52]. If a user refreshes the page, they should resume from where they left off, seeing the same status and output [00:06:57]. Intermediate errors are less frustrating if users don’t lose significant progress [00:07:04].
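The resumability requirement above can be sketched as an append-only event log keyed by offset: the workflow appends status and output events, and a client that refreshes the page replays everything after the last offset it saw. This is a minimal in-memory illustration of the idea only; the class and method names are hypothetical, not the Genisx API.

```typescript
// Minimal resumable stream: events accumulate in an append-only log,
// and a reconnecting client replays everything after its last
// acknowledged offset, so no status messages are lost on refresh.
type StreamEvent = { offset: number; data: string };

class ResumableStream {
  private log: StreamEvent[] = [];

  append(data: string): void {
    this.log.push({ offset: this.log.length, data });
  }

  // Replay every event strictly after `fromOffset` (-1 means "from the start").
  readFrom(fromOffset: number): StreamEvent[] {
    return this.log.filter((e) => e.offset > fromOffset);
  }
}

// The workflow emits intermediate status, then the final output.
const stream = new ResumableStream();
stream.append("status: researching");
stream.append("status: outlining");
// The client saw offsets 0 and 1, then refreshed the page.
stream.append("final: blog post ready");
const afterRefresh = stream.readFrom(1);
```

Because the log is the source of truth rather than the live connection, an intermediate error or a closed browser tab costs the user nothing but a replay.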

Genisx’s Approach to Agentic Infrastructure

Genisx developed an open-source library to address these challenges, creating a simple, infrastructure-aware component model [00:07:29]. This framework provides resumable streams for intermediate status and final output [00:07:45].

Key principles include:

  • Anti-framework: Unopinionated and focused on building blocks, taking inspiration from React’s component model applied to the backend [00:07:50].
  • Composition over Abstraction: Emphasizes sharing, composing, and reusing code rather than abstracting it [00:08:02].

The solution consists of:

  • Components: Reusable, idempotent, independently testable steps [00:08:07]. An example is a wrapped OpenAI SDK call that includes tooling for retries and tracing, while maintaining the same OpenAI surface area [00:08:17].
  • Workflows: Collections of components that run together [00:08:10]. Each step in a workflow gets a retry boundary and an error boundary, along with traces at both the workflow and component levels for easy debugging [00:09:01].
  • Automatic REST APIs: Workflows can be turned into REST APIs that support synchronous and asynchronous invocation, with APIs to retrieve intermediate and final output streams [00:09:11].
  • Built-in Features: Components can be configured with a fluent API for features like exponential retry policies and caching [00:09:44].
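The fluent configuration described above might look something like the sketch below: a component wraps an async step and layers on exponential retries and caching. This is an illustrative guess at the shape of such an API, not the actual Genisx library; all names here are hypothetical.

```typescript
// Hypothetical component wrapper: a fluent API adds an exponential
// retry policy and a result cache around any async step.
class Component<I, O> {
  private retries = 0;
  private baseDelayMs = 0;
  private cache: Map<string, O> | null = null;

  constructor(private fn: (input: I) => Promise<O>) {}

  withRetries(n: number, delayMs: number): this {
    this.retries = n;
    this.baseDelayMs = delayMs;
    return this;
  }

  withCache(): this {
    this.cache = new Map();
    return this;
  }

  async run(input: I): Promise<O> {
    const key = JSON.stringify(input);
    const cached = this.cache?.get(key);
    if (cached !== undefined) return cached;
    for (let attempt = 0; ; attempt++) {
      try {
        const out = await this.fn(input);
        this.cache?.set(key, out);
        return out;
      } catch (err) {
        if (attempt >= this.retries) throw err;
        // Exponential backoff: baseDelay * 2^attempt milliseconds.
        await new Promise((r) => setTimeout(r, this.baseDelayMs * 2 ** attempt));
      }
    }
  }
}

// A flaky step that fails twice (e.g., rate limited) before succeeding.
let calls = 0;
const flaky = new Component(async (q: string) => {
  calls++;
  if (calls < 3) throw new Error("rate limited");
  return `answer for ${q}`;
});
const step = flaky.withRetries(3, 1).withCache();
```

The retry boundary lives on the component, so a failure in one step does not discard the progress of the workflow around it.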

Architectural Highlights

The platform is a serverless solution tailored for long-running workflows and agentic UIs [00:10:04]. Key architectural points include:

  • Separation of API and Compute Layers: These layers are completely separate, allowing independent scaling [00:10:14]. The API layer invokes the compute layer once, passing a Redis stream ID, and subsequent communication happens via the Redis stream [00:10:20].
  • Pluggable Compute Layer: The compute layer is stateless and interacts only with the API layer and Redis streams, allowing users to bring their own compute [00:11:06].
  • Redis Streams for Resumability: All API output goes to the Redis stream for both synchronous and asynchronous workflows [00:11:22]. This means the API layer reads from the Redis stream, not directly from the compute layer, enabling UIs that support page refreshes, navigation away, transparent error handling, and a full history of status messages and output [00:11:29]. None of the work is lost if the user navigates away or the browser connection terminates [00:11:59].
  • Heartbeats: The sandbox program emits heartbeats, allowing background processes to monitor workflow completion and automatically restart or notify users if a workflow crashes [00:10:41].
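The heartbeat mechanism above reduces to a simple invariant: each workflow records a timestamp on every heartbeat, and a background check restarts any workflow whose last beat is older than a timeout. A minimal sketch of that logic, with hypothetical names and an in-memory map standing in for real monitoring state:

```typescript
// Hypothetical heartbeat monitor: workers beat periodically; a
// background check flags any workflow that has gone silent past the
// timeout so it can be restarted or the user notified.
class HeartbeatMonitor {
  private lastBeat = new Map<string, number>();
  public restarted: string[] = [];

  beat(workflowId: string, now: number): void {
    this.lastBeat.set(workflowId, now);
  }

  // Called periodically by a background process.
  check(now: number, timeoutMs: number): void {
    for (const [id, ts] of this.lastBeat) {
      if (now - ts > timeoutMs) {
        this.restarted.push(id); // in practice: re-enqueue the workflow
        this.lastBeat.delete(id);
      }
    }
  }
}

const monitor = new HeartbeatMonitor();
monitor.beat("wf-1", 0);
monitor.beat("wf-2", 0);
monitor.beat("wf-1", 9_000); // wf-1 keeps beating; wf-2 goes silent
monitor.check(15_000, 10_000);
```

Because the compute layer is stateless and all durable output lives in the Redis stream, restarting a crashed workflow this way does not lose any status history already emitted.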

Lessons and Considerations for Building Agentic Infrastructure

When building infrastructure for agentic workflows, it’s essential to:

  • Start Simple: Don’t build for an hour-long workflow on day one, but plan for a future with long-running processes [00:12:13]. The future of agents involves giving instructions and letting them work independently [00:12:21].
  • Separate Compute and API Planes: This allows for independent scaling and flexibility [00:12:39].
  • Leverage Redis Streams for Resumability: Make it easy for users to navigate away, not lose progress, and handle errors transparently [00:12:41].
  • Careful Deployment: For workflows running for extended periods (e.g., 60 minutes), be cautious with draining workers and implement blue/green deployment patterns [00:12:54].
  • Attention to Detail: While it’s easy to get started, getting long-running agentic workflows right is challenging due to the intricate details involved [00:13:04].
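The deployment caution above comes down to draining: an old (blue) worker must stop accepting new workflows while letting in-flight hour-long runs finish, and only then can it be retired in favor of the new (green) fleet. A minimal sketch of that state machine, with names invented for illustration:

```typescript
// Hypothetical blue/green drain: a draining worker rejects new
// workflows (they route to the new fleet) but keeps in-flight
// workflows alive until they complete.
class Worker {
  private draining = false;
  private inFlight = new Set<string>();

  accept(workflowId: string): boolean {
    if (this.draining) return false; // new traffic goes to the green fleet
    this.inFlight.add(workflowId);
    return true;
  }

  finish(workflowId: string): void {
    this.inFlight.delete(workflowId);
  }

  startDraining(): void {
    this.draining = true;
  }

  // The blue worker may be shut down only once this is true.
  safeToShutDown(): boolean {
    return this.draining && this.inFlight.size === 0;
  }
}

const blue = new Worker();
blue.accept("wf-long-running");
blue.startDraining();
const rejected = blue.accept("wf-new"); // routed to green instead
```

With 60-minute workflows, a naive deploy that kills workers immediately would discard up to an hour of progress per run, which is why the drain gate matters.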

For those not wanting to build this infrastructure themselves, Genisx provides an open-source library that implements many of these solutions [00:13:13].