AI Agents and Automation

From: allin

AI agents are emerging as a significant development in artificial intelligence, with 2025 anticipated to be a pivotal year for their widespread adoption [01:10:08]. These agents, often described as advanced “cron jobs” [01:04:21], are designed to operate autonomously in the background, performing tasks that typically require human intervention [01:04:24].

Functionality and User Interface

Companies like OpenAI are reportedly planning to offer AI agent services, with potential costs ranging from $2,000 to $20,000 per month across various tiers [01:04:14]. Manus, a Chinese company, has showcased a compelling visualization of these agents, featuring a two-pane UI: a standard chatbot interface alongside a view displaying the agent’s real-time actions [01:05:21]. The agent can toggle between applications such as search, browser, code terminal, and document editor, creating to-do lists and marking tasks complete as it goes [01:05:32]. The basic idea is that AI can perform any action on a computer or in software that a human can [01:08:00].

A new standard called MCP (Model Context Protocol) is rapidly gaining traction as a way to connect agents with applications, letting them understand the data and possible actions available within various SaaS (Software as a Service) applications [01:06:30].
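To make the idea of "possible actions" concrete, here is a minimal sketch of the kind of messages an agent might send to an MCP server. MCP messages are JSON-RPC 2.0, and `tools/list` and `tools/call` are method names from the protocol; the tool name `create_invoice` and its arguments are hypothetical, invented only for illustration, and no real server or transport is involved.

```python
import json
from itertools import count

# Simple counter for JSON-RPC request ids.
_ids = count(1)

def jsonrpc_request(method, params=None):
    """Build a JSON-RPC 2.0 request as a plain dict."""
    message = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        message["params"] = params
    return message

# Ask the server which actions (tools) it exposes.
list_tools = jsonrpc_request("tools/list")

# Invoke one tool. "create_invoice" and its arguments are hypothetical.
call_tool = jsonrpc_request(
    "tools/call",
    {
        "name": "create_invoice",
        "arguments": {"customer_id": "C-123", "amount_usd": 250},
    },
)

print(json.dumps(list_tools, indent=2))
print(json.dumps(call_tool, indent=2))
```

The point of the sketch is the shape of the exchange: the agent first discovers what an application can do, then calls a named action with structured arguments instead of clicking through a UI.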

Impact on Business Models

The rise of AI agents is poised to fundamentally alter the software business model [01:09:19]. Traditionally, software is priced per user [01:09:52]. However, AI agents allow software to directly pursue “labor spend” by delivering underlying workflows or outcomes to the customer [01:10:11]. For example, if AI agents can perform work equivalent to paralegals, a software vendor could sell a multiple of the previous per-seat license, targeting the cost savings from human labor [01:10:25].
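The paralegal example can be worked through with rough arithmetic. All figures below (seat price, salary, share of savings captured) are hypothetical and chosen only to show how outcome-based pricing could yield a multiple of per-seat revenue; they do not come from the source.

```python
def per_seat_revenue(seats, price_per_seat):
    """Annual revenue under traditional per-user pricing."""
    return seats * price_per_seat

def outcome_revenue(workers, annual_labor_cost, share_pct):
    """Annual revenue if the vendor captures share_pct percent of labor spend."""
    return workers * annual_labor_cost * share_pct // 100

# Hypothetical inputs: a 10-paralegal team.
seats = 10
seat_price = 1_200        # USD per seat per year (assumed)
paralegal_cost = 60_000   # USD per paralegal per year (assumed)
share_pct = 20            # vendor captures 20% of labor spend (assumed)

old = per_seat_revenue(seats, seat_price)
new = outcome_revenue(seats, paralegal_cost, share_pct)

print(old)        # 12000
print(new)        # 120000
print(new / old)  # 10.0
```

Under these assumptions the customer still keeps 80% of the labor savings, while the vendor earns ten times what the per-seat license brought in.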

Job Displacement vs. New Opportunities

There is a debate regarding whether AI agents will primarily replace existing jobs or create new opportunities. While some tasks will be automated, a significant portion of AI usage is expected to be for tasks that were previously unaffordable or never undertaken [01:10:50].

Logistics: AI is used to make thousands of phone calls daily to truck drivers, identifying matches for loads and activating them on platforms [01:11:14]. This was previously too expensive to do with human call centers [01:11:32].
Knowledge Work: The vast majority of AI use cases in large enterprises are for tasks like reviewing contracts, automating invoice processes, or creating marketing campaigns in multiple languages – work that was not previously feasible due to cost or complexity [01:12:08]. It’s estimated that 90% of future AI usage will be in areas not currently performed by humans [01:12:37].
Legal: AI is being used to review legal documents, identify changes, and compare versions, potentially reducing the need for attorneys in initial review stages [01:12:51].

Challenges and Limitations

Despite the rapid progress, AI agents face significant challenges, especially in regulated industries where mistakes can lead to severe consequences [01:14:37].

Hallucinations and Accuracy: Probabilistic software (like LLMs that can hallucinate) can produce errors, unlike deterministic code [01:14:56]. This makes quality assurance (QA) paramount [01:15:17]. For instance, in financial services, an AI mistake could lead to fines for non-compliance [01:15:24]. The best models currently achieve around 90% accuracy in specific enterprise data tests, which is not sufficient for high-stakes applications [01:23:13].
Trough of Disillusionment: Many companies have invested heavily in AI products, but now face the “trough of disillusionment” as they encounter technical complexities and difficulties moving from experimentation to mainline production [01:13:45].
Cost: Layering multiple AI models or running multiple passes to improve accuracy significantly increases “test time compute” costs, which can be astronomical [01:29:41].
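The accuracy and cost points above interact, and a simplified model shows how. The sketch assumes each pass is independently correct with probability 0.9, that answers are simply right or wrong, and that the final answer is a majority vote; real model errors are usually correlated, so actual gains would likely be smaller. Cost is assumed to scale linearly with the number of passes.

```python
from math import comb

def majority_accuracy(p, n):
    """Probability that a majority of n independent passes is correct.

    Simplifying assumptions: each pass is right with probability p,
    errors are independent, and n is odd so there are no ties.
    """
    needed = n // 2 + 1
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(needed, n + 1))

for passes in (1, 3, 5):
    accuracy = majority_accuracy(0.9, passes)
    # Relative compute cost is assumed to be proportional to the pass count.
    print(f"passes={passes} accuracy={accuracy:.5f} relative_cost={passes}x")
```

Even in this optimistic model, going from one pass to five multiplies compute cost by five while still leaving a residual error rate, which is the trade-off behind the "test time compute" concern.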

Exponential Progress

The impact of AI is not yet peaking, with exponential progress occurring along three key dimensions [01:18:28]:

Algorithms: AI models are improving by three to four times annually, not just in speed but qualitatively, moving from simple chatbots to reasoning models that can break down complex questions [01:18:40].
Chips: AI chips are becoming three to four times better with each new generation, as in the step from Nvidia’s H100 to the GB200, with generations rolled out roughly annually [01:20:10].
Data Centers: The deployment of GPUs in AI data centers is scaling rapidly, from hundreds of thousands to potentially millions of GPUs, requiring a shift from megawatt to gigawatt power consumption [01:21:09].

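Improving three to four times per year works out to roughly 10x every two years for each dimension, or about 100x per dimension over a four-year presidential term. Multiplied across the three dimensions (100 × 100 × 100), this suggests a million-fold increase in AI compute availability, leading to massive impacts on the economy [01:22:04].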
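The compounding behind the million-fold figure can be checked with a few lines of arithmetic. This sketch assumes each of the three dimensions (algorithms, chips, data centers) improves about 10x every two years and that the gains multiply; the variable names are illustrative only.

```python
growth_per_period = 10   # ~10x improvement per dimension...
period_years = 2         # ...every two years
term_years = 4           # one presidential term
dimensions = 3           # algorithms, chips, data centers

periods = term_years // period_years
per_dimension = growth_per_period ** periods   # gain in one dimension over the term
combined = per_dimension ** dimensions         # gains multiply across dimensions

print(per_dimension)  # 100
print(combined)       # 1000000

# 10x every two years corresponds to about 3.16x per year,
# consistent with the "three to four times annually" figure.
annual_rate = growth_per_period ** (1 / period_years)
print(round(annual_rate, 2))  # 3.16
```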

Specific Applications and Future Outlook

Coding Assistance: Progress in coding assistance is rapid because code compiles and can be objectively validated, allowing AI to learn and iterate effectively through reinforcement learning [01:26:59].
Math: Similarly, math is another area where AI is improving rapidly due to the ability to validate proofs [01:27:29].
Regulatory Compliance: AI could classify products and determine duties owed, potentially becoming the “source of truth” if governments adopt these systems [01:30:47].
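The coding-assistance point above rests on the fact that generated code can be scored automatically. The sketch below illustrates that idea, assuming a hypothetical `reward` function that runs a candidate solution against test cases and returns the fraction passed. It is not a description of how any particular lab trains its models, and running untrusted code with `exec` like this would need sandboxing in practice.

```python
def reward(candidate_src, func_name, cases):
    """Score generated code by the fraction of test cases it passes.

    candidate_src: source text of the candidate solution
    func_name:     name of the function the tests call
    cases:         list of (args_tuple, expected_result) pairs
    """
    namespace = {}
    try:
        # WARNING: exec on untrusted code is unsafe outside a sandbox.
        exec(candidate_src, namespace)
        fn = namespace[func_name]
    except Exception:
        return 0.0  # code that does not load earns no reward

    passed = 0
    for args, expected in cases:
        try:
            if fn(*args) == expected:
                passed += 1
        except Exception:
            pass  # a crash counts as a failed case
    return passed / len(cases)

cases = [((1, 2), 3), ((0, 0), 0), ((-1, 1), 0)]
good = "def add(a, b):\n    return a + b\n"
buggy = "def add(a, b):\n    return a - b\n"
broken = "def add(a, b) return a + b\n"

print(reward(good, "add", cases))    # 1.0
print(reward(buggy, "add", cases))   # one of three cases passes
print(reward(broken, "add", cases))  # 0.0
```

A signal like this is cheap and unambiguous, which is why code and math lend themselves to rapid iteration. A legal memo has no comparable pass/fail check.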

While AI may excel in areas with clear validation, its progress in less objective fields like legal work, where validation is harder, remains a key question [01:27:41]. The role of “improvement engineering,” a specialized skill focused on shrinking error rates to zero, will be crucial for building reliable software for enterprise production [01:32:19]. This requires deep expertise in quality systems to ensure practical and trustworthy AI deployment [01:32:48].
