AI evaluation and risk scoring

From: aidotengineer

Don B, co-founder and CTO for Private S, recently open-sourced a solution for safety and security for AI agents. He is also the creator and PMC member of the open-source project Apache Ranger, which handles data governance for Big Data and is used by cloud providers like AWS, GCP, and Azure [00:00:16]. This article discusses how to build a safe and reliable AI agent, focusing on evaluation and risk scoring.

Understanding AI Agents [00:00:52]

From Don B’s perspective, AI agents are autonomous systems capable of their own reasoning and workflow generation [00:00:57]. They can call tasks to perform actions and use tools to make API calls [00:01:03].

Tasks are specific actions that may use Large Language Models (LLMs), Retrieval-Augmented Generation (RAGs), or tools [00:01:13].
Tools are functions that can fetch data from the internet, databases, or call service APIs [00:01:24].
Memories are contexts shared within the agents, tasks, and tools [00:01:36].

Most current agent frameworks run as a single process, meaning agents, tasks, and tools share the same process [00:02:03]. This can lead to security vulnerabilities, as credentials (often super admin privileges) and prompts within the process can be accessed by any third-party library [00:02:15]. This creates a “zero trust” issue [00:02:53]. Furthermore, agents’ autonomous and non-deterministic nature introduces “unknown unknowns,” significantly increasing attack vectors compared to traditional software [00:03:10].

Challenges in AI Agent Security and Safety [00:03:40]

Key challenges include:

Security: Improperly designed or implemented agents can lead to unauthorized access and data leakage [00:03:46].
Safety and Trust: Unreliable models or an unsafe environment (e.g., altered prompts) can result in incorrect outcomes [00:04:00].
Compliance and Governance: Many AI engineers are focused on getting agents to work, often overlooking critical aspects needed for enterprise readiness [00:04:16]. Enterprises, such as a major credit bureau, treat AI agents like human users who must adhere to extensive regulations (e.g., data privacy laws like GDPR or California resident data protections) [00:04:40]. Agents need onboarding processes and training to ensure they follow these regulations before production deployment [00:05:14].

Layered Solution for Safe and Reliable AI Agents [00:05:32]

Addressing these challenges requires a multi-layered approach [00:05:40], categorized into three areas:

Evaluations (Evals): Establishing criteria for production readiness [00:05:53].
Enforcement: Implementing strong controls [00:06:33].
Observability: Monitoring real-world usage and reacting to issues [00:06:54].

1. Evaluation (Evals) and Risk Scoring [00:07:14]

Just as traditional software development has gating factors for production deployment (e.g., test coverage, vulnerability scanning, CVE scanning, pen testing) [00:07:22], so too must AI agents [00:08:08].

Evaluation methods for AI agents should include:

Defining use cases and ground truth to ensure consistency when prompts, libraries, frameworks, or LLMs change [00:08:11].
Scanning third-party LLMs for vulnerabilities and ensuring they are not poisoned [00:08:31].
Scanning third-party libraries for Common Vulnerabilities and Exposures (CVEs) [00:08:38].
Prompt injection testing to ensure the application blocks malicious prompts [00:08:48].
Data leakage evaluation: Crucial for enterprise agents handling sensitive information (e.g., HR data) to prevent malicious users from exploiting loopholes and accessing unauthorized data [00:09:09].
Unauthorized actions evaluation: Verifying that agents performing actions (not just read-only) do so only when authorized [00:09:53].
Runaway agent evaluation: Testing for scenarios where agents enter infinite loops due to poor prompts or task definitions [00:10:11].

“The goal of this is to come with the risk score at the end of the day so that it gives a confidence that can you put this into production.” [00:10:32]

This evaluation helps generate a risk score, which dictates whether an agent (whether internal or third-party) is confident enough for production [00:10:34].

2. Enforcement [00:10:43]

Enforcement is vital as agents often operate in a zero-trust environment [00:10:51]. Key enforcement mechanisms include:

Authentication and Authorization: Ensuring the user’s identity is propagated through the entire agent-task-tool-API/database call chain [00:11:18]. This prevents impersonation and ensures access controls are properly applied based on the user’s and agent’s roles [00:11:46].
Approvals: Implementing workflows where agents can perform automated approvals up to a certain threshold [00:12:30]. Proper guard rails can be set to automatically involve a human when limits are exceeded [00:13:00].

3. Observability [00:13:56]

Observability is particularly crucial for AI agents due to their dynamic nature [00:14:00]. Unlike traditional software, agents involve rapidly changing models, evolving frameworks, and subjective user inputs [00:14:16].

Observability involves:

Monitoring user inputs and response behaviors [00:14:53].
Tracking the transmission of Personally Identifiable Information (PII) and confidential data [00:15:03].
Defining metrics and thresholds for failure rates [00:15:21]. Automated alerts should trigger if rates exceed tolerance levels, indicating issues like misbehaving agents or malicious users [00:15:40].
Anomaly detection: Similar to User Behavior Analytics (UBA) in traditional security, future systems will increasingly monitor agents’ behavior to ensure they operate within accepted boundaries [00:15:53].

These observability efforts contribute to a real-time security posture, providing confidence in an agent’s live performance [00:16:27].

Conclusion [00:16:40]

To ensure safe and reliable AI agents, a layered approach is essential:

Preemptive evaluation and vulnerability assessment to generate a risk score and determine production readiness [00:16:44].
Proactive enforcement with robust guard rails and sandboxing to run agents securely [00:16:59].
Observability for real-time monitoring of agent performance, allowing for quick fine-tuning and anomaly detection [00:17:12].

Private S has open-sourced their Safety and Security Solutions, called page.ai [00:17:27].

Tubegraph

Explorer

Table of Contents