AI safety and security

From: aidotengineer

Don B., co-founder and CTO for Private S, discusses the importance of building safe and reliable AI agents and the challenges involved in their deployment in enterprise environments [00:00:42]. His company recently open-sourced its solution for Safety and Security for Generative AI and AI agents [00:00:21].

Key Terminology [00:00:54]

AI agents: Autonomous systems capable of their own reasoning, workflow generation, and task execution [00:00:57]. They can call tasks to perform actions and use tools to make API calls [00:01:03].
Tasks: Specific actions that may utilize Large Language Models (LLMs), Retrieval Augmented Generation (RAGs), or tools [00:01:13].
Tools: Functions used to retrieve data from the internet, databases, or through service APIs [00:01:24].
Memories: Contexts shared between agents, tasks, and tools [00:01:36].

Challenges of AI Agents in Security [00:03:40]

Many current agent frameworks run as a single process, meaning agents, tasks, and tools share the same process [00:02:07]. This presents several security challenges:

Shared Credentials: Tools needing database or API access often use service user credentials with super admin privileges [00:02:24]. Within the same process, one tool could technically access credentials meant for another [00:02:33].
Third-Party Library Access: Any third-party library running within the process can access prompts and other sensitive information [00:02:47]. This creates a zero-trust issue [00:02:53].
Insecure LLMs: If agents or tasks interact with an insecure LLM, it can be exploited [00:02:58].
Autonomous and Non-Deterministic Behavior: Agents are autonomous by definition, creating their own workflows [00:03:10]. This leads to “unknown unknowns” in security, as it’s non-deterministic what an agent will do, resulting in a high number of attack vectors [00:03:21].

Consequences of Improper Agent Design [00:03:42]

Unauthorized Access and Data Leakage: Poorly designed agents can lead to unauthorized access and leakage of sensitive or confidential information [00:03:46].
Safety and Trust Issues: Unreliable models or unsafe environments (e.g., modified prompts) can lead to incorrect results [00:04:00].
Compliance and Governance: Many organizations struggle to make agents enterprise ready due to a lack of focus on regulatory compliance and governance during development [00:04:14]. For example, a credit bureau treats an AI agent similarly to a human user, requiring it to adhere to training, regulations (e.g., California resident data consent, international data access), and onboarding processes [00:04:40].

Addressing Safety and Security in AI Agents [00:05:32]

There is no “silver bullet” for security and compliance; a multi-layered approach is necessary [00:05:40]. This involves three key layers:

Evaluation
Enforcement
Observability

1. Evaluation (Evals) [00:05:50]

This layer defines the criteria for promoting an agent to production [00:05:53]. Unlike traditional model evaluations that focus on response quality, AI agent evals must also be security and safety focused [00:06:01]. The goal is to generate a risk score to determine if an agent can be promoted or if a third-party agent can be used [00:06:16].

Similar to traditional software development, AI agent evaluations should include:

Test Coverage: Ensuring adequate testing when writing code [00:07:37].
Vulnerability Scanning: For Docker containers [00:07:44].
Third-Party Software Scanning: Checking for Common Vulnerabilities and Exposures (CVEs) and remediating high/critical risks [00:07:48].
Penetration Testing: To prevent cross-site scripting and other vulnerabilities [00:08:01].

Specific evaluations for AI agents include:

Baseline Preservation: Ensuring changes (e.g., prompt modifications, new libraries, frameworks, or LLMs) don’t alter the established baseline behavior [00:08:11].
Third-Party LLM and Library Scans: Verifying that third-party LLMs are not poisoned and meet minimum vulnerability criteria [00:08:31].
Prompt Injection Testing: Ensuring the application has controls to block prompt injections [00:08:48].
Data Leakage Prevention: Testing to prevent sensitive information from being leaked, especially in enterprise contexts where agents handle confidential data (e.g., HR data) [00:09:09].
Unauthorized Actions: Verifying that agents performing actions (e.g., changing data) do so only with proper authorization [00:09:55].
Runaway Agent Detection: Testing for scenarios where agents enter infinite loops due to bad prompts or task/agent design issues [00:10:11].

2. Enforcement [00:10:43]

Enforcement is crucial for ensuring the proper implementation of security controls [00:10:37]. AI agents operate in a near zero-trust environment due to shared libraries and access to backend systems [00:10:51]. Key enforcement mechanisms include:

Authentication and Authorization: Essential to prevent impersonation and theft of confidential information [00:11:19]. User identity must be propagated from the initial request through tasks and tools to the final data access or API call point, where policies can be enforced [00:11:30]. Agents have their own roles, but user-specific roles must also be enforced if the agent acts on behalf of a user [00:11:54].
Approvals: Automated workflows can be designed where agents handle routine approvals up to certain thresholds [00:12:30]. Guardrails can be set to automatically involve a human when limits are exceeded [00:13:05].
Sandboxing: Providing a secure environment for running agents [00:17:06].

3. Observability [00:13:56]

Observability is critical for AI agents due to constantly changing variables like models, frameworks, and third-party libraries [00:14:16]. User inputs are highly subjective, requiring continuous monitoring of responses and potential Personally Identifiable Information (PII) or confidential data leakage [00:14:32].

Metrics and Thresholds: It’s impractical to monitor every request [00:15:15]. Instead, define thresholds and metrics (e.g., failure rates) that trigger alerts when exceeded, indicating misbehaving agents or malicious users [00:15:21].
Anomaly Detection: While still evolving for agents, this concept from traditional security (User Behavior Analytics) will become crucial for detecting if an agent is behaving within accepted boundaries [00:15:54].
Security Posture: All these elements contribute to a real-time security posture that reflects how well the agent is performing in a live environment [00:16:27].

Recap [00:16:42]

To ensure safety and security for AI agents:

Preemptive Eval: Conduct vulnerability assessments to gain confidence in promoting agents to production or using third-party agents [00:16:44].
Proactive Enforcement: Implement robust guardrails, sandboxes, and enforcement mechanisms to run agents securely [00:16:59].
Observability: Maintain real-time monitoring to understand agent performance and quickly fine-tune in case of anomalies [00:17:12].

Further Information [00:17:27]

Private S has open-sourced its Safety and Security solution called page.ai [00:17:29]. They are seeking design partners and contributors [00:17:38].

Tubegraph

Explorer

Table of Contents