Guardrails 2.0

Guardrails 2.0 is ElevenLabs’ control layer for ElevenAgents, designed to keep AI voice agents on-topic, policy-aligned, and safer to deploy in production. It is built for teams using voice agents in support, sales, marketing, reception, and internal workflows.

Overview

Guardrails 2.0 is ElevenLabs’ control layer for ElevenAgents, designed to keep voice agents aligned with a team’s instructions, safety rules, and operational goals. It adds layered checks around agent behavior so teams can reduce drift, catch manipulation attempts, and block policy-violating replies before they reach the user.

The product is aimed at production voice-agent deployments in support, sales, marketing, and internal workflows. Its controls can be configured in the agent settings or via API, and the page frames them as part of a broader trust-and-safety stack for enterprise deployments, including conversation analytics, optional zero-retention mode, and post-call redaction for eligible customers.

Core capabilities

System prompt hardening

A hardened system prompt provides the baseline policy, and the Focus Guardrail reinforces those instructions throughout a conversation to reduce drift in long or complex interactions.

User input validation

User inputs are checked for prompt injection and instruction-override attempts, with the option to terminate conversations that pose a security risk.

Agent response validation

Every response is evaluated against configured policies before it reaches the user, allowing unsafe or off-topic output to be blocked in real time.
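The input and response checks described above can be pictured as a small per-turn pipeline. The following is a minimal illustrative sketch in Python, not ElevenLabs' implementation or API. The function names (`validate_user_input`, `validate_agent_response`, `handle_turn`), the phrase lists, and the placeholder reply strings are all hypothetical, and plain substring matching stands in for whatever classifiers the product actually uses. System prompt hardening is not modeled here because it happens inside the agent itself.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical phrases standing in for a real prompt-injection classifier.
INJECTION_MARKERS = ("ignore previous instructions", "reveal your system prompt")
# Hypothetical topics that the configured policy forbids in agent replies.
BLOCKED_TOPICS = ("medical diagnosis", "legal advice")


@dataclass
class Verdict:
    allowed: bool
    reason: str = ""


def validate_user_input(text: str) -> Verdict:
    """Flag instruction-override attempts in the user's turn."""
    lowered = text.lower()
    for marker in INJECTION_MARKERS:
        if marker in lowered:
            return Verdict(False, f"possible prompt injection: {marker!r}")
    return Verdict(True)


def validate_agent_response(text: str) -> Verdict:
    """Check the agent's draft reply against the configured policy."""
    lowered = text.lower()
    for topic in BLOCKED_TOPICS:
        if topic in lowered:
            return Verdict(False, f"policy violation: {topic!r}")
    return Verdict(True)


def handle_turn(user_text: str, generate: Callable[[str], str]) -> str:
    """Run one turn through both checks; `generate` stands in for the agent."""
    if not validate_user_input(user_text).allowed:
        # Assumed behavior: end the conversation on a security risk.
        return "[conversation ended: security risk]"
    draft = generate(user_text)
    if not validate_agent_response(draft).allowed:
        # The draft never reaches the user.
        return "[response blocked]"
    return draft
```

In this sketch a flagged user turn never reaches the agent, and a flagged draft is replaced before delivery, which mirrors the order in which the page describes the two checks.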

Custom guardrails

Custom Guardrails let teams write domain-specific rules in natural language and enforce them automatically across calls with a block-or-allow decision.

Configurable enforcement behavior

Execution mode, exit strategy, content sensitivity, and per-guardrail toggles give teams control over how strict enforcement is and what happens after a trigger.
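To make these settings concrete, here is a hedged sketch of how such a configuration could be modeled in Python. It is not the ElevenAgents schema: every class, field, and enum name below is hypothetical, and the sensitivity scale is an assumption. Only the two execution modes and the four exit strategies are taken from the page's own description.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, Optional


class ExecutionMode(Enum):
    CONCURRENT = "concurrent"  # check runs alongside the response
    BLOCKING = "blocking"      # response is held until the check clears


class ExitStrategy(Enum):
    END_CONVERSATION = "end_conversation"
    TRANSFER_TO_AGENT = "transfer_to_agent"
    ESCALATE_TO_HUMAN = "escalate_to_human"
    RETRY_WITH_CORRECTION = "retry_with_correction"


@dataclass
class GuardrailSettings:
    enabled: bool = True         # per-guardrail toggle
    sensitivity: str = "medium"  # assumed scale: "low", "medium", "high"
    exit_strategy: ExitStrategy = ExitStrategy.RETRY_WITH_CORRECTION


@dataclass
class EnforcementConfig:
    execution_mode: ExecutionMode = ExecutionMode.CONCURRENT
    guardrails: Dict[str, GuardrailSettings] = field(default_factory=dict)

    def action_for(self, guardrail_name: str) -> Optional[ExitStrategy]:
        """Return the exit strategy for a triggered guardrail, or None if it is off or unknown."""
        settings = self.guardrails.get(guardrail_name)
        if settings is None or not settings.enabled:
            return None
        return settings.exit_strategy


# Example: a strict setup for manipulation attempts, with the focus check disabled.
config = EnforcementConfig(
    execution_mode=ExecutionMode.BLOCKING,
    guardrails={
        "manipulation": GuardrailSettings(
            sensitivity="high", exit_strategy=ExitStrategy.END_CONVERSATION
        ),
        "focus": GuardrailSettings(enabled=False),
    },
)
```

Keeping the exit strategy on each guardrail rather than on the whole agent is one way to give low-risk and high-risk triggers different handling.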

Logging and post-call redaction

Triggers and actions are logged in conversation analytics, and sensitive information can be redacted from transcripts, recordings, and webhook payloads after a call ends.
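As a rough illustration of what post-call transcript redaction involves, the sketch below masks two kinds of sensitive values in stored transcript turns. It is an assumption-laden example, not the product's redaction logic: the patterns, the `[EMAIL]` and `[CARD]` placeholders, and the turn format (a list of dicts with a `text` key) are all hypothetical, and it covers text transcripts only, not recordings or webhook payloads.

```python
import re
from typing import Dict, List

# Hypothetical patterns; a production detector would cover far more cases.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    # 13 to 16 digits, optionally separated by single spaces or hyphens.
    "CARD": re.compile(r"\b\d(?:[ -]?\d){12,15}\b"),
}


def redact(text: str) -> str:
    """Replace each matched value with a bracketed label."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


def redact_transcript(turns: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Return redacted copies of the turns, leaving the originals untouched."""
    return [{**turn, "text": redact(turn["text"])} for turn in turns]
```

Returning copies instead of editing in place keeps the unredacted turns available to whichever step decides what is ultimately stored.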

Where it fits

  • Keep agents on-topic

    Use Guardrails 2.0 when a voice agent needs to stay on script in long or complex calls, such as support or onboarding conversations where drift can lead to bad answers.

  • Reduce prompt-injection risk

    Apply manipulation and response checks in customer-facing workflows where users may try to override instructions or prompt the model into unsafe behavior.

  • Enforce domain policies

    Use custom guardrails to enforce company-specific policies, such as escalation rules, prohibited topics, or regulated language requirements.

  • Tune behavior for live calls

    Configure exit strategies and sensitivity levels for live voice interactions where the team wants different handling for low-risk and high-risk issues.

  • Support review and redaction workflows

    Combine logging and redaction with post-call QA workflows that need transcripts and recordings for review while removing sensitive details from stored artifacts.

Pros and Cons

Pros

  • Uses layered checks instead of relying only on the system prompt.
  • Covers three common risk points: drifting behavior, prompt injection, and unsafe responses.
  • Supports custom, domain-specific rules in natural language.
  • Lets teams choose enforcement behavior, exit strategies, and sensitivity levels.
  • Provides visibility through conversation analytics and trigger logs.

Cons

  • Some controls, including conversation history redaction and Zero Retention Mode, are described as enterprise-only.
  • The page does not list every supported integration or show detailed limits for each guardrail type.
  • Because guardrails can run in different enforcement modes, stricter settings may add latency compared with allowing responses to stream immediately.

FAQ

How do you enable Guardrails 2.0?

Guardrails 2.0 is configured in ElevenAgents. The page says you can turn it on in the Security tab of an agent’s settings or configure it via the API.

What does Guardrails 2.0 actually do?

The page describes three layers: system prompt hardening, user input validation, and agent response validation. These work together to reinforce instructions, detect manipulation attempts, and block policy-violating replies before delivery.

Can I define my own guardrail rules?

The page states that custom guardrails let you define domain-specific policies in natural language and enforce them automatically across every call. A lightweight model evaluates each response and returns a block or allow decision.

How does Guardrails 2.0 handle a policy violation?

It depends on how enforcement is configured. The page says execution modes let you choose between running guardrails alongside the response for near-zero delay and holding responses until they are fully cleared. When a guardrail triggers, the exit strategy you defined determines what happens next: ending the conversation, transferring to another agent, escalating to a human, or retrying with corrective instructions.

Are redaction and zero-retention features available to everyone?

Conversation history redaction and Zero Retention Mode are described as available to enterprise clients. The page directs customers to contact sales for access.

Quick Facts

Category: AI Voice Agents / Safety
Platform: ElevenAgents
Primary users: Teams deploying voice agents for support, sales, marketing, and internal workflows
Source domain: elevenlabs.io
Availability: Available in alpha in ElevenAgents; configurable in agent settings or via API
Pricing: No standalone price listed on the page; ElevenLabs offers paid plans and enterprise contact-sales options