Guardrails - Agency Swarm

Guardrails validate agent inputs and outputs using the Agents SDK. They act as checkpoints that screen incoming messages before processing and review agent responses before delivery, ensuring agents stay on-topic, avoid sensitive content, and follow format requirements.

When to Use Guardrails

Guardrails solve validation challenges that go beyond simple field checks:

Content Filtering

Block off-topic questions, inappropriate language, or sensitive information leaks

Format Enforcement

Require specific response structures, prefixes, or formatting rules

Compliance

Enforce regulatory requirements, privacy policies, or business rules

Security

Prevent prompt injection, data exfiltration, or unauthorized actions

For validating tool inputs (e.g., checking field values, data types, ranges), use Pydantic validators instead. Guardrails are for agent-level validation.

Practical Examples

Example 1: Filtering Off-Topic Questions

Use input guardrails to keep agents focused on their domain. This example delegates relevance decisions to a judge agent:

from agency_swarm import Agency, Agent, GuardrailFunctionOutput, RunContextWrapper, input_guardrail
from agents.model_settings import ModelSettings
from openai.types.shared import Reasoning
from pydantic import BaseModel

class RelevanceDecision(BaseModel):
    is_relevant: bool
    reason: str

guardrail_agent = Agent(
    name="GuardrailAgent",
    instructions=(
        "You screen incoming messages for a customer-support assistant. "
        "Treat questions about account access, billing, and troubleshooting as relevant. "
        "Flag any other unrelated requests as irrelevant."
    ),
    model="gpt-5-nano",
    # Reasoning effort is configured through the `reasoning` field of ModelSettings
    model_settings=ModelSettings(reasoning=Reasoning(effort="minimal")),
    output_type=RelevanceDecision,
)

@input_guardrail
async def require_support_topic(
    context: RunContextWrapper, agent: Agent, user_input: str | list[str]
) -> GuardrailFunctionOutput:
    """Forward the decision to the guardrail agent."""
    candidate = user_input if isinstance(user_input, str) else "\n".join(user_input)
    guardrail_result = await guardrail_agent.get_response(candidate, context=context.context)
    decision = RelevanceDecision.model_validate(guardrail_result.final_output)

    if not decision.is_relevant:
        return GuardrailFunctionOutput(
            output_info="Only support questions are allowed. Ask about billing, account access, or troubleshooting.",
            tripwire_triggered=True,
        )
    return GuardrailFunctionOutput(output_info="", tripwire_triggered=False)

support_agent = Agent(
    name="CustomerSupportAgent",
    instructions="You help customers resolve account, billing, and troubleshooting issues.",
    model="gpt-5-mini",
    input_guardrails=[require_support_topic],
    throw_input_guardrail_error=False,  # Friendly mode: guidance returned as assistant message
)

See the full example at examples/guardrails_input.py.

Example 2: Preventing Sensitive Information Leaks

Use output guardrails to review responses before delivery. This example prevents agents from sharing email addresses:

from agency_swarm import Agency, Agent, GuardrailFunctionOutput, RunContextWrapper, output_guardrail

@output_guardrail(name="ForbidSensitiveEmail")
async def forbid_sensitive_email(
    context: RunContextWrapper, agent: Agent, response_text: str
) -> GuardrailFunctionOutput:
    """Reject responses that include personal email addresses."""
    if "@" in response_text:
        return GuardrailFunctionOutput(
            output_info="Do not share email addresses. Offer to connect via the support portal instead.",
            tripwire_triggered=True,
        )
    return GuardrailFunctionOutput(output_info="", tripwire_triggered=False)

support_agent = Agent(
    name="SupportPilot",
    instructions="You handle customer support. Official email: [email protected].",
    model="gpt-5",
    output_guardrails=[forbid_sensitive_email],
    validation_attempts=1,  # Agent gets 1 retry to fix the response
)

See the full example at examples/guardrails_output.py.
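
The `"@" in response_text` check above also trips on harmless text such as chat handles ("@alice"). A narrower check could use a regular expression. The sketch below is only an illustration: `EMAIL_PATTERN` and `contains_email` are hypothetical names, and the pattern is deliberately simple rather than a complete email grammar.

```python
import re

# Deliberately simple pattern: local part, "@", then a domain with at least one dot.
# This is an illustrative assumption, not a full RFC 5322 matcher.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")


def contains_email(text: str) -> bool:
    """Return True if the text appears to contain an email address."""
    return EMAIL_PATTERN.search(text) is not None
```

Inside the guardrail, `if contains_email(response_text):` could then stand in for the `"@"` test.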

Example 3: Simple Format Enforcement

Require responses to follow a specific format:

import json

from agency_swarm import Agent, GuardrailFunctionOutput, RunContextWrapper, output_guardrail

@output_guardrail(name="RequireJSONFormat")
async def require_json_format(
    context: RunContextWrapper, agent: Agent, response_text: str
) -> GuardrailFunctionOutput:
    """Ensure responses are valid JSON."""
    try:
        json.loads(response_text)
        return GuardrailFunctionOutput(output_info="", tripwire_triggered=False)
    except json.JSONDecodeError:
        return GuardrailFunctionOutput(
            output_info="Response must be valid JSON. Wrap your response in curly braces.",
            tripwire_triggered=True,
        )
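
Note that `json.loads` accepts any JSON value, so a bare `42` or a quoted string passes the check above even though the feedback asks for curly braces. A possible refinement, sketched under the assumption that a JSON object is what is wanted, is a small helper (the name `json_object_feedback` is hypothetical) that returns an empty string on success and a specific message otherwise:

```python
import json


def json_object_feedback(response_text: str) -> str:
    """Return "" if the text is a JSON object, otherwise a feedback message."""
    try:
        parsed = json.loads(response_text)
    except json.JSONDecodeError as exc:
        # exc.lineno and exc.colno point at the position where parsing failed.
        return f"Response must be valid JSON (error at line {exc.lineno}, column {exc.colno})."
    if not isinstance(parsed, dict):
        return "Response must be a JSON object wrapped in curly braces."
    return ""
```

The returned message could be passed as `output_info`, with `tripwire_triggered` set whenever the message is non-empty.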

Output Guardrails

Output guardrails validate agent responses before they reach users or other agents. When a guardrail trips, the agent receives feedback and retries.

Function Signature

Each output guardrail receives three parameters:

@output_guardrail
async def my_output_guardrail(
    context: RunContextWrapper,
    agent: Agent,
    response_text: str | BaseModel  # a Pydantic model instance when output_type is set
) -> GuardrailFunctionOutput:
    """Validate agent output."""
    # Your validation logic here
    pass

Parameters:

context: Run context wrapper with access to shared state
agent: The Agent instance generating the response
response_text: The agent’s response as a string, or a Pydantic model if output_type is specified

Return:

GuardrailFunctionOutput with:
- tripwire_triggered (bool): True if validation failed
- output_info (str): Feedback message sent to the agent when tripwire_triggered=True

Basic Output Guardrail

from agency_swarm import output_guardrail, GuardrailFunctionOutput, RunContextWrapper, Agent

@output_guardrail
async def response_content_guardrail(
    context: RunContextWrapper, agent: Agent, response_text: str
) -> GuardrailFunctionOutput:
    """Reject responses containing inappropriate content."""
    tripwire_triggered = False
    output_info = ""

    if "bad word" in response_text.lower():
        tripwire_triggered = True
        output_info = "Please avoid using inappropriate language."

    return GuardrailFunctionOutput(
        output_info=output_info,
        tripwire_triggered=tripwire_triggered,
    )

agent = Agent(
    name="CustomerSupportAgent",
    instructions="You are a helpful customer support agent.",
    output_guardrails=[response_content_guardrail],
)

Output Guardrail Retry Flow

When an output guardrail trips, the agent can retry using the guardrail's feedback. The validation_attempts parameter sets how many retries are allowed.

How Retry Works

Agent generates response

The agent produces its initial response

Output guardrail checks response

Each output guardrail validates the response

If validation fails

The agent receives a system message containing the output_info from the guardrail

Agent retries

The agent generates a new response, informed by the error message

Repeat until success or limit reached

This cycle continues up to validation_attempts times

If all attempts fail

OutputGuardrailTripwireTriggered exception is raised

Configuring Retry Attempts

agent = Agent(
    name="CustomerSupportAgent",
    instructions="You are a helpful customer support agent.",
    output_guardrails=[response_content_guardrail],
    validation_attempts=2,  # Default is 1 (one retry)
)

Settings:

validation_attempts=0: Fail-fast (no retries, immediate exception)
validation_attempts=1: Default (one retry after initial failure)
validation_attempts=2+: Multiple retries for complex validations

Each retry sends the output_info message to the agent as a system message, giving the agent context to adjust its response.
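
The retry cycle described above can be modeled in plain Python. This is a simplified sketch and not the framework's implementation: `TripwireError`, `run_with_retries`, `draft`, and `needs_prefix` are hypothetical stand-ins, and the model assumes one initial attempt followed by up to `validation_attempts` retries.

```python
from __future__ import annotations

from typing import Callable


class TripwireError(Exception):
    """Hypothetical stand-in for OutputGuardrailTripwireTriggered."""


def run_with_retries(
    generate: Callable[[list[str]], str],
    check: Callable[[str], str],
    validation_attempts: int = 1,
) -> str:
    """Model the cycle: one initial attempt plus up to validation_attempts retries.

    generate receives the feedback collected so far and returns a response.
    check returns "" when the response is valid, otherwise a feedback message.
    """
    feedback: list[str] = []
    for _ in range(1 + validation_attempts):
        response = generate(feedback)
        message = check(response)
        if not message:
            return response
        feedback.append(message)  # plays the role of the system message
    raise TripwireError(feedback[-1])


def draft(feedback: list[str]) -> str:
    # The first draft breaks the rule; later drafts follow the feedback.
    return "Response: done" if feedback else "done"


def needs_prefix(text: str) -> str:
    return "" if text.startswith("Response:") else "Start with 'Response:'"
```

In this model, `run_with_retries(draft, needs_prefix, validation_attempts=1)` succeeds on the second try, while `validation_attempts=0` raises immediately after the first failure.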

Handling Validation Failures

After all validation attempts fail, handle the exception:

from agency_swarm import OutputGuardrailTripwireTriggered

try:
    response = await agency.get_response("Hello!")
except OutputGuardrailTripwireTriggered as e:
    print(f"Validation failed: {e.guardrail_result.output_info}")
    # Implement fallback behavior or notify user

Input Guardrails

Input guardrails validate incoming messages before they reach the agent. They screen both user input and inter-agent communication.

Simplified Input Processing

Agency Swarm automatically extracts text content from messages, so your guardrails receive clean text instead of complex message structures. You don’t need manual extraction logic.

Function Signature

Each input guardrail receives three parameters:

@input_guardrail
async def my_input_guardrail(
    context: RunContextWrapper,
    agent: Agent,
    user_input: str | list[str]
) -> GuardrailFunctionOutput:
    """Validate user input."""
    # Your validation logic here
    pass

Parameters:

context: Run context wrapper with access to shared state
agent: The Agent instance receiving the input
user_input: Extracted text content
- Single message: A string containing the message content
- Multiple consecutive messages: A list of strings, one per message

Return:

GuardrailFunctionOutput with:
- tripwire_triggered (bool): True if validation failed
- output_info (str): Guidance message returned to the caller

File and image inputs inside messages are not passed to the guardrail.

Input Types

When a user sends multiple messages:

[
  {"role": "user", "content": "Hi"},
  {"role": "user", "content": "How are you?"}
]

Your guardrail receives:

["Hi", "How are you?"]

This allows you to process each new input message individually or validate them as a group.
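
A guardrail that wants to treat both shapes uniformly might first normalize the input to a list. The helpers below are a minimal sketch; `as_messages` and `first_invalid` are hypothetical names, and the `"Request:"` prefix is only an example rule.

```python
from __future__ import annotations


def as_messages(user_input: str | list[str]) -> list[str]:
    """Return the guardrail input as a list of message strings."""
    return [user_input] if isinstance(user_input, str) else list(user_input)


def first_invalid(user_input: str | list[str], prefix: str = "Request:") -> str | None:
    """Return the first message that lacks the prefix, or None if all pass."""
    for message in as_messages(user_input):
        if not message.startswith(prefix):
            return message
    return None
```

Checking each message this way differs from joining the messages and testing the combined string, which only inspects the start of the first message.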

Basic Input Guardrail

from agency_swarm import input_guardrail, GuardrailFunctionOutput, RunContextWrapper, Agent

@input_guardrail
async def require_task_prefix(
    context: RunContextWrapper, agent: Agent, user_input: str | list[str]
) -> GuardrailFunctionOutput:
    """Require user requests to begin with 'Request:'"""

    # Normalize to one string: keep a single message as-is, join multiple messages with spaces
    text = user_input if isinstance(user_input, str) else " ".join(user_input)
    condition = not text.startswith("Request:")

    return GuardrailFunctionOutput(
        output_info="Prefix your request with 'Request:' describing what you need." if condition else "",
        tripwire_triggered=condition,
    )

agent = Agent(
    name="CustomerSupportAgent",
    instructions="You are a helpful customer support agent.",
    input_guardrails=[require_task_prefix],
)

Friendly vs Strict Mode

Input guardrails support two modes that control how guardrail guidance is delivered: friendly mode (default) and strict mode. The throw_input_guardrail_error parameter controls this behavior.

Friendly Mode (Default)

Setting: throw_input_guardrail_error=False

In friendly mode, guardrail guidance flows naturally as if it came from the agent itself:

Guidance returned as final_output (non-streaming) or message_output_created event (streaming)
No exceptions raised
Persisted as an assistant message (message_origin="input_guardrail_message")
User experience stays fluid and conversational

When to use:

Conversational flows where you want to guide users naturally
Internal agents communicating with each other
Cases where you want to provide helpful feedback without interrupting the flow

Example:

agent = Agent(
    name="CustomerSupportAgent",
    instructions="You are a helpful customer support agent.",
    input_guardrails=[require_task_prefix],
    throw_input_guardrail_error=False,  # Friendly mode (default)
)

# Usage
response = await agency.get_response("Hello!")
print(response.final_output)
# Output: "Prefix your request with 'Request:' describing what you need."
# No exception raised - guidance returned directly

Streaming behavior:

RunItemStreamEvent(
    name='message_output_created',
    item=MessageOutputItem(
        raw_item=ResponseOutputMessage(
            id='msg_input_guardrail_guidance',
            content=[ResponseOutputText(text="Prefix your request...")],
            role='assistant',
            status='completed'
        )
    )
)

Strict Mode

Setting: throw_input_guardrail_error=True

In strict mode, guardrail failures abort the turn immediately:

InputGuardrailTripwireTriggered exception raised
Persisted as a system message (message_origin="input_guardrail_error")
Turn aborted (agent never processes the input)
Caller must handle the exception

When to use:

Hard requirements or compliance rules that cannot be bypassed
Security validations that must block processing
Cases where you want explicit exception handling

Example:

from agency_swarm import InputGuardrailTripwireTriggered

agent = Agent(
    name="CustomerSupportAgent",
    instructions="You are a helpful customer support agent.",
    input_guardrails=[require_task_prefix],
    throw_input_guardrail_error=True,  # Strict mode
)

# Usage
try:
    response = await agency.get_response("Hello!")
except InputGuardrailTripwireTriggered as e:
    print(f"Validation failed: {e.guardrail_result.output_info}")
    # Output: "Validation failed: Prefix your request with 'Request:' describing what you need."

Comparison Table

Mode	`throw_input_guardrail_error`	Caller sees	Persisted entry	Role	Use case
Friendly	`False` (default)	Guardrail text as `final_output` or streaming event	Assistant message (`input_guardrail_message`)	`assistant`	Conversational flows, helpful guidance
Strict	`True`	`InputGuardrailTripwireTriggered` exception	System message (`input_guardrail_error`)	`system`	Hard requirements, compliance, security
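
The two rows of the table can be pictured with a toy model. This is not how Agency Swarm implements the modes; `InputTripwire` and `deliver_guidance` are hypothetical stand-ins that only mirror what the table says the caller sees and what is persisted.

```python
from __future__ import annotations


class InputTripwire(Exception):
    """Hypothetical stand-in for InputGuardrailTripwireTriggered."""


def deliver_guidance(guidance: str, throw_input_guardrail_error: bool, history: list[dict]) -> str:
    """Model what the caller sees and what is persisted when an input guardrail trips."""
    if throw_input_guardrail_error:
        # Strict mode: persist a system entry, then abort the turn with an exception.
        history.append(
            {"role": "system", "content": guidance, "message_origin": "input_guardrail_error"}
        )
        raise InputTripwire(guidance)
    # Friendly mode: persist an assistant entry and return the guidance as the reply.
    history.append(
        {"role": "assistant", "content": guidance, "message_origin": "input_guardrail_message"}
    )
    return guidance
```

In both branches exactly one entry is appended; only the role, the origin label, and the way the caller learns about the failure differ.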

Decision Guide

Should I use friendly or strict mode?

Use Friendly Mode when:

You want a conversational user experience
Agents are communicating with each other internally
Guardrail feedback is helpful guidance, not a hard block
You don’t want to write exception handling code

Use Strict Mode when:

You’re enforcing non-negotiable requirements
Security or compliance rules must block processing
You want explicit control over error handling
The caller should know immediately that validation failed

Guardrails in Message History

Each guardrail trigger is recorded in the chat history with a guidance entry. Every entry carries a message_origin field that identifies which guardrail fired.

Message Origin Values

input_guardrail_message: Input guardrail in friendly mode
input_guardrail_error: Input guardrail in strict mode
output_guardrail_error: Output guardrail (always system message)

Persistence Behavior

Mode	`throw_input_guardrail_error`	Streaming Event	Persisted Entry
Friendly	`False` (default)	`message_output_created` with guidance text	Assistant message, `message_origin="input_guardrail_message"`
Strict	`True`	`{"type": "error", "content": guidance}`	System message, `message_origin="input_guardrail_error"`

Each triggered guardrail leaves exactly one guidance entry in the chat history:

In friendly mode, that entry is an assistant message and its text matches what the caller receives
In strict mode, the guardrail raises an exception and only the system guidance entry remains

The validation_attempts parameter currently does not apply to input guardrails - they trigger immediately on validation failure.

Message History After Guardrails Trip

When an input guardrail trips, agent-to-agent messages (requests from calling agents) remain in history alongside the guardrail guidance. This preserves context so calling agents understand what they asked and can adjust their approach. Output guardrail messages also persist in history to guide retry attempts.

Example message history entries

[
    // Input guardrail triggered by user input in friendly mode (presented as assistant message)
    {
        "role": "assistant",
        "content": "Please, prefix your request with 'Support:' describing what you need.",
        "message_origin": "input_guardrail_message",
        "agent": "CustomerSupportAgent",
        "callerAgent": null,
        "agent_run_id": "agent_run_id",
        "timestamp": 1758103764049935,
        "type": "message",
    },

    // Input guardrail triggered within the agency in friendly mode (guidance returned inline)
    {
        "role": "assistant",
        "content": "When chatting with this agent, provide your name (which is Alice), for example, 'Hello, I'm Alice.' Adjust your input and try again.",
        "message_origin": "input_guardrail_message",
        "agent": "DatabaseAgent",
        "callerAgent": "CustomerSupportAgent",
        "agent_run_id": "agent_run_id",
        "parent_run_id": "call_id",
        "timestamp": 1758103766899061,
        "type": "message",
    },

    // Output guardrail triggered by an assistant response
    {
        "role": "system",
        "content": "You are not allowed to include your email address in your response. Ask agent to redirect user to the contact page: https://www.example.com/contact",
        "message_origin": "output_guardrail_error",
        "agent": "DatabaseAgent",
        "callerAgent": "CustomerSupportAgent",
        "agent_run_id": "agent_run_id",
        "parent_run_id": "call_id",
        "timestamp": 1758103770629217,
        "type": "message",
    },
]
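
Because every guardrail entry carries a message_origin value, guardrail activity can be pulled out of a history for logging or debugging. The sketch below assumes the history is already available as a list of dictionaries shaped like the entries above; how it is obtained is not shown, and `guardrail_entries` and `count_by_origin` are hypothetical helper names.

```python
from __future__ import annotations

from collections import Counter

# The three message_origin values that identify guardrail entries.
GUARDRAIL_ORIGINS = {
    "input_guardrail_message",
    "input_guardrail_error",
    "output_guardrail_error",
}


def guardrail_entries(history: list[dict]) -> list[dict]:
    """Return only the entries that were written by a guardrail."""
    return [entry for entry in history if entry.get("message_origin") in GUARDRAIL_ORIGINS]


def count_by_origin(history: list[dict]) -> Counter:
    """Count guardrail entries per message_origin value."""
    return Counter(entry["message_origin"] for entry in guardrail_entries(history))
```

A count dominated by `output_guardrail_error`, for example, may suggest that an agent's instructions and its output guardrail disagree.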

Agent-to-Agent Validation

Use guardrails to control how agents communicate with each other. When adding communication flows between agents, the recipient agent’s guardrails define the message format.

Input and Output Guardrails for Inter-Agent Communication

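from agency_swarm import Agency, Agent, GuardrailFunctionOutput, RunContextWrapper, input_guardrail, output_guardrail

@input_guardrail(name="RequireTaskPrefix")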
async def require_task_prefix(
    context: RunContextWrapper, agent: Agent, agent_input: str | list[str]
) -> GuardrailFunctionOutput:
    text = agent_input if isinstance(agent_input, str) else " ".join(agent_input)
    condition = not text.startswith("Task:")
    return GuardrailFunctionOutput(
        output_info="ERROR: Requests to this agent must begin with 'Task:'" if condition else "",
        tripwire_triggered=condition,
    )


@output_guardrail(name="RequireResponsePrefix")
async def require_response_prefix(
    context: RunContextWrapper, agent: Agent, response_text: str
) -> GuardrailFunctionOutput:
    condition = not response_text.startswith("Response:")
    return GuardrailFunctionOutput(
        output_info="ERROR: Your response must start with 'Response:'" if condition else "",
        tripwire_triggered=condition,
    )

ceo = Agent(
    name="CEO",
    instructions="You are the CEO agent.",
)

worker = Agent(
    name="Worker",
    instructions="You are the worker agent.",
    input_guardrails=[require_task_prefix],
    output_guardrails=[require_response_prefix],
    throw_input_guardrail_error=True,
)

agency = Agency(
    ceo,
    communication_flows=[(ceo, worker)],
)

In this example:

If the CEO agent sends a message to the worker that doesn’t start with “Task:”, the input guardrail triggers
The CEO receives an error message: "ERROR: Requests to this agent must begin with 'Task:'"
The CEO adjusts its message and tries again (or notifies the user, per its instructions)

Similarly, the worker’s output guardrail ensures responses start with “Response:”. Within the configured validation_attempts, the worker must generate a correct response or the validation fails.

Agent-to-agent messages are always single strings, so input guardrails for inter-agent communication always receive a string (not a list).

Recommended Mode for Internal Agents

Friendly mode (throw_input_guardrail_error=False) is recommended for the agency’s internal agents. Strict mode (True) is also supported, but friendly mode lets guardrail guidance flow through the agent chain without exceptions interrupting communication.

Due to the nature of Handoffs, using SendMessageHandoff for agent-to-agent communication will bypass input guardrails set between agents.

Best Practices

Single Responsibility

Each guardrail should check one thing. Create multiple guardrails for different concerns instead of combining logic.

Specific Error Messages

Provide clear, actionable feedback in output_info. Tell the agent or user exactly what to fix.

Use Judge Agents

For complex decisions (like relevance or tone), delegate to a specialized judge agent instead of hard-coding rules.

Test Independently

Test guardrails with various inputs to ensure they catch invalid cases and allow valid ones.
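
One way to make this easy, sketched here as a suggestion rather than a prescribed pattern, is to keep the rule itself in a plain function that needs no framework types and check it against a table of cases. The names `lacks_task_prefix`, `CASES`, and `failing_cases` are hypothetical.

```python
from __future__ import annotations

from typing import Callable


def lacks_task_prefix(text: str) -> bool:
    """Core rule, kept free of framework types so it can be tested directly."""
    return not text.startswith("Task:")


# Each case pairs an input with the expected rule result (True means "should trip").
CASES = [
    ("Task: summarize the report", False),
    ("task: lowercase prefix", True),
    ("Please summarize the report", True),
    ("", True),
]


def failing_cases(rule: Callable[[str], bool], cases: list[tuple[str, bool]]) -> list[tuple[str, bool, bool]]:
    """Return (text, expected, actual) for every case where the rule disagrees with the table."""
    return [(text, expected, rule(text)) for text, expected in cases if rule(text) != expected]
```

The guardrail function then only wraps the rule, for example by setting `tripwire_triggered=lacks_task_prefix(text)`, and an empty result from `failing_cases` indicates the table passes.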

Balance UX vs Enforcement

Consider the user experience - friendly mode for guidance, strict mode for hard blocks.

Start Simple

Begin with basic checks and expand as needed. Overly complex guardrails can slow response time.

See Also

Tool Input Validation

Validate tool inputs using Pydantic validators

Agent Configuration

Advanced agent configuration options

Agents SDK Guardrails

Underlying guardrail implementation in OpenAI Agents SDK

Streaming

Handle streaming events and responses

Examples

View complete guardrail examples on GitHub
