Guardrails validate agent inputs and outputs using the Agents SDK. They act as checkpoints that screen incoming messages before processing and review agent responses before delivery, ensuring agents stay on-topic, avoid sensitive content, and follow format requirements.

When to Use Guardrails

Guardrails solve validation challenges that go beyond simple field checks:

Content Filtering

Block off-topic questions, inappropriate language, or sensitive information leaks

Format Enforcement

Require specific response structures, prefixes, or formatting rules

Compliance

Enforce regulatory requirements, privacy policies, or business rules

Security

Prevent prompt injection, data exfiltration, or unauthorized actions
For validating tool inputs (e.g., checking field values, data types, ranges), use Pydantic validators instead. Guardrails are for agent-level validation.

Practical Examples

Example 1: Filtering Off-Topic Questions

Use input guardrails to keep agents focused on their domain. This example delegates relevance decisions to a judge agent:
from agency_swarm import Agency, Agent, GuardrailFunctionOutput, RunContextWrapper, input_guardrail
from agents.model_settings import ModelSettings
from pydantic import BaseModel

class RelevanceDecision(BaseModel):
    is_relevant: bool
    reason: str

guardrail_agent = Agent(
    name="GuardrailAgent",
    instructions=(
        "You screen incoming messages for a customer-support assistant. "
        "Treat questions about account access, billing, and troubleshooting as relevant. "
        "Flag any other unrelated requests as irrelevant."
    ),
    model="gpt-5-nano",
    model_settings=ModelSettings(reasoning_effort="minimal"),
    output_type=RelevanceDecision,
)

@input_guardrail
async def require_support_topic(
    context: RunContextWrapper, agent: Agent, user_input: str | list[str]
) -> GuardrailFunctionOutput:
    """Forward the decision to the guardrail agent."""
    candidate = user_input if isinstance(user_input, str) else "\n".join(user_input)
    guardrail_result = await guardrail_agent.get_response(candidate, context=context.context)
    decision = RelevanceDecision.model_validate(guardrail_result.final_output)

    if not decision.is_relevant:
        return GuardrailFunctionOutput(
            output_info="Only support questions are allowed. Ask about billing, account access, or troubleshooting.",
            tripwire_triggered=True,
        )
    return GuardrailFunctionOutput(output_info="", tripwire_triggered=False)

support_agent = Agent(
    name="CustomerSupportAgent",
    instructions="You help customers resolve account, billing, and troubleshooting issues.",
    model="gpt-5-mini",
    input_guardrails=[require_support_topic],
    throw_input_guardrail_error=False,  # Friendly mode: guidance returned as assistant message
)
See the full example at examples/guardrails_input.py.
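The judge agent only sees the message text, so you can check the routing logic without a model call by swapping in a stub. The sketch below is plain Python and does not import Agency Swarm. `stub_judge` and `screen` are hypothetical names, and a keyword list stands in for the judge agent's decision.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, Union


@dataclass
class RelevanceDecision:
    # Same fields as the Pydantic model above; a dataclass keeps this sketch dependency-free
    is_relevant: bool
    reason: str


# Hypothetical stand-in for the judge agent: a keyword match instead of a model call
SUPPORT_KEYWORDS = ("billing", "invoice", "password", "login", "account", "error")


async def stub_judge(text: str) -> RelevanceDecision:
    lowered = text.lower()
    hit = next((kw for kw in SUPPORT_KEYWORDS if kw in lowered), None)
    if hit is not None:
        return RelevanceDecision(True, f"mentions '{hit}'")
    return RelevanceDecision(False, "no support topic detected")


async def screen(
    user_input: Union[str, list[str]],
    judge: Callable[[str], Awaitable[RelevanceDecision]],
) -> tuple[bool, str]:
    """Return (tripwire_triggered, output_info), mirroring the guardrail's branching."""
    # Multiple messages are joined with newlines before being judged
    candidate = user_input if isinstance(user_input, str) else "\n".join(user_input)
    decision = await judge(candidate)
    if not decision.is_relevant:
        return True, "Only support questions are allowed. Ask about billing, account access, or troubleshooting."
    return False, ""


print(asyncio.run(screen("I can't reset my password", stub_judge)))
print(asyncio.run(screen(["Hi", "Write me a poem"], stub_judge)))
```

A real judge agent will disagree with a keyword list on edge cases. The stub only exercises the surrounding control flow.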

Example 2: Preventing Sensitive Information Leaks

Use output guardrails to review responses before delivery. This example prevents agents from sharing email addresses:
from agency_swarm import Agency, Agent, GuardrailFunctionOutput, RunContextWrapper, output_guardrail

@output_guardrail(name="ForbidSensitiveEmail")
async def forbid_sensitive_email(
    context: RunContextWrapper, agent: Agent, response_text: str
) -> GuardrailFunctionOutput:
    """Reject responses that contain an email address (any '@' character)."""
    if "@" in response_text:
        return GuardrailFunctionOutput(
            output_info="Do not share email addresses. Offer to connect via the support portal instead.",
            tripwire_triggered=True,
        )
    return GuardrailFunctionOutput(output_info="", tripwire_triggered=False)

support_agent = Agent(
    name="SupportPilot",
    instructions="You handle customer support. Official email: support@example.com.",
    model="gpt-5",
    output_guardrails=[forbid_sensitive_email],
    validation_attempts=1,  # Agent gets 1 retry to fix the response
)
See the full example at examples/guardrails_output.py.
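The `"@" in response_text` check is deliberately blunt. It also trips on the official address given in the agent's instructions (support@example.com) and on handles such as "@example". To allow the official address while still blocking others, you could use a regex with an allowlist. The helper below is a hypothetical sketch, and its simplified pattern does not cover every valid email format.

```python
import re

# Simplified email pattern; real-world addresses can be more varied than this
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

# Hypothetical allowlist, stored in lowercase
ALLOWED = {"support@example.com"}


def disallowed_emails(text: str, allowed: set[str]) -> list[str]:
    """Return email addresses found in text that are not on the allowlist."""
    return [match for match in EMAIL_RE.findall(text) if match.lower() not in allowed]


print(disallowed_emails("Write to support@example.com for help.", ALLOWED))
print(disallowed_emails("Ping jane.doe@corp.io directly.", ALLOWED))
print(disallowed_emails("Follow us @example", ALLOWED))
```

Inside the guardrail, `tripwire_triggered=bool(disallowed_emails(response_text, ALLOWED))` would replace the `"@"` test.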

Example 3: Simple Format Enforcement

This example requires responses to be valid JSON:
import json

from agency_swarm import Agent, GuardrailFunctionOutput, RunContextWrapper, output_guardrail

@output_guardrail(name="RequireJSONFormat")
async def require_json_format(
    context: RunContextWrapper, agent: Agent, response_text: str
) -> GuardrailFunctionOutput:
    """Ensure responses are valid JSON."""
    try:
        json.loads(response_text)
    except json.JSONDecodeError:
        return GuardrailFunctionOutput(
            output_info="Response must be valid JSON. Return only the JSON, with no surrounding text.",
            tripwire_triggered=True,
        )
    return GuardrailFunctionOutput(output_info="", tripwire_triggered=False)
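`json.loads` accepts any JSON value, so a bare number or string such as `42` passes the check above. Models also often wrap JSON in Markdown code fences, which `json.loads` rejects. If the rule is "respond with a JSON object", you could also check the parsed type and report where parsing failed. `check_json_object` is a hypothetical helper whose result you would copy into `GuardrailFunctionOutput`.

```python
import json


def check_json_object(text: str) -> tuple[bool, str]:
    """Return (tripwire_triggered, output_info) for a 'must be a JSON object' rule."""
    try:
        parsed = json.loads(text)
    except json.JSONDecodeError as exc:
        # exc.pos is the character offset where parsing failed
        return True, f"Response must be valid JSON (parse error at position {exc.pos})."
    if not isinstance(parsed, dict):
        return True, 'Response must be a JSON object, for example {"answer": "..."}.'
    return False, ""


print(check_json_object('{"answer": "ok"}'))
print(check_json_object("42"))
print(check_json_object('Sure! {"a": 1}'))
```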

Output Guardrails

Output guardrails validate agent responses before they reach users or other agents. When a guardrail trips, the agent receives the guardrail's feedback and retries, up to the number of times set by validation_attempts.

Function Signature

Each output guardrail receives three parameters:
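from pydantic import BaseModel
from agency_swarm import Agent, GuardrailFunctionOutput, RunContextWrapper, output_guardrail

@output_guardrail
async def my_output_guardrail(
    context: RunContextWrapper,
    agent: Agent,
    response_text: str | BaseModel,
) -> GuardrailFunctionOutput:
    """Validate agent output."""
    # Your validation logic here; set tripwire_triggered=True to reject the response
    return GuardrailFunctionOutput(output_info="", tripwire_triggered=False)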
Parameters:
  • context: Run context wrapper with access to shared state
  • agent: The Agent instance generating the response
  • response_text: The agent’s response as a string, or a Pydantic model instance if output_type is specified
Return:
  • GuardrailFunctionOutput with:
    • tripwire_triggered (bool): True if validation failed
    • output_info (str): Feedback message sent to the agent when tripwire_triggered=True

Basic Output Guardrail

from agency_swarm import output_guardrail, GuardrailFunctionOutput, RunContextWrapper, Agent

@output_guardrail
async def response_content_guardrail(
    context: RunContextWrapper, agent: Agent, response_text: str
) -> GuardrailFunctionOutput:
    """Reject responses containing inappropriate content."""
    tripwire_triggered = False
    output_info = ""

    if "bad word" in response_text.lower():
        tripwire_triggered = True
        output_info = "Please avoid using inappropriate language."

    return GuardrailFunctionOutput(
        output_info=output_info,
        tripwire_triggered=tripwire_triggered,
    )

agent = Agent(
    name="CustomerSupportAgent",
    instructions="You are a helpful customer support agent.",
    output_guardrails=[response_content_guardrail],
)
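A plain substring test such as `"bad word" in response_text.lower()` also fires on words that merely contain a blocked term. For example, "heck" is inside "checkout". One option is a case-insensitive regex with word boundaries. The blocklist and `find_blocked_terms` below are hypothetical.

```python
import re

# Hypothetical blocklist; replace with your own terms
BLOCKED_TERMS = ["bad word", "heck"]

# \b word boundaries stop a term from matching inside a longer word
BLOCKED_RE = re.compile(
    r"\b(?:" + "|".join(re.escape(term) for term in BLOCKED_TERMS) + r")\b",
    re.IGNORECASE,
)


def find_blocked_terms(text: str) -> list[str]:
    """Return blocked terms that appear as whole words (case-insensitive)."""
    return [match.group(0) for match in BLOCKED_RE.finditer(text)]


print("heck" in "Proceed to checkout.".lower())  # substring test: a false positive
print(find_blocked_terms("Proceed to checkout."))
print(find_blocked_terms("What the Heck happened?"))
```

Inside the guardrail, `tripwire_triggered = bool(find_blocked_terms(response_text))` would replace the substring test.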

Output Guardrail Retry Flow

When an output guardrail trips, the agent can retry using the guardrail's feedback. The validation_attempts parameter sets how many retries it gets.

How Retry Works

1. Agent generates response: the agent produces its initial response.
2. Output guardrails check the response: each output guardrail validates it.
3. Validation fails: the agent receives a system message containing the guardrail's output_info.
4. Agent retries: the agent generates a new response, informed by that feedback.
5. Repeat until success or the limit is reached: the cycle continues up to validation_attempts times.
6. All attempts fail: an OutputGuardrailTripwireTriggered exception is raised.
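The flow above can be modeled in plain Python. This is a simplified sketch of the described behavior, not Agency Swarm's implementation. `run_with_output_validation` and `TripwireTriggered` are hypothetical names, and the feedback is passed to the generator as an argument instead of as a system message. The sketch makes the arithmetic explicit: the agent produces at most 1 + validation_attempts responses.

```python
from typing import Callable, Optional


class TripwireTriggered(Exception):
    """Stand-in for OutputGuardrailTripwireTriggered in this sketch."""


def run_with_output_validation(
    generate: Callable[[Optional[str]], str],
    check: Callable[[str], Optional[str]],
    validation_attempts: int = 1,
) -> str:
    """Generate a response, validate it, and retry up to `validation_attempts` times.

    `check` returns None when the response passes, or feedback (output_info) when it fails.
    """
    feedback: Optional[str] = None
    for _ in range(1 + validation_attempts):  # initial attempt plus retries
        response = generate(feedback)
        feedback = check(response)
        if feedback is None:
            return response
    raise TripwireTriggered(feedback)


def fake_agent(feedback: Optional[str]) -> str:
    # Hypothetical agent that only fixes its output after receiving feedback
    return "Response: done" if feedback else "done"


def require_prefix(text: str) -> Optional[str]:
    return None if text.startswith("Response:") else "Your response must start with 'Response:'"


print(run_with_output_validation(fake_agent, require_prefix, validation_attempts=1))
try:
    run_with_output_validation(fake_agent, require_prefix, validation_attempts=0)
except TripwireTriggered as exc:
    print(f"Validation failed: {exc}")
```

With validation_attempts=0 the first failure raises immediately. With the default of 1, the agent gets one retry.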

Configuring Retry Attempts

agent = Agent(
    name="CustomerSupportAgent",
    instructions="You are a helpful customer support agent.",
    output_guardrails=[response_content_guardrail],
    validation_attempts=2,  # Default is 1 (one retry)
)
Settings:
  • validation_attempts=0: Fail-fast (no retries, immediate exception)
  • validation_attempts=1: Default (one retry after initial failure)
  • validation_attempts=2+: Multiple retries for complex validations
Each retry sends the output_info message to the agent as a system message, giving the agent context to adjust its response.

Handling Validation Failures

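After all validation attempts fail, catch the exception:
from agency_swarm import OutputGuardrailTripwireTriggered

# Run inside an async function, where `agency` is your Agency instance
try:
    response = await agency.get_response("Hello!")
except OutputGuardrailTripwireTriggered as e:
    # The guardrail's GuardrailFunctionOutput is available as e.guardrail_result.output
    print(f"Validation failed: {e.guardrail_result.output.output_info}")
    # Implement fallback behavior or notify the user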

Input Guardrails

Input guardrails validate incoming messages before they reach the agent. They screen both user input and inter-agent communication.

Simplified Input Processing

Agency Swarm automatically extracts text content from messages, so your guardrails receive clean text instead of complex message structures. You don’t need manual extraction logic.

Function Signature

Each input guardrail receives three parameters:
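from agency_swarm import Agent, GuardrailFunctionOutput, RunContextWrapper, input_guardrail

@input_guardrail
async def my_input_guardrail(
    context: RunContextWrapper,
    agent: Agent,
    user_input: str | list[str],
) -> GuardrailFunctionOutput:
    """Validate user input."""
    # Your validation logic here; set tripwire_triggered=True to block the input
    return GuardrailFunctionOutput(output_info="", tripwire_triggered=False)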
Parameters:
  • context: Run context wrapper with access to shared state
  • agent: The Agent instance receiving the input
  • user_input: Extracted text content
    • Single message: A string containing the message content
    • Multiple consecutive messages: A list of strings, one per message
Return:
  • GuardrailFunctionOutput with:
    • tripwire_triggered (bool): True if validation failed
    • output_info (str): Guidance message returned to the caller
File and image inputs inside messages are not passed to the guardrail.

Input Types

When a user sends multiple messages:
[
  {"role": "user", "content": "Hi"},
  {"role": "user", "content": "How are you?"}
]
Your guardrail receives:
["Hi", "How are you?"]
This allows you to process each new input message individually or validate them as a group.
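Because the input can be either a string or a list of strings, it often helps to normalize it first. The sketch below shows the two strategies named above: checking each message individually, or checking the messages as a group. All three helpers are hypothetical, and their results would feed `tripwire_triggered` in your guardrail.

```python
from typing import Union

UserInput = Union[str, list[str]]


def as_messages(user_input: UserInput) -> list[str]:
    """Normalize guardrail input to a list with one string per message."""
    return [user_input] if isinstance(user_input, str) else list(user_input)


def each_message_has_prefix(user_input: UserInput, prefix: str) -> bool:
    """Individual check: every message must start with the prefix."""
    return all(msg.startswith(prefix) for msg in as_messages(user_input))


def group_mentions(user_input: UserInput, keyword: str) -> bool:
    """Group check: the keyword may appear in any of the messages (case-insensitive)."""
    combined = "\n".join(as_messages(user_input)).lower()
    return keyword.lower() in combined


msgs = ["Hi", "How are you?"]
print(as_messages("Hi"))
print(each_message_has_prefix(msgs, "Hi"))
print(group_mentions(msgs, "how are"))
```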

Basic Input Guardrail

from agency_swarm import input_guardrail, GuardrailFunctionOutput, RunContextWrapper, Agent

@input_guardrail
async def require_task_prefix(
    context: RunContextWrapper, agent: Agent, user_input: str | list[str]
) -> GuardrailFunctionOutput:
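    """Require user requests to begin with 'Request:'."""

    # A single message arrives as a string; multiple messages are joined with spaces,
    # so only the first message's prefix is checked
    text = user_input if isinstance(user_input, str) else " ".join(user_input)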
    condition = not text.startswith("Request:")

    return GuardrailFunctionOutput(
        output_info="Prefix your request with 'Request:' describing what you need." if condition else "",
        tripwire_triggered=condition,
    )

agent = Agent(
    name="CustomerSupportAgent",
    instructions="You are a helpful customer support agent.",
    input_guardrails=[require_task_prefix],
)

Friendly vs Strict Mode

Input guardrails support two modes that control how guardrail guidance is delivered: friendly mode (default) and strict mode. The throw_input_guardrail_error parameter controls this behavior.

Friendly Mode (Default)

Setting: throw_input_guardrail_error=False

In friendly mode, guardrail guidance is delivered as if it came from the agent itself:
  • Guidance returned as final_output (non-streaming) or message_output_created event (streaming)
  • No exceptions raised
  • Persisted as an assistant message (message_origin="input_guardrail_message")
  • User experience stays fluid and conversational
When to use:
  • Conversational flows where you want to guide users naturally
  • Internal agents communicating with each other
  • Cases where you want to provide helpful feedback without interrupting the flow
Example:
agent = Agent(
    name="CustomerSupportAgent",
    instructions="You are a helpful customer support agent.",
    input_guardrails=[require_task_prefix],
    throw_input_guardrail_error=False,  # Friendly mode (default)
)

# Usage (inside an async function, where `agency` contains this agent)
response = await agency.get_response("Hello!")
print(response.final_output)
# Output: "Prefix your request with 'Request:' describing what you need."
# No exception raised: the guidance is returned directly
Streaming behavior:
RunItemStreamEvent(
    name='message_output_created',
    item=MessageOutputItem(
        raw_item=ResponseOutputMessage(
            id='msg_input_guardrail_guidance',
            content=[ResponseOutputText(text="Prefix your request...")],
            role='assistant',
            status='completed'
        )
    )
)

Strict Mode

Setting: throw_input_guardrail_error=True

In strict mode, a guardrail failure aborts the turn immediately:
  • InputGuardrailTripwireTriggered exception raised
  • Persisted as a system message (message_origin="input_guardrail_error")
  • Turn aborted (agent never processes the input)
  • Caller must handle the exception
When to use:
  • Hard requirements or compliance rules that cannot be bypassed
  • Security validations that must block processing
  • Cases where you want explicit exception handling
Example:
from agency_swarm import InputGuardrailTripwireTriggered

agent = Agent(
    name="CustomerSupportAgent",
    instructions="You are a helpful customer support agent.",
    input_guardrails=[require_task_prefix],
    throw_input_guardrail_error=True,  # Strict mode
)

# Usage (inside an async function, where `agency` contains this agent)
try:
    response = await agency.get_response("Hello!")
except InputGuardrailTripwireTriggered as e:
    print(f"Validation failed: {e.guardrail_result.output.output_info}")
    # Output: "Validation failed: Prefix your request with 'Request:' describing what you need."

Comparison Table

| Mode | throw_input_guardrail_error | Caller sees | Persisted entry | Role | Use case |
|---|---|---|---|---|---|
| Friendly | False (default) | Guardrail text as final_output or streaming event | Assistant message (input_guardrail_message) | assistant | Conversational flows, helpful guidance |
| Strict | True | InputGuardrailTripwireTriggered exception | System message (input_guardrail_error) | system | Hard requirements, compliance, security |
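The table amounts to a single branch on throw_input_guardrail_error. The sketch below models that branch in plain Python to show that each trip records exactly one guidance entry. `deliver_guidance`, `HistoryEntry`, and `InputTripwire` are hypothetical stand-ins, not Agency Swarm internals.

```python
from dataclasses import dataclass


class InputTripwire(Exception):
    """Stand-in for InputGuardrailTripwireTriggered in this sketch."""


@dataclass
class HistoryEntry:
    role: str
    content: str
    message_origin: str


def deliver_guidance(output_info: str, history: list[HistoryEntry], strict: bool) -> str:
    """Record exactly one guidance entry, then return it (friendly) or raise (strict)."""
    if strict:
        history.append(HistoryEntry("system", output_info, "input_guardrail_error"))
        raise InputTripwire(output_info)
    history.append(HistoryEntry("assistant", output_info, "input_guardrail_message"))
    return output_info  # the caller receives this as the final output


history: list[HistoryEntry] = []
guidance = "Prefix your request with 'Request:' describing what you need."
print(deliver_guidance(guidance, history, strict=False))
try:
    deliver_guidance(guidance, history, strict=True)
except InputTripwire as exc:
    print(f"Validation failed: {exc}")
print([(entry.role, entry.message_origin) for entry in history])
```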

Decision Guide

Use Friendly Mode when:
  • You want a conversational user experience
  • Agents are communicating with each other internally
  • Guardrail feedback is helpful guidance, not a hard block
  • You don’t want to write exception handling code
Use Strict Mode when:
  • You’re enforcing non-negotiable requirements
  • Security or compliance rules must block processing
  • You want explicit control over error handling
  • The caller should know immediately that validation failed

Guardrails in Message History

Every guardrail trigger is recorded in the chat history as a guidance entry. Each entry carries a message_origin field that identifies the kind of guardrail that produced it.

Message Origin Values

  • input_guardrail_message: Input guardrail in friendly mode
  • input_guardrail_error: Input guardrail in strict mode
  • output_guardrail_error: Output guardrail (always system message)

Persistence Behavior

| Mode | throw_input_guardrail_error | Streaming event | Persisted entry |
|---|---|---|---|
| Friendly | False (default) | message_output_created with guidance text | Assistant message, message_origin="input_guardrail_message" |
| Strict | True | {"type": "error", "content": guidance} | System message, message_origin="input_guardrail_error" |
Each triggered guardrail leaves exactly one guidance entry in the chat history:
  • In friendly mode, that entry is an assistant message and its text matches what the caller receives
  • In strict mode, the guardrail raises an exception and only the system guidance entry remains
The validation_attempts parameter does not currently apply to input guardrails; they trip immediately on the first validation failure.

Message History After Guardrails Trip

When an input guardrail trips, agent-to-agent messages (requests from calling agents) remain in history alongside the guardrail guidance. This preserves context so calling agents understand what they asked and can adjust their approach. Output guardrail messages also persist in history to guide retry attempts.
[
    // Input guardrail triggered by user input in friendly mode (presented as assistant message)
    {
        "role": "assistant",
        "content": "Please, prefix your request with 'Support:' describing what you need.",
        "message_origin": "input_guardrail_message",
        "agent": "CustomerSupportAgent",
        "callerAgent": null,
        "agent_run_id": "agent_run_id",
        "timestamp": 1758103764049935,
        "type": "message",
    },

    // Input guardrail triggered within the agency in friendly mode (guidance returned inline)
    {
        "role": "assistant",
        "content": "When chatting with this agent, provide your name (which is Alice), for example, 'Hello, I'm Alice.' Adjust your input and try again.",
        "message_origin": "input_guardrail_message",
        "agent": "DatabaseAgent",
        "callerAgent": "CustomerSupportAgent",
        "agent_run_id": "agent_run_id",
        "parent_run_id": "call_id",
        "timestamp": 1758103766899061,
        "type": "message",
    },

    // Output guardrail triggered by an assistant response
    {
        "role": "system",
        "content": "You are not allowed to include your email address in your response. Ask agent to redirect user to the contact page: https://www.example.com/contact",
        "message_origin": "output_guardrail_error",
        "agent": "DatabaseAgent",
        "callerAgent": "CustomerSupportAgent",
        "agent_run_id": "agent_run_id",
        "parent_run_id": "call_id",
        "timestamp": 1758103770629217,
        "type": "message",
    },
]
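Every guardrail entry carries a message_origin, so you can pull guardrail activity out of a saved history for monitoring or debugging. The sketch assumes the history is available as a list of dicts shaped like the sample above. How you load it depends on your persistence setup, and the helper names are hypothetical.

```python
from collections import Counter

GUARDRAIL_ORIGINS = {
    "input_guardrail_message",  # input guardrail, friendly mode
    "input_guardrail_error",    # input guardrail, strict mode
    "output_guardrail_error",   # output guardrail
}


def guardrail_entries(history: list[dict]) -> list[dict]:
    """Return the history entries produced by guardrails."""
    return [msg for msg in history if msg.get("message_origin") in GUARDRAIL_ORIGINS]


def trips_by_agent(history: list[dict]) -> Counter:
    """Count guardrail entries per (agent, message_origin) pair."""
    return Counter((msg["agent"], msg["message_origin"]) for msg in guardrail_entries(history))


# Trimmed-down entries in the shape of the sample history above
history = [
    {"role": "user", "content": "Hello!", "agent": "CustomerSupportAgent"},
    {"role": "assistant", "content": "Please prefix...", "agent": "CustomerSupportAgent",
     "message_origin": "input_guardrail_message"},
    {"role": "system", "content": "You are not allowed...", "agent": "DatabaseAgent",
     "message_origin": "output_guardrail_error"},
]
print(len(guardrail_entries(history)))
print(trips_by_agent(history))
```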

Agent-to-Agent Validation

Use guardrails to control how agents communicate with each other. When adding communication flows between agents, the recipient agent’s guardrails define the message format.

Input and Output Guardrails for Inter-Agent Communication

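from agency_swarm import Agency, Agent, GuardrailFunctionOutput, RunContextWrapper, input_guardrail, output_guardrail

@input_guardrail(name="RequireTaskPrefix")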
async def require_task_prefix(
    context: RunContextWrapper, agent: Agent, agent_input: str | list[str]
) -> GuardrailFunctionOutput:
    text = agent_input if isinstance(agent_input, str) else " ".join(agent_input)
    condition = not text.startswith("Task:")
    return GuardrailFunctionOutput(
        output_info="ERROR: Requests to this agent must begin with 'Task:'" if condition else "",
        tripwire_triggered=condition,
    )


@output_guardrail(name="RequireResponsePrefix")
async def require_response_prefix(
    context: RunContextWrapper, agent: Agent, response_text: str
) -> GuardrailFunctionOutput:
    condition = not response_text.startswith("Response:")
    return GuardrailFunctionOutput(
        output_info="ERROR: Your response must start with 'Response:'" if condition else "",
        tripwire_triggered=condition,
    )

ceo = Agent(
    name="CEO",
    instructions="You are the CEO agent.",
)

worker = Agent(
    name="Worker",
    instructions="You are the worker agent.",
    input_guardrails=[require_task_prefix],
    output_guardrails=[require_response_prefix],
    throw_input_guardrail_error=True,
)

agency = Agency(
    ceo,
    communication_flows=[(ceo, worker)],
)
In this example:
  • If the CEO agent sends a message to the worker that doesn’t start with “Task:”, the input guardrail triggers
  • The CEO receives an error message: "ERROR: Requests to this agent must begin with 'Task:'"
  • The CEO adjusts its message and tries again (or notifies the user, per its instructions)
Similarly, the worker’s output guardrail requires responses to start with “Response:”. The worker must produce a correctly prefixed response within its configured validation_attempts, or validation fails.
Agent-to-agent messages are always single strings, so input guardrails for inter-agent communication always receive a string (not a list).
It is recommended to use friendly mode (throw_input_guardrail_error=False) for the agency’s internal agents. While strict mode (True) is also supported, friendly mode ensures that guardrail guidance flows naturally through the agent chain without raising exceptions that interrupt the communication flow.
Because of how handoffs work, agent-to-agent communication through SendMessageHandoff bypasses any input guardrails set between agents.

Best Practices

Single Responsibility

Each guardrail should check one thing. Create multiple guardrails for different concerns instead of combining logic.

Specific Error Messages

Provide clear, actionable feedback in output_info. Tell the agent or user exactly what to fix.

Use Judge Agents

For complex decisions (like relevance or tone), delegate to a specialized judge agent instead of hard-coding rules.

Test Independently

Test guardrails with various inputs to ensure they catch invalid cases and allow valid ones.
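One way to make guardrails easy to test is to keep the decision logic in a plain function and have the decorated guardrail call it. You can then unit-test that function without running an agent. The sketch below uses the 'Task:' rule from the agent-to-agent example. `check_task_prefix` is a hypothetical helper, and the tests are pytest-style functions.

```python
def check_task_prefix(text: str) -> tuple[bool, str]:
    """Pure decision logic for a 'Task:' prefix rule (hypothetical helper).

    Inside the decorated guardrail you would call it and wrap the result:
        tripped, info = check_task_prefix(text)
        return GuardrailFunctionOutput(output_info=info, tripwire_triggered=tripped)
    """
    if text.startswith("Task:"):
        return False, ""
    return True, "ERROR: Requests to this agent must begin with 'Task:'"


# pytest-style tests: run with `pytest`, or call them directly
def test_allows_valid_input():
    assert check_task_prefix("Task: export the report") == (False, "")


def test_blocks_missing_prefix():
    tripped, info = check_task_prefix("Please export the report")
    assert tripped
    assert "Task:" in info


def test_prefix_is_case_sensitive():
    assert check_task_prefix("task: lowercase prefix")[0]
```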

Balance UX vs Enforcement

Consider the user experience: use friendly mode for guidance and strict mode for hard blocks.

Start Simple

Begin with basic checks and expand as needed. Overly complex guardrails can slow response time.

See Also