When to Use Guardrails
Guardrails solve validation challenges that go beyond simple field checks:Content Filtering
Block off-topic questions, inappropriate language, or sensitive information leaks
Format Enforcement
Require specific response structures, prefixes, or formatting rules
Compliance
Enforce regulatory requirements, privacy policies, or business rules
Security
Prevent prompt injection, data exfiltration, or unauthorized actions
For validating tool inputs (e.g., checking field values, data types, ranges), use Pydantic validators instead. Guardrails are for agent-level validation.
Practical Examples
Example 1: Filtering Off-Topic Questions
Use input guardrails to keep agents focused on their domain. This example delegates relevance decisions to a judge agent:examples/guardrails_input.py.
Example 2: Preventing Sensitive Information Leaks
Use output guardrails to review responses before delivery. This example prevents agents from sharing email addresses:examples/guardrails_output.py.
Example 3: Simple Format Enforcement
Require responses to follow a specific format:Output Guardrails
Output guardrails validate agent responses before they reach users or other agents. When a guardrail trips, the agent receives feedback and retries.Function Signature
Each output guardrail receives three parameters:context: Run context wrapper with access to shared stateagent: The Agent instance generating the responseresponse_text: The agent’s response as a string, or a Pydantic model ifoutput_typeis specified
GuardrailFunctionOutputwith:tripwire_triggered(bool):Trueif validation failedoutput_info(str): Feedback message sent to the agent whentripwire_triggered=True
Basic Output Guardrail
Output Guardrail Retry Flow
When an output guardrail trips, the agent gets multiple chances to fix its response. Thevalidation_attempts parameter controls this behavior.
How Retry Works
1
Agent generates response
The agent produces its initial response
2
Output guardrail checks response
Each output guardrail validates the response
3
If validation fails
The agent receives a system message containing the
output_info from the guardrail4
Agent retries
The agent generates a new response, informed by the error message
5
Repeat until success or limit reached
This cycle continues up to
validation_attempts times6
If all attempts fail
OutputGuardrailTripwireTriggered exception is raisedConfiguring Retry Attempts
validation_attempts=0: Fail-fast (no retries, immediate exception)validation_attempts=1: Default (one retry after initial failure)validation_attempts=2+: Multiple retries for complex validations
Each retry sends the
output_info message to the agent as a system message, giving the agent context to adjust its response.Handling Validation Failures
After all validation attempts fail, handle the exception:Input Guardrails
Input guardrails validate incoming messages before they reach the agent. They screen both user input and inter-agent communication.Simplified Input Processing
Agency Swarm automatically extracts text content from messages, so your guardrails receive clean text instead of complex message structures. You don’t need manual extraction logic.Function Signature
Each input guardrail receives three parameters:context: Run context wrapper with access to shared stateagent: The Agent instance receiving the inputuser_input: Extracted text content- Single message: A string containing the message content
- Multiple consecutive messages: A list of strings, one per message
GuardrailFunctionOutputwith:tripwire_triggered(bool):Trueif validation failedoutput_info(str): Guidance message returned to the caller
File and image inputs inside messages are not passed to the guardrail.
Input Types
When a user sends multiple messages:Basic Input Guardrail
Friendly vs Strict Mode
Input guardrails support two modes that control how guardrail guidance is delivered: friendly mode (default) and strict mode. Thethrow_input_guardrail_error parameter controls this behavior.
Friendly Mode (Default)
Setting:throw_input_guardrail_error=False
In friendly mode, guardrail guidance flows naturally as if it came from the agent itself:
- Guidance returned as
final_output(non-streaming) ormessage_output_createdevent (streaming) - No exceptions raised
- Persisted as an assistant message (
message_origin="input_guardrail_message") - User experience stays fluid and conversational
- Conversational flows where you want to guide users naturally
- Internal agents communicating with each other
- Cases where you want to provide helpful feedback without interrupting the flow
Strict Mode
Setting:throw_input_guardrail_error=True
In strict mode, guardrail failures abort the turn immediately:
InputGuardrailTripwireTriggeredexception raised- Persisted as a system message (
message_origin="input_guardrail_error") - Turn aborted (agent never processes the input)
- Caller must handle the exception
- Hard requirements or compliance rules that cannot be bypassed
- Security validations that must block processing
- Cases where you want explicit exception handling
Comparison Table
| Mode | throw_input_guardrail_error | Caller sees | Persisted entry | Role | Use case |
|---|---|---|---|---|---|
| Friendly | False (default) | Guardrail text as final_output or streaming event | Assistant message (input_guardrail_message) | assistant | Conversational flows, helpful guidance |
| Strict | True | InputGuardrailTripwireTriggered exception | System message (input_guardrail_error) | system | Hard requirements, compliance, security |
Decision Guide
Should I use friendly or strict mode?
Should I use friendly or strict mode?
Use Friendly Mode when:
- You want a conversational user experience
- Agents are communicating with each other internally
- Guardrail feedback is helpful guidance, not a hard block
- You don’t want to write exception handling code
- You’re enforcing non-negotiable requirements
- Security or compliance rules must block processing
- You want explicit control over error handling
- The caller should know immediately that validation failed
Guardrails in Message History
Each guardrail trigger is recorded in the chat history with a guidance entry. Every entry carries amessage_origin field that identifies which guardrail fired.
Message Origin Values
input_guardrail_message: Input guardrail in friendly modeinput_guardrail_error: Input guardrail in strict modeoutput_guardrail_error: Output guardrail (always system message)
Persistence Behavior
| Mode | throw_input_guardrail_error | Streaming Event | Persisted Entry |
|---|---|---|---|
| Friendly | False (default) | message_output_created with guidance text | Assistant message, message_origin="input_guardrail_message" |
| Strict | True | {"type": "error", "content": guidance} | System message, message_origin="input_guardrail_error" |
- In friendly mode, that entry is an assistant message and its text matches what the caller receives
- In strict mode, the guardrail raises an exception and only the system guidance entry remains
The
validation_attempts parameter currently does not apply to input guardrails - they trigger immediately on validation failure.Message History After Guardrails Trip
When an input guardrail trips, agent-to-agent messages (requests from calling agents) remain in history alongside the guardrail guidance. This preserves context so calling agents understand what they asked and can adjust their approach. Output guardrail messages also persist in history to guide retry attempts.Example message history entries
Example message history entries
Agent-to-Agent Validation
Use guardrails to control how agents communicate with each other. When adding communication flows between agents, the recipient agent’s guardrails define the message format.Input and Output Guardrails for Inter-Agent Communication
- If the CEO agent sends a message to the worker that doesn’t start with “Task:”, the input guardrail triggers
- The CEO receives an error message:
"ERROR: Requests to this agent must begin with 'Task:'" - The CEO adjusts its message and tries again (or notifies the user, per its instructions)
validation_attempts, the worker must generate a correct response or the validation fails.
Agent-to-agent messages are always single strings, so input guardrails for inter-agent communication always receive a string (not a list).
Recommended Mode for Internal Agents
It is recommended to use friendly mode (throw_input_guardrail_error=False) for the agency’s internal agents. While strict mode (True) is also supported, friendly mode ensures that guardrail guidance flows naturally through the agent chain without raising exceptions that interrupt the communication flow.
Best Practices
Single Responsibility
Each guardrail should check one thing. Create multiple guardrails for different concerns instead of combining logic.
Specific Error Messages
Provide clear, actionable feedback in
output_info. Tell the agent or user exactly what to fix.Use Judge Agents
For complex decisions (like relevance or tone), delegate to a specialized judge agent instead of hard-coding rules.
Test Independently
Test guardrails with various inputs to ensure they catch invalid cases and allow valid ones.
Balance UX vs Enforcement
Consider the user experience - friendly mode for guidance, strict mode for hard blocks.
Start Simple
Begin with basic checks and expand as needed. Overly complex guardrails can slow response time.