Recommended: Use the Starter Template for production. It ships with FastAPI endpoints, auth, and a clean project layout.

Required Environment Variables

Before deploying, ensure these are set in your production environment:
Variable         Required       Description
OPENAI_API_KEY   Yes            Your OpenAI API key
APP_TOKEN        Recommended    Authentication token for FastAPI endpoints
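These can be checked once at startup so a misconfigured deployment fails immediately rather than on the first request. A minimal sketch; the `require_env` helper is illustrative, not part of Agency Swarm:

```python
import os

def require_env(name: str) -> str:
    """Fail fast at startup if a required environment variable is missing."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value

# At service startup:
# openai_api_key = require_env("OPENAI_API_KEY")
# app_token = os.environ.get("APP_TOKEN")  # recommended for endpoint auth
```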
Thread persistence uses callbacks you define to store threads in any database you choose.
This guide assumes you have already created an agency. If you haven’t, check out the Getting Started guide.
Before deploying, ensure you have thoroughly tested all tools and agents. Run the test cases in each tool file and verify the agency works end-to-end using demo methods.
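An end-to-end check might look like the sketch below. It assumes an `agency` instance built as in this guide, and that `get_response` returns a result with a `final_output` attribute as shown elsewhere in the docs; adapt the prompt and assertion to your agency's expected behavior.

```python
import asyncio

async def smoke_test(agency) -> None:
    # One full turn through the entry agent; an empty reply means
    # something in the tool or agent wiring is broken.
    result = await agency.get_response("Hello, can you hear me?")
    assert result.final_output, "agency returned an empty response"

# Run before deploying:
# asyncio.run(smoke_test(agency))
```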

Deployment Process

Step 1: Persist Conversation Threads

By default, every time you create a new Agency(), it starts a fresh conversation thread. In production, you usually need to resume prior conversations or handle multiple users.
Persist the full conversation history for each chat, including user-facing turns and agent-to-agent handoffs.
Chat persistence is handled through callback functions passed to the Agency constructor:
from agents import TResponseInputItem
from agency_swarm import Agency

# chat_id identifies one conversation; in a server, derive it from the request.
chat_id = "example-chat-id"

def save_threads(messages: list[TResponseInputItem], chat_id: str) -> None:
    # Persist the full message list under this chat's ID.
    save_threads_to_db(chat_id, messages)  # your own database helper

def load_threads(chat_id: str) -> list[TResponseInputItem]:
    # Return the previously saved messages for this chat (empty for a new one).
    return load_threads_from_db(chat_id)  # your own database helper

agency = Agency(
    agent1,
    agent2,
    communication_flows=[(agent1, agent2)],
    load_threads_callback=lambda: load_threads(chat_id),
    save_threads_callback=lambda messages: save_threads(messages, chat_id),
)
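The `save_threads_to_db` and `load_threads_from_db` helpers referenced above are yours to implement against any store. A minimal sketch using stdlib sqlite3; the schema, file path, and JSON serialization are illustrative choices, not Agency Swarm requirements:

```python
import json
import sqlite3

DB_PATH = "threads.db"  # illustrative; point this at your real database

def _connect() -> sqlite3.Connection:
    conn = sqlite3.connect(DB_PATH)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS chat_threads ("
        "chat_id TEXT PRIMARY KEY, messages TEXT NOT NULL)"
    )
    return conn

def save_threads_to_db(chat_id: str, messages: list) -> None:
    # Overwrite the stored history for this chat with the latest snapshot.
    with _connect() as conn:
        conn.execute(
            "INSERT OR REPLACE INTO chat_threads (chat_id, messages) "
            "VALUES (?, ?)",
            (chat_id, json.dumps(messages)),
        )

def load_threads_from_db(chat_id: str) -> list:
    # Return the saved history, or an empty list for a brand-new chat.
    with _connect() as conn:
        row = conn.execute(
            "SELECT messages FROM chat_threads WHERE chat_id = ?", (chat_id,)
        ).fetchone()
    return json.loads(row[0]) if row else []
```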
If you switch model providers for an existing saved chat, old tool/event items may no longer replay correctly. Start a new chat, or keep only {role, content} messages.
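If you choose the second option, a small filter over the saved history is enough. This helper is illustrative, not an Agency Swarm API:

```python
def keep_plain_messages(messages: list[dict]) -> list[dict]:
    # Keep only items that are plain chat messages, reduced to role + content;
    # tool calls, reasoning items, and other provider-specific events are dropped.
    return [
        {"role": m["role"], "content": m["content"]}
        for m in messages
        if isinstance(m, dict) and "role" in m and "content" in m
    ]
```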

Step 2: Configure FastAPI Endpoints

Use FastAPI in one of two ways:
  • Single agency: call agency.run_fastapi(...) from an Agency instance.
  • Multiple agencies and/or standalone tools: use top-level run_fastapi(agencies=..., tools=[...]).
There can be multiple agencies in one server, and each agency key becomes its own endpoint prefix.
from agency_swarm import Agency, Agent, function_tool, run_fastapi

@function_tool
def health_check() -> str:
    """Simple liveness probe exposed as a standalone tool endpoint."""
    return "ok"

# Each factory receives a per-request load_threads_callback so every
# chat resumes from its own stored history.
def create_support_agency(load_threads_callback=None):
    support = Agent(name="Support", instructions="You are a support agent.")
    return Agency(
        support,
        name="support",
        load_threads_callback=load_threads_callback,
    )

def create_sales_agency(load_threads_callback=None):
    sales = Agent(name="Sales", instructions="You are a sales agent.")
    return Agency(
        sales,
        name="sales",
        load_threads_callback=load_threads_callback,
    )

run_fastapi(
    agencies={
        "support": create_support_agency,
        "sales": create_sales_agency,
    },
    tools=[health_check],
    app_token_env="APP_TOKEN",  # name of the env var holding the auth token
    cors_origins=["https://your-app.example"],
)
run_fastapi(agencies=...) injects load_threads_callback per request (for chat_history) and does not inject save_threads_callback. If you need server-side persistence writes, wire that explicitly in your application flow.
This creates separate agency endpoints plus tool endpoints, for example:
  • /support/get_response and /support/get_response_stream
  • /sales/get_response and /sales/get_response_stream
  • /tool/health_check
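A client call against one of these endpoints can be sketched as follows. The request body field (`message`) and the Bearer auth scheme are assumptions here; check the generated OpenAPI docs (`/docs`) on your running service for the exact schema.

```python
import json
import urllib.request

def build_agency_request(
    base_url: str, agency: str, message: str, app_token: str
) -> urllib.request.Request:
    """Build an authenticated POST to an agency's get_response endpoint."""
    url = f"{base_url}/{agency}/get_response"
    body = json.dumps({"message": message}).encode()  # field name assumed
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Authorization": f"Bearer {app_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_agency_request(
    "https://your-app.example", "support", "Hello", "secret-token"
)
# urllib.request.urlopen(req) would send it
```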
FastAPI details: If you need tools hosted separately from your agency service, expose them as APIs and connect them with OpenAPI schemas, or use MCP Integration.

Step 3: Deploy the Service

Use the Starter Template as your production base. It already includes FastAPI wiring and deployment defaults.
  • Create a repo from the template
  • Set OPENAI_API_KEY and APP_TOKEN
  • Follow the template README to deploy
If you are wiring your own server, see FastAPI Integration for endpoint and parameter details (host, port, app_token_env, cors_origins, enable_agui).