Returning images and files from tools enables real agentic feedback loops on completely new modalities.

For example, instead of dumping all the data into an agent and hoping for the best, you can generate a visualization or analyze PDF reports, and let the agent provide insights based on that output, just like a real data analyst would.

This saves your context window and unlocks autonomous agentic workflows for many new use cases:
Agents can check websites autonomously and iterate until all elements are properly positioned, enabling them to tackle complex projects without manual screenshot feedback.
Brand Asset Generation
Provide brand guidelines, logos, and messaging, then let agents iterate on image and video generation (including Sora 2) until outputs fully match your expectations.
Screen-Aware Assistance
Build agents that help visually impaired individuals navigate websites, or create customer support agents that see the user’s current webpage and can give better assistance.
Data Analytics
Generate visual graphs and analyze PDF reports, then let agents provide insights based on these outputs without overloading the context window.
```python
from agency_swarm import Agent, BaseTool, ToolOutputImage
from pydantic import Field
import base64
import matplotlib.pyplot as plt
import io


class GenerateChartTool(BaseTool):
    """Generate a bar chart from data."""

    data: list[float] = Field(..., description="Data points for the chart")
    labels: list[str] = Field(..., description="Labels for each data point")

    def run(self) -> ToolOutputImage:
        """Generate and return the chart as a base64-encoded image."""
        # Create the chart
        fig, ax = plt.subplots()
        ax.bar(self.labels, self.data)

        # Convert to base64
        buf = io.BytesIO()
        plt.savefig(buf, format='png')
        buf.seek(0)
        image_base64 = base64.b64encode(buf.read()).decode('utf-8')
        plt.close()

        # Return in multimodal format
        return ToolOutputImage(image_url=f"data:image/png;base64,{image_base64}")


# Create an agent with the tool
agent = Agent(
    name="DataViz",
    instructions="You generate charts and visualizations for data analysis.",
    tools=[GenerateChartTool]
)
```
function_tool decorators and BaseTool classes both support multimodal outputs in the exact same way.
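The base64 step in the chart tool above is the same for any image a tool returns, so it can be factored out. The following is a minimal sketch using only the standard library. The helper names `to_data_url` and `file_to_data_url` are hypothetical and not part of agency_swarm, and the fallback MIME type is an assumption.

```python
import base64
import mimetypes
from pathlib import Path


def to_data_url(payload: bytes, mime_type: str = "image/png") -> str:
    """Encode raw bytes as a base64 data URL (hypothetical helper)."""
    encoded = base64.b64encode(payload).decode("utf-8")
    return f"data:{mime_type};base64,{encoded}"


def file_to_data_url(path: str) -> str:
    """Read a file from disk and encode it as a data URL (hypothetical helper)."""
    # Guess the MIME type from the file name; fall back to a generic
    # binary type when the extension is unknown (an assumption of this sketch).
    mime_type, _ = mimetypes.guess_type(path)
    if mime_type is None:
        mime_type = "application/octet-stream"
    return to_data_url(Path(path).read_bytes(), mime_type)
```

Inside a tool's `run` method, the result could then be passed along as `ToolOutputImage(image_url=to_data_url(buf.getvalue()))`, which keeps the encoding logic in one place when several tools return images.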