# AI Agent Cost Governance
Autonomous AI agents can burn through API budgets in minutes. A single runaway loop — an agent retrying a failed tool call, or two agents delegating back and forth — can generate thousands of LLM calls before anyone notices. Aegis enforces cost limits across five dimensions (per-call, per-session, daily, per-minute token rate, and total budget), with automatic loop detection, blocking or warning before spend exceeds your thresholds.
## Quick Start
### Policy-Based Cost Limits
Define cost limits in your aegis.yaml policy file:
```yaml
cost:
  budget_usd: 100.0            # Total budget ceiling
  per_call_limit_usd: 2.0      # Max cost per single LLM call
  per_session_limit_usd: 10.0  # Max cost per agent session
  daily_budget_usd: 50.0       # Max daily spend (resets at midnight UTC)
  per_minute_tokens: 100000    # Token rate limit (rolling 60s window)
  alert_threshold: 0.8         # Alert at 80% budget utilization
  on_exceed: block             # block | warn | log
```
### Programmatic Cost Enforcement
```python
from aegis.config import CostConfig
from aegis.core.cost_policy import CostPolicyEnforcer
from aegis.core.budget import TokenUsage

enforcer = CostPolicyEnforcer(CostConfig(
    budget_usd=100.0,
    per_session_limit_usd=5.0,
    daily_budget_usd=50.0,
    per_minute_tokens=100_000,
    on_exceed="block",
))

# Check before every LLM call
usage = TokenUsage(model="gpt-4o", input_tokens=5000, output_tokens=1000)
decision = enforcer.check_and_record(usage)

if decision.blocked:
    print(f"Blocked: {decision.reason}")
    # "Session spend $5.12 would exceed per-session limit $5.00"
else:
    print(f"Allowed. Cost: ${decision.cost_usd:.4f}, Total: ${decision.cumulative_usd:.4f}")
```
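As a sanity check, the per-call cost in the example above follows directly from the pricing rates listed later on this page (GPT-4o: $2.50/M input tokens, $10.00/M output tokens). This standalone sketch reproduces the arithmetic without importing Aegis:

```python
# GPT-4o list prices per million tokens (from the pricing table on this page).
GPT_4O_INPUT_PER_M = 2.50
GPT_4O_OUTPUT_PER_M = 10.00

def call_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Cost of a single GPT-4o call in USD."""
    return (input_tokens / 1_000_000) * GPT_4O_INPUT_PER_M + \
           (output_tokens / 1_000_000) * GPT_4O_OUTPUT_PER_M

cost = call_cost_usd(5000, 1000)
print(f"${cost:.4f}")  # $0.0225
```

A 5,000-in / 1,000-out call therefore costs about two cents, well under the $2.00 per-call limit in the quick-start policy.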
### Auto-Instrument with Cost Tracking
```python
import aegis
from aegis.config import CostConfig

aegis.auto_instrument(cost=CostConfig(
    budget_usd=50.0,
    per_call_limit_usd=1.0,
    on_exceed="block",
))

# Every LLM call across LangChain, OpenAI, Anthropic, CrewAI
# is now cost-tracked and budget-enforced automatically.
```
## Cost Dimensions
Aegis enforces five independent cost dimensions. Each is optional — omit any limit you don't need.
| Dimension | Config Key | What It Prevents |
|---|---|---|
| Per-call | `per_call_limit_usd` | A single expensive prompt (e.g., 128K context window) |
| Per-session | `per_session_limit_usd` | A runaway agent session burning through budget |
| Daily | `daily_budget_usd` | Unbounded daily spend (resets at midnight UTC) |
| Token rate | `per_minute_tokens` | Token-per-minute bursts (rolling 60s window) |
| Total budget | `budget_usd` | Spend beyond the hard ceiling across all sessions |
When a limit is hit, the `on_exceed` policy determines the response:

- `block` — The call is rejected before it reaches the LLM. Cost is not recorded.
- `warn` — The call proceeds, but a warning is logged and returned in the decision.
- `log` — Same as `warn`. The call proceeds with a log entry.
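To make the rolling 60-second token window concrete, here is a minimal standalone sketch of the idea behind `per_minute_tokens` (an illustration only, not Aegis's implementation; the class name is invented):

```python
import time
from collections import deque

class RollingTokenWindow:
    """Counts tokens spent within the last `window` seconds."""

    def __init__(self, limit: int, window: float = 60.0):
        self.limit = limit
        self.window = window
        self.events: deque = deque()  # (timestamp, tokens)

    def would_exceed(self, tokens: int, now: float = None) -> bool:
        now = time.monotonic() if now is None else now
        # Evict events that have aged out of the window.
        while self.events and now - self.events[0][0] > self.window:
            self.events.popleft()
        used = sum(t for _, t in self.events)
        return used + tokens > self.limit

    def record(self, tokens: int, now: float = None) -> None:
        now = time.monotonic() if now is None else now
        self.events.append((now, tokens))

window = RollingTokenWindow(limit=100_000)
window.record(80_000, now=0.0)
print(window.would_exceed(30_000, now=10.0))  # True: 110K tokens inside the window
print(window.would_exceed(30_000, now=70.0))  # False: the 80K call has aged out
```

Because the window rolls rather than resets, a burst is throttled only for as long as it remains inside the trailing 60 seconds.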
## Multi-Agent Cost Attribution
When multiple agents collaborate (orchestrator delegates to workers), you need to know which agent spent what. Aegis provides a `CostAttributionTree` that tracks costs across delegation chains with per-agent budgets that roll up to the parent.
```python
from aegis.core.cost_attribution import CostAttributionTree
from aegis.core.budget import TokenUsage

tree = CostAttributionTree(max_budget=50.0, session_id="run-42")

# Register agent hierarchy
tree.register_agent("orchestrator", max_budget=50.0)
tree.register_agent("researcher", parent_id="orchestrator", max_budget=20.0)
tree.register_agent("writer", parent_id="orchestrator", max_budget=15.0)
tree.register_agent("reviewer", parent_id="orchestrator", max_budget=10.0)

# Record costs as agents make LLM calls
tree.record("researcher", TokenUsage(model="gpt-4o", input_tokens=8000, output_tokens=2000))
tree.record("writer", TokenUsage(model="claude-sonnet-4", input_tokens=5000, output_tokens=3000))
tree.record("reviewer", TokenUsage(model="gpt-4o-mini", input_tokens=3000, output_tokens=500))

# Get attribution report
print(tree.format_report())
# Multi-Agent Cost Attribution
# ========================================
# Global: $0.1005 spent
# Budget: $50.00 (0% used)
#
# Agent           Direct     Delegated  Total      Calls
# ------------------------------------------------------------
# orchestrator    $ 0.0000   $ 0.1005   $ 0.1005       0
# researcher      $ 0.0400   $ 0.0000   $ 0.0400       1
# writer          $ 0.0600   $ 0.0000   $ 0.0600       1
# reviewer        $ 0.0005   $ 0.0000   $ 0.0005       1
```
Child agent budgets are automatically capped to the parent's remaining budget. When a child exhausts its budget, only that agent is blocked — the orchestrator and siblings continue operating.
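The rollup rule can be sketched in isolation: each agent tracks its own direct spend, and every recorded cost also propagates up the delegation chain as delegated spend. This toy model illustrates the behavior; it is not the library's code:

```python
class AgentNode:
    """Minimal cost node: direct spend plus spend rolled up from descendants."""

    def __init__(self, agent_id, parent=None):
        self.agent_id = agent_id
        self.parent = parent
        self.direct = 0.0     # this agent's own LLM calls
        self.delegated = 0.0  # rolled up from children

    @property
    def total(self):
        return self.direct + self.delegated

def record(node, cost_usd):
    node.direct += cost_usd
    # Every ancestor sees the spend as delegated cost.
    ancestor = node.parent
    while ancestor is not None:
        ancestor.delegated += cost_usd
        ancestor = ancestor.parent

orchestrator = AgentNode("orchestrator")
researcher = AgentNode("researcher", parent=orchestrator)
writer = AgentNode("writer", parent=orchestrator)

record(researcher, 0.04)
record(writer, 0.06)
print(f"{orchestrator.total:.2f}")  # 0.10: all child spend rolls up to the root
```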
## Framework Cost Callbacks
Aegis provides plug-and-play cost extractors for popular AI frameworks. Each extracts token usage from the framework's native response objects and records it in a shared `CostTracker`.
### LangChain
```python
from aegis.core.budget import CostTracker
from aegis.core.cost_callbacks import LangChainCostCallback

tracker = CostTracker(max_budget=10.0)
callback = LangChainCostCallback(tracker, agent_id="research-agent")

# Use as a LangChain callback handler
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o", callbacks=[callback])
response = llm.invoke("Summarize this document...")

print(f"Spent: ${tracker.spent:.4f}, Remaining: ${tracker.remaining:.4f}")
```
### OpenAI
```python
import openai

from aegis.core.budget import CostTracker
from aegis.core.cost_callbacks import OpenAICostExtractor

tracker = CostTracker(max_budget=5.0)
extractor = OpenAICostExtractor(tracker)

client = openai.OpenAI()
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
)
extractor.record(response)  # Extracts usage, records cost
```
### Anthropic
```python
import anthropic

from aegis.core.budget import CostTracker
from aegis.core.cost_callbacks import AnthropicCostExtractor

tracker = CostTracker(max_budget=5.0)
extractor = AnthropicCostExtractor(tracker)

client = anthropic.Anthropic()
message = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello"}],
)
extractor.record(message)  # Extracts usage, records cost
```
### Google Generative AI / ADK
```python
from aegis.core.cost_callbacks import GoogleCostExtractor
from aegis.core.budget import CostTracker

tracker = CostTracker(max_budget=5.0)
extractor = GoogleCostExtractor(tracker, default_model="gemini-2.5-flash")

# Works with google.generativeai and Google ADK responses;
# `response` here is a response object returned by either API.
extractor.record(response, model="gemini-2.5-pro")
```
## Built-In Model Pricing
Aegis ships with a pricing table covering major model families (updated March 2026):
| Model Family | Input ($/M tokens) | Output ($/M tokens) | Cached ($/M tokens) |
|---|---|---|---|
| GPT-4o | 2.50 | 10.00 | 1.25 |
| GPT-4o Mini | 0.15 | 0.60 | 0.075 |
| GPT-4.1 | 2.00 | 8.00 | 0.50 |
| o3 | 10.00 | 40.00 | 2.50 |
| Claude Opus 4 | 15.00 | 75.00 | 1.50 |
| Claude Sonnet 4 | 3.00 | 15.00 | 0.30 |
| Gemini 2.5 Pro | 1.25 | 10.00 | 0.3125 |
| Gemini 2.5 Flash | 0.15 | 0.60 | 0.0375 |
Pricing lookup uses longest-prefix matching, so `claude-sonnet-4-20250514` automatically matches `claude-sonnet-4` pricing. Register custom models:
```python
from aegis.core.budget import ModelPricing

pricing = ModelPricing()
pricing.register(
    "my-fine-tuned-model",
    input_per_million=5.0,
    output_per_million=15.0,
    cached_per_million=2.5,
)
```
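The longest-prefix rule can be shown with a standalone toy resolver (an illustration of the lookup strategy, not the `ModelPricing` internals):

```python
# Toy pricing table: (input $/M, output $/M).
PRICING = {
    "gpt-4o": (2.50, 10.00),
    "gpt-4o-mini": (0.15, 0.60),
    "claude-sonnet-4": (3.00, 15.00),
}

def resolve(model: str):
    """Return pricing for the longest registered prefix of `model`."""
    matches = [name for name in PRICING if model.startswith(name)]
    if not matches:
        return None
    return PRICING[max(matches, key=len)]

print(resolve("claude-sonnet-4-20250514"))  # (3.0, 15.0)
print(resolve("gpt-4o-mini-2024-07-18"))    # (0.15, 0.6): longest prefix wins over "gpt-4o"
```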
## Loop Detection
The `CostTracker` includes automatic loop detection. When the same agent repeats an identical call too often within the detection window (by default, 5 calls in 60 seconds), it raises `BudgetExhausted` — catching runaway retry loops before they drain your budget.
```python
from aegis.core.budget import CostTracker, TokenUsage, BudgetExhausted

tracker = CostTracker(max_budget=10.0)

# Simulate a runaway loop
for i in range(10):
    try:
        tracker.record(
            TokenUsage(model="gpt-4o", input_tokens=1000, output_tokens=200),
            agent_id="stuck-agent",
            action_type="retry_tool_call",
        )
    except BudgetExhausted:
        print(f"Loop detected at call {i + 1}")  # Loop detected at call 5
        break
```
The detection window and threshold are configurable:
```python
tracker = CostTracker(
    max_budget=10.0,
    loop_window=120.0,   # seconds (default: 60)
    loop_threshold=10,   # calls (default: 5)
)
```
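Conceptually, loop detection amounts to counting identical (agent, action) signatures inside a sliding window. The sketch below illustrates that idea only; it is not Aegis's implementation:

```python
from collections import deque

class LoopDetector:
    """Flags when one (agent, action) pair repeats `threshold` times in `window` seconds."""

    def __init__(self, threshold: int = 5, window: float = 60.0):
        self.threshold = threshold
        self.window = window
        self.history = {}  # (agent_id, action_type) -> deque of timestamps

    def check(self, agent_id: str, action_type: str, now: float) -> bool:
        """Record one call; return True once the loop threshold is reached."""
        times = self.history.setdefault((agent_id, action_type), deque())
        while times and now - times[0] > self.window:
            times.popleft()  # evict calls outside the window
        times.append(now)
        return len(times) >= self.threshold

detector = LoopDetector()
for i in range(10):
    if detector.check("stuck-agent", "retry_tool_call", now=float(i)):
        print(f"Loop detected at call {i + 1}")  # Loop detected at call 5
        break
```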
## Cost Reports
Generate structured cost reports for monitoring and compliance:
```python
report = enforcer.get_report()
# {
#     "session_id": "run-42",
#     "max_budget": 100.0,
#     "spent": 12.3456,
#     "remaining": 87.6544,
#     "utilization": 0.1235,
#     "call_count": 47,
#     "by_model": {"gpt-4o": 8.2100, "claude-sonnet-4": 4.1356},
#     "by_agent": {"researcher": 5.0200, "writer": 7.3256},
#     "daily_spent": 12.3456,
#     "tokens_last_minute": 45230,
#     "limits": {
#         "budget_usd": 100.0,
#         "per_call_limit_usd": 2.0,
#         "per_session_limit_usd": 10.0,
#         "daily_budget_usd": 50.0,
#         "per_minute_tokens": 100000,
#         "on_exceed": "block",
#     },
# }
```
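The derived fields are simple functions of `spent` and the configured limits; assuming `remaining = max_budget - spent` and `utilization = spent / max_budget`, the sample numbers check out:

```python
report = {"max_budget": 100.0, "spent": 12.3456}

remaining = report["max_budget"] - report["spent"]
utilization = report["spent"] / report["max_budget"]

print(f"{remaining:.4f}")    # 87.6544
print(f"{utilization:.4f}")  # 0.1235
```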
## Comparison with Alternatives
| Feature | Aegis | DIY | Guardrails AI | NeMo Guardrails |
|---|---|---|---|---|
| Multi-dimensional limits | Per-call, session, daily, rate, total | Manual tracking | N/A | N/A |
| Multi-agent attribution | Hierarchical tree with rollup | Custom code | N/A | N/A |
| Loop detection | Automatic (5 calls / 60s) | Manual | N/A | N/A |
| Built-in pricing table | 15+ models, prefix matching | Manual | N/A | N/A |
| Framework callbacks | LangChain, OpenAI, Anthropic, Google | Custom per-framework | N/A | N/A |
| Thread-safe | Lock-guarded | Depends on impl | N/A | N/A |
| Zero dependencies | Pure Python | Depends on impl | Multiple | Multiple |
## Next Steps
- Quick Start — Get Aegis running in 5 minutes
- Policy Patterns — Combine cost limits with security policies
- Audit Trail — Log every cost decision for compliance
- EU AI Act Compliance — Cost governance meets regulatory requirements