Policy as Code for AI Agents: Preview, Test, and Deploy Safely

Changing AI agent policies is risky. Loosening a rule might allow actions that should be blocked. Tightening a rule might break production workflows. There is no way to preview the impact before deploying -- unless you treat policies as code. Aegis provides aegis plan (like terraform plan) and aegis test (like pytest for policies) so you can see exactly what would change before it goes live.

Quick Start

pip install agent-aegis

Preview the impact of a policy change:

aegis plan current.yaml proposed.yaml --replay audit.jsonl

Run regression tests against a policy:

aegis test policy.yaml test_suite.yaml

Auto-generate a test suite from a policy:

aegis test policy.yaml --generate --generate-output test_suite.yaml

How It Works

aegis plan -- Preview Impact Before Deploying

aegis plan compares two YAML policies and shows what would change. When you provide audit history (via --replay or --audit-db), it replays every recorded action against both policies to show which actions would get a different governance decision.

# Basic diff: show rule changes
aegis plan current.yaml proposed.yaml

# Replay against JSONL audit history
aegis plan current.yaml proposed.yaml --replay audit.jsonl

# Replay against SQLite audit database
aegis plan current.yaml proposed.yaml --audit-db audit.db

# Filter replay to a specific session
aegis plan current.yaml proposed.yaml --audit-db audit.db --session prod-042

# JSON output for CI/CD pipelines
aegis plan current.yaml proposed.yaml --replay audit.jsonl --format json

# CI mode: exit code 1 if any actions would be newly blocked
aegis plan current.yaml proposed.yaml --replay audit.jsonl --ci

Example output:

Policy Diff: current.yaml → proposed.yaml
═══════════════════════════════════════════

Rules Added:
  + block_bulk_operations    [approval: block, risk: critical]

Rules Removed:
  - allow_all_writes         [approval: auto, risk: low]

Rules Modified:
  ~ approve_updates          [approval: auto → approve, risk: low → medium]

Replay Impact (247 historical actions):
═══════════════════════════════════════════

  Action                  Old Decision    New Decision    Impact
  ─────────────────────   ────────────    ────────────    ──────
  bulk_delete@prod_db     auto            block           BREAKING
  update_user@crm         auto            approve         STRICTER
  read_report@analytics   auto            auto            no change
  write_config@staging    approve         approve         no change

Summary:
  12 actions newly blocked (BREAKING)
   8 actions now require approval (stricter)
 227 actions unchanged
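The impact labels in the replay table follow mechanically from comparing the two decisions. A minimal Python sketch of that classification logic (an illustration, not Aegis's internals; the decision names mirror the output above, and the LOOSER label is an assumed fourth case for changes in the permissive direction):

```python
# Decisions ordered from most to least permissive, matching the
# values shown in the replay table above.
SEVERITY = {"auto": 0, "approve": 1, "block": 2}

def classify_impact(old_decision: str, new_decision: str) -> str:
    """Label how a historical action's outcome would change."""
    if new_decision == old_decision:
        return "no change"
    if new_decision == "block":
        return "BREAKING"          # previously allowed, now blocked
    if SEVERITY[new_decision] > SEVERITY[old_decision]:
        return "STRICTER"          # e.g. auto -> approve
    return "LOOSER"                # e.g. approve -> auto

print(classify_impact("auto", "block"))    # BREAKING
print(classify_impact("auto", "approve"))  # STRICTER
print(classify_impact("auto", "auto"))     # no change
```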

aegis test -- Regression Testing for Policies

Define test cases that verify your policy evaluates actions correctly. Run them in CI to catch regressions when policies change.

# test_suite.yaml
version: "1"

cases:
  - name: reads_are_auto_allowed
    action:
      type: read_report
      target: analytics
    expect:
      approval: auto
      risk_level: low

  - name: deletes_are_blocked
    action:
      type: delete_user
      target: production_db
    expect:
      approval: block
      risk_level: critical

  - name: writes_require_approval
    action:
      type: update_contact
      target: crm
    expect:
      approval: approve
      risk_level: medium

Run the tests:

aegis test policy.yaml test_suite.yaml

Output:

Policy Test Results: policy.yaml
═════════════════════════════════

  Test                        Expected         Actual           Result
  ────────────────────────    ────────────     ────────────     ──────
  reads_are_auto_allowed      auto/low         auto/low         PASS
  deletes_are_blocked         block/critical   block/critical   PASS
  writes_require_approval     approve/medium   approve/medium   PASS

3/3 tests passed
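Under the hood, each case is a straight comparison of expected vs. actual decision. A minimal sketch of that check, with a toy evaluator standing in for the policy engine (both `run_cases` and `toy_evaluate` are hypothetical illustrations, not Aegis APIs):

```python
def run_cases(cases, evaluate):
    """Compare each case's expected fields against the evaluator's output.

    `evaluate(action)` is a stand-in for the policy engine; it returns
    a dict with 'approval' and 'risk_level'.
    """
    results = []
    for case in cases:
        actual = evaluate(case["action"])
        passed = all(actual.get(k) == v for k, v in case["expect"].items())
        results.append((case["name"], passed))
    return results

def toy_evaluate(action):
    # Toy rule: reads are low-risk and auto-allowed, everything else blocked.
    if action["type"].startswith("read"):
        return {"approval": "auto", "risk_level": "low"}
    return {"approval": "block", "risk_level": "critical"}

cases = [
    {"name": "reads_are_auto_allowed",
     "action": {"type": "read_report", "target": "analytics"},
     "expect": {"approval": "auto", "risk_level": "low"}},
]
print(run_cases(cases, toy_evaluate))  # [('reads_are_auto_allowed', True)]
```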

Auto-Generate Test Suites

Generate a test suite from an existing policy. Aegis creates test cases for every rule plus edge cases:

# Print to stdout
aegis test policy.yaml --generate

# Write to file
aegis test policy.yaml --generate --generate-output test_suite.yaml

This generates test cases that exercise each rule in your policy, giving you baseline coverage without writing tests manually.
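Conceptually, generation walks the policy's rules and emits one case per rule asserting that rule's own approval mode and risk level. A hedged sketch of that idea (the rule fields here are illustrative assumptions, not Aegis's generator or schema):

```python
def generate_cases(rules):
    """Emit one baseline test case per policy rule."""
    cases = []
    for rule in rules:
        cases.append({
            "name": f"rule_{rule['name']}",
            "action": {"type": rule["action_type"], "target": rule["target"]},
            "expect": {"approval": rule["approval"], "risk_level": rule["risk"]},
        })
    return cases

rules = [
    {"name": "block_bulk_operations", "action_type": "bulk_delete",
     "target": "prod_db", "approval": "block", "risk": "critical"},
]
print(generate_cases(rules)[0]["expect"])
# {'approval': 'block', 'risk_level': 'critical'}
```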

Regression Detection

Compare test outcomes between an old and new policy to detect regressions:

aegis test proposed.yaml test_suite.yaml --regression current.yaml

This runs the test suite against both policies and flags any test case where the outcome differs.
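In other words, regression detection is the same test run executed twice, once per policy, followed by a diff of the outcomes. A minimal sketch of that comparison (the evaluators are hypothetical stand-ins for the two policies):

```python
def find_regressions(cases, evaluate_old, evaluate_new):
    """Return the cases whose decision differs between two policies."""
    regressions = []
    for case in cases:
        old = evaluate_old(case["action"])
        new = evaluate_new(case["action"])
        if old != new:
            regressions.append((case["name"], old, new))
    return regressions

cases = [{"name": "writes_require_approval",
          "action": {"type": "update_contact", "target": "crm"}}]
old_policy = lambda a: {"approval": "auto"}     # stand-in for current.yaml
new_policy = lambda a: {"approval": "approve"}  # stand-in for proposed.yaml
print(find_regressions(cases, old_policy, new_policy))
```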

CI/CD Integration

Add policy testing and impact preview to your CI pipeline:

# .github/workflows/policy-check.yml
name: Policy Check
on:
  pull_request:
    paths:
      - 'policies/**'

jobs:
  policy-check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"

      - run: pip install agent-aegis

      # Run policy tests
      - name: Test policy
        run: aegis test policies/policy.yaml policies/test_suite.yaml

      # Preview impact against audit history
      - name: Plan impact
        run: |
          aegis plan policies/current.yaml policies/proposed.yaml \
            --replay audit/production.jsonl \
            --ci
        # --ci exits with code 1 if any actions would be newly blocked
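The --ci flag encodes a simple rule: fail the build when anything would be newly blocked. For pipelines that call the Python API instead of the CLI, the same gate is easy to sketch; the ReplayReport dataclass below is a stand-in mirroring the report attributes used in the Programmatic API section, not the real class:

```python
from dataclasses import dataclass

@dataclass
class ReplayReport:
    """Stand-in mirroring the report fields used elsewhere in this doc."""
    total: int
    changed: int
    newly_blocked: int

def ci_exit_code(report: ReplayReport) -> int:
    """Mirror of `aegis plan --ci`: nonzero when actions are newly blocked."""
    return 1 if report.newly_blocked > 0 else 0

print(ci_exit_code(ReplayReport(total=247, changed=20, newly_blocked=12)))  # 1
```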

Policy Validation

Before testing or deploying, validate the policy syntax:

aegis validate policy.yaml

This checks:
- YAML syntax
- Required fields (version, defaults, rules)
- Valid risk levels and approval modes
- Glob pattern syntax
- Condition operator validity
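A minimal sketch of the structural checks, using a plain dict in place of a parsed YAML policy. The required fields follow the checklist above; the sets of valid values are assumptions extrapolated from the examples in this doc, not Aegis's definitive lists:

```python
# Assumed value sets -- based on the examples in this doc, plus "high"
# as a plausible intermediate risk level.
VALID_RISK = {"low", "medium", "high", "critical"}
VALID_APPROVAL = {"auto", "approve", "block"}

def validate(policy: dict) -> list[str]:
    """Return a list of validation errors (empty means valid)."""
    errors = []
    for field in ("version", "defaults", "rules"):
        if field not in policy:
            errors.append(f"missing required field: {field}")
    for rule in policy.get("rules", []):
        if rule.get("risk") not in VALID_RISK:
            errors.append(f"invalid risk level in rule {rule.get('name')}")
        if rule.get("approval") not in VALID_APPROVAL:
            errors.append(f"invalid approval mode in rule {rule.get('name')}")
    return errors

policy = {"version": "1", "defaults": {}, "rules": [
    {"name": "deletes_blocked", "approval": "block", "risk": "critical"}]}
print(validate(policy))  # []
```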

Programmatic API

Use the plan and replay APIs from Python:

from aegis.core.diff import diff_policies, analyze_impact
from aegis.core.policy import Policy
from aegis.core.replay import ReplayEngine, load_events_from_jsonl

# Load policies
old = Policy.from_yaml("current.yaml")
new = Policy.from_yaml("proposed.yaml")

# Diff rules
diff = diff_policies(old, new)
print(f"Rules added: {len(diff.added)}")
print(f"Rules removed: {len(diff.removed)}")
print(f"Rules modified: {len(diff.modified)}")

# Replay against audit history
events = load_events_from_jsonl("audit.jsonl")
engine = ReplayEngine(old)
report = engine.what_if(events, new)

print(f"Total events: {report.total}")
print(f"Changed decisions: {report.changed}")
print(f"Newly blocked: {report.newly_blocked}")
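load_events_from_jsonl reads newline-delimited JSON, one recorded event per line. A sketch of producing such a file; note the event field names here are illustrative assumptions, not Aegis's documented schema:

```python
import json

# Hypothetical event fields -- illustrative only, not the documented schema.
events = [
    {"action": "bulk_delete", "target": "prod_db", "decision": "auto"},
    {"action": "read_report", "target": "analytics", "decision": "auto"},
]

# JSONL: one JSON object per line, newline-terminated.
with open("audit.jsonl", "w") as f:
    for event in events:
        f.write(json.dumps(event) + "\n")
```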

Comparison

  Feature               Aegis                                      OPA (Open Policy Agent)                       Manual Review
  ────────────────────  ─────────────────────────────────────────  ────────────────────────────────────────────  ────────────────────
  Domain                AI agent governance                        Infrastructure/API authorization              Any
  Policy language       YAML (declarative, simple)                 Rego (general-purpose, steep learning curve)  N/A
  Impact preview        aegis plan with audit replay               No built-in equivalent                        Spreadsheet analysis
  Regression testing    aegis test with YAML test suites           opa test with Rego tests                      Manual QA
  Auto-generate tests   aegis test --generate                      No                                            No
  AI agent integration  Built-in (LangChain, CrewAI, OpenAI, ...)  Custom integration                            N/A
  Audit trail           Built-in (SQLite + JSONL)                  External (decision logs)                      None
  CI mode               --ci flag (exit code on breaking changes)  Via custom scripting                          PR review
  Learning curve        YAML rules, glob patterns                  Rego language                                 N/A

When to use OPA: You need a general-purpose policy engine for infrastructure authorization (Kubernetes, Terraform, API gateways) with Rego's full expressiveness.

When to use Aegis: You need policy-as-code specifically for AI agent governance, with built-in impact preview, audit replay, and regression testing that understands AI agent actions.

Try It Now