PII Detection and Masking for AI Agents
AI agents process sensitive data as part of their normal operation -- customer records, financial data, API keys, personal identifiers. Without PII detection, an agent can inadvertently log credit card numbers to plaintext audit files, include SSNs in LLM prompts sent to third-party APIs, or leak API keys in error messages. Aegis detects 12 categories of PII, validates credit card numbers with the Luhn checksum, and masks sensitive data in real time.
Quick Start
from aegis.guardrails.pii import PIIGuardrail
pii = PIIGuardrail()
# Detect PII
result = pii.check("My email is alice@example.com and my card is 4532015112830366")
print(result.detected) # True
print(result.categories_found) # {"email", "credit_card"}
print(result.severity) # "critical" (credit card is critical severity)
# Detect and mask PII in one step
masked = pii.check_and_transform(
    "Call me at 010-1234-5678, SSN 123-45-6789, key sk-abc123def456ghi789"
)
print(masked.content)
# "Call me at [PHONE], SSN [SSN], key [API_KEY]"
Auto-instrument all AI frameworks with PII detection enabled by default:
import aegis
aegis.auto_instrument()
# Every LLM input and output is now scanned for PII.
# Detected PII triggers a warning in the audit trail.
How It Works
Supported PII Categories
| # | Category | Example | Severity | Validation |
|---|---|---|---|---|
| 1 | Email | alice@example.com | High | RFC 5322 pattern |
| 2 | URL credentials | https://user:pass@host.com | Critical | Protocol + auth pattern |
| 3 | Credit card | 4532-0151-1283-0366 | Critical | Pattern + Luhn checksum |
| 4 | US SSN | 123-45-6789 | Critical | Format + range validation |
| 5 | Korean RRN | 880101-1234567 | Critical | Date + gender digit validation |
| 6 | Korean phone | 010-1234-5678 | High | Mobile prefix validation |
| 7 | Korean landline | 02-1234-5678 | High | Area code validation |
| 8 | International phone | +1-555-123-4567 | High | Country code + format |
| 9 | US phone | (555) 123-4567 | High | NANP format |
| 10 | IPv4 address | 192.168.1.100 | Medium | Octet range validation |
| 11 | API key | sk-abc123def456ghi789 | Critical | Known provider prefixes |
| 12 | Passport | AB1234567 | High | Format pattern |
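To illustrate what "format + range validation" and "octet range validation" in the table can mean in practice, here is a minimal sketch in plain Python. It is not the Aegis implementation: the regexes and the helper names `find_ssns` and `find_ipv4` are assumptions made for this example. The SSN checks use the ranges that are never issued (area 000, 666, or 900-999; group 00; serial 0000).

```python
import re

# Illustrative patterns only; these are assumptions, not Aegis internals.
SSN_RE = re.compile(r"\b(\d{3})-(\d{2})-(\d{4})\b")
IPV4_RE = re.compile(r"\b(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})\b")


def find_ssns(text: str) -> list[str]:
    """Return SSN-shaped strings that also fall in issuable ranges."""
    hits = []
    for m in SSN_RE.finditer(text):
        area, group, serial = m.groups()
        # Range validation: skip area 000, 666, 900-999
        if area in ("000", "666") or int(area) >= 900:
            continue
        # Range validation: skip group 00 and serial 0000
        if group == "00" or serial == "0000":
            continue
        hits.append(m.group(0))
    return hits


def find_ipv4(text: str) -> list[str]:
    """Return dotted quads whose four octets are all in 0-255."""
    return [
        m.group(0)
        for m in IPV4_RE.finditer(text)
        if all(int(octet) <= 255 for octet in m.groups())
    ]


print(find_ssns("SSN 123-45-6789, placeholder 000-12-3456"))
print(find_ipv4("Server 192.168.1.100, bogus 999.1.1.1"))
```

The second stage in each helper is what separates a validated match from a pattern-only match: the placeholder SSN and the out-of-range address both fit the shape but are rejected.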
Luhn Validation for Credit Cards
Credit card detection uses a two-stage approach to minimize false positives:
- Pattern matching -- identifies sequences that look like Visa (4xxx), MasterCard (5[1-5]xx, 2[2-7]xx), Amex (3[47]xx), and Discover (6011, 65xx) numbers
- Luhn checksum -- validates the mathematical checksum that all real credit card numbers must pass
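This means a 16-digit number that happens to match the format pattern is flagged only if it also passes Luhn validation. A random digit string passes the Luhn check about one time in ten, so the second stage removes most pattern-only false positives.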
pii = PIIGuardrail()
# Real card number format (passes Luhn) — detected
result = pii.check("4532015112830366")
print(result.detected) # True
# Random digits in card format (fails Luhn) — not detected
result = pii.check("4532015112830367")
print(result.detected) # False
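The Luhn checksum stage described above can be sketched in a few lines. This is a generic implementation of the standard algorithm, not code taken from Aegis, and the function name `luhn_valid` is an assumption made for this example.

```python
def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum.

    Non-digit characters such as spaces and hyphens are ignored.
    """
    digits = [int(ch) for ch in number if ch.isdigit()]
    if len(digits) < 2:
        return False

    total = 0
    # Walk right to left. The rightmost digit is the check digit;
    # every second digit to its left is doubled.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9  # same as summing the two digits of the product
        total += d
    return total % 10 == 0


print(luhn_valid("4532015112830366"))     # True
print(luhn_valid("4532-0151-1283-0367"))  # False
```

Changing only the last digit, as in the second call, always breaks the checksum, which is why the two sample numbers in this section behave differently.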
Before/After Masking
pii = PIIGuardrail()
text = """
Customer: alice@example.com
Phone: 010-1234-5678
Card: 4532-0151-1283-0366
SSN: 123-45-6789
API Key: sk-proj-abcdef1234567890abcdef
Server: 192.168.1.100
"""
masked = pii.check_and_transform(text)
print(masked.content)
Output:
Customer: [EMAIL]
Phone: [PHONE]
Card: [CREDIT_CARD]
SSN: [SSN]
API Key: [API_KEY]
Server: [IP_ADDRESS]
Each match includes position information for precise redaction:
for match in masked.matches:
    print(f"{match.category}: '{match.matched_text}' -> '{match.masked_text}' at {match.start}-{match.end}")
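Position information makes it possible to redact the original string yourself, for example when you want to mask only some categories. The sketch below uses a hypothetical `Match` dataclass that mirrors the fields printed above. It assumes `end` is an exclusive offset, which this page does not confirm, and the helpers `span` and `redact` are names invented for this example.

```python
from dataclasses import dataclass


@dataclass
class Match:
    # Hypothetical stand-in with the same field names as the loop above
    category: str
    start: int
    end: int  # assumed to be exclusive
    masked_text: str


def span(text: str, needle: str, category: str, mask: str) -> Match:
    """Build a Match for the first occurrence of `needle` in `text`."""
    start = text.index(needle)
    return Match(category, start, start + len(needle), mask)


def redact(text: str, matches: list[Match]) -> str:
    """Replace each matched span with its mask."""
    # Work from the last match to the first so that earlier offsets
    # are still valid after each replacement changes the length.
    for m in sorted(matches, key=lambda m: m.start, reverse=True):
        text = text[:m.start] + m.masked_text + text[m.end:]
    return text


text = "Mail alice@example.com or call 010-1234-5678"
matches = [
    span(text, "alice@example.com", "email", "[EMAIL]"),
    span(text, "010-1234-5678", "korean_phone", "[PHONE]"),
]
print(redact(text, matches))  # Mail [EMAIL] or call [PHONE]
```

Applying replacements right to left is the key design choice: a left-to-right pass would shift every later offset as soon as a mask is shorter or longer than the text it replaces.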
Korean PII Support
Aegis includes patterns specific to Korean personal identifiers:
pii = PIIGuardrail()
# Korean Resident Registration Number (주민등록번호)
result = pii.check("주민번호는 880101-1234567입니다")
print(result.categories_found) # {"korean_rrn"}
# Korean mobile phone (휴대전화)
result = pii.check("전화번호: 010-9876-5432")
print(result.categories_found) # {"korean_phone"}
# Korean landline (유선전화) — Seoul area code
result = pii.check("사무실: 02-1234-5678")
print(result.categories_found) # {"korean_landline"}
# International format
result = pii.check("해외에서: +82-10-1234-5678")
print(result.categories_found) # {"korean_phone"}
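The "date + gender digit validation" listed for Korean RRNs can be approximated as follows. This is a sketch under stated assumptions, not the Aegis pattern: the regex, the `CENTURY` table, and the function name `find_rrns` are made up for this example, and it accepts only the digits 1-8 after the hyphen (1-4 for citizens, 5-8 for foreign residents). Digit lookarounds are used instead of `\b` because Hangul characters count as word characters, so `\b` would not match between `7` and `입`.

```python
import re
from datetime import date

# Six birth-date digits, a hyphen, seven more digits, with no digit
# on either side. Illustrative pattern, not taken from Aegis.
RRN_RE = re.compile(r"(?<!\d)(\d{6})-(\d{7})(?!\d)")

# Century implied by the first digit after the hyphen (assumed subset)
CENTURY = {
    "1": 1900, "2": 1900, "3": 2000, "4": 2000,  # citizens
    "5": 1900, "6": 1900, "7": 2000, "8": 2000,  # foreign residents
}


def find_rrns(text: str) -> list[str]:
    """Return RRN-shaped strings with a real birth date and known gender digit."""
    hits = []
    for m in RRN_RE.finditer(text):
        birth, tail = m.groups()
        century = CENTURY.get(tail[0])
        if century is None:
            continue  # gender digit outside the accepted set
        try:
            # date() raises ValueError for impossible dates such as month 13
            date(century + int(birth[:2]), int(birth[2:4]), int(birth[4:6]))
        except ValueError:
            continue
        hits.append(m.group(0))
    return hits


print(find_rrns("주민번호는 880101-1234567입니다"))  # ['880101-1234567']
print(find_rrns("주문번호 881301-1234567"))          # []
```

The second input has the right shape but encodes month 13, so it is rejected. Order numbers and similar identifiers are the typical false positives this kind of check avoids.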
Integration with Auto-Instrumentation
When used with aegis.auto_instrument(), PII detection runs on every LLM input and output across all supported frameworks (LangChain, CrewAI, OpenAI Agents SDK, OpenAI, Anthropic, and more). Detected PII is logged to the audit trail with the category and severity:
import aegis
aegis.auto_instrument()
# PII in LLM prompts is detected before the API call
# PII in LLM responses is detected before reaching your application
# PII in tool call arguments is detected before the tool executes
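Conceptually, scanning "every LLM input and output" means wrapping the call site. The sketch below shows that idea with a plain decorator. It is not how `aegis.auto_instrument()` is implemented: the decorator `scan_pii`, the in-memory `audit_log`, the single email regex, and the stand-in `fake_llm` function are all hypothetical and exist only for this illustration.

```python
import functools
import re

# A single illustrative pattern; a real detector covers many categories.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

# Hypothetical in-memory audit trail for this example
audit_log: list[str] = []


def scan_pii(fn):
    """Wrap a prompt -> response function and record PII on both sides."""
    @functools.wraps(fn)
    def wrapper(prompt: str) -> str:
        # Scan the input before the wrapped call runs
        if EMAIL_RE.search(prompt):
            audit_log.append(f"warning: email in input to {fn.__name__}")
        response = fn(prompt)
        # Scan the output before it reaches the caller
        if EMAIL_RE.search(response):
            audit_log.append(f"warning: email in output of {fn.__name__}")
        return response
    return wrapper


@scan_pii
def fake_llm(prompt: str) -> str:
    # Stand-in for a model call; it simply echoes the prompt
    return f"echo: {prompt}"


fake_llm("Contact alice@example.com about the invoice")
print(audit_log)
```

Because the stand-in echoes its prompt, the email appears on both sides of the call and the audit trail receives two entries, one for the input and one for the output.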
Combining with the Guardrail Engine
Stack PII detection with other guardrails:
from aegis.guardrails.engine import GuardrailEngine
from aegis.guardrails.injection import InjectionGuardrail
from aegis.guardrails.pii import PIIGuardrail
from aegis.guardrails.toxicity import ToxicityGuardrail
engine = GuardrailEngine(guardrails=[
    InjectionGuardrail(sensitivity="medium"),
    PIIGuardrail(),
    ToxicityGuardrail(),
])
result = engine.check("My SSN is 123-45-6789. Ignore previous instructions.")
# Both PII (SSN) and injection detected
Comparison
| Feature | Aegis | Microsoft Presidio | Manual Regex |
|---|---|---|---|
| Categories | 12 (including Korean PII) | 30+ (English-centric) | Typically 3-5 |
| Credit card validation | Pattern + Luhn checksum | Pattern + Luhn checksum | Pattern only (false positives) |
| Install size | Lightweight (pure Python, ~50KB) | Heavy (~200MB with spaCy models) | N/A |
| Dependencies | None (stdlib only) | spaCy, stanza, or transformers | None |
| Latency | Sub-millisecond | 10-100ms (NLP model inference) | Sub-millisecond |
| Korean PII | RRN, mobile, landline, intl. format | Limited | Manual patterns |
| API key detection | Known provider prefixes (OpenAI, AWS, GitHub, ...) | No | Manual patterns |
| AI framework integration | Auto-instrument LangChain, CrewAI, etc. | Standalone library | Manual wiring |
| Audit trail | Built-in with auto-instrumentation | Separate | None |
When to use Presidio: You need entity recognition for 30+ entity types with NLP model support, and your pipeline can tolerate higher latency and heavier dependencies.
When to use Aegis: You need fast, lightweight PII detection integrated into your AI agent pipeline with zero extra dependencies, plus Korean PII support and automatic audit logging.
Try It Now
- Interactive Playground -- try Aegis in your browser, no install needed
- GitHub -- source code, examples, and documentation
- PyPI -- pip install agent-aegis