Redaction
The Quality module ships a pluggable redaction layer that scrubs PII and secrets out of every framework boundary that could leak — log lines, trace attributes, memory writes, captured LLM prompts, agent events. Redaction is always-on when the decorator is wired in; opt-out is per-tenant per-sink, never the default.
The redactor SPI is Redactor in com.tnsai.quality.redaction. The default implementation PatternRedactor ships with a 14-pattern catalog covering email, phone, credit card, SSN, TC Kimlik, API keys, JWT, AWS keys, IBAN, IP, bearer tokens, and SSH private-key blocks. Operators wire the redactor into framework sinks via decorator types (RedactingMemoryStore, RedactingAgentEventPublisher).
Quick start
import com.tnsai.quality.redaction.*;
// 1. Build a redactor with the default pattern catalog.
Redactor redactor = PatternRedactor.withDefaults();
// 2. Wrap the agent's memory store and event publisher.
MemoryStore memory = new RedactingMemoryStore(new InMemoryStore(), redactor);
AgentEventPublisher events = new RedactingAgentEventPublisher(new Slf4jAgentEventPublisher(), redactor);
// 3. From this point on, every write through these sinks is scrubbed.
agent.setMemoryStore(memory);
agent.setEventPublisher(events);
Pattern catalog
RedactionPatterns.defaults() returns the patterns below. Each is exported as a constant on RedactionPatterns (e.g. RedactionPatterns.EMAIL) so you can subset or augment.
| Pattern | Severity | Example |
|---|---|---|
| email | MEDIUM | user@example.com |
| phone_e164 | MEDIUM | +15551234567 |
| credit_card | CRITICAL | 4111-1111-1111-1111 (Luhn-validated) |
| us_ssn | CRITICAL | 123-45-6789 |
| tc_kimlik | CRITICAL | 11-digit Turkish national ID (checksum-validated) |
| openai_api_key | HIGH | sk-... |
| anthropic_api_key | HIGH | sk-ant-... |
| github_pat | HIGH | ghp_... |
| aws_access_key | HIGH | AKIA... |
| jwt | HIGH | three base64url segments separated by . |
| iban | MEDIUM | DE89370400440532013000 |
| ipv4 | LOW | 10.0.0.1 |
| bearer_token | HIGH | Bearer <token> in headers |
| ssh_private_key_block | CRITICAL | -----BEGIN ... PRIVATE KEY----- blocks |
Numeric patterns (credit card, TC Kimlik) include checksum validators so order IDs that look like credit-card numbers don't get scrubbed. The placeholder shape is [REDACTED:<pattern_name>] and is itself inert against every default pattern — a placeholder cannot be re-redacted on a later pass.
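The checksum gate described above can be sketched in plain Java. This is a minimal illustration under stated assumptions, not the framework's validator code: `ChecksumSketch`, `luhnValid`, and `tcKimlikValid` are hypothetical names, the 12-digit minimum is an arbitrary choice for the sketch, and the TC Kimlik rule follows the publicly documented checksum for that ID format.

```java
// Hypothetical sketch: names and thresholds are assumptions, not the framework's code.
final class ChecksumSketch {

    /** Luhn check: double every second digit from the right, sum, test mod 10. */
    static boolean luhnValid(String candidate) {
        int sum = 0;
        int digits = 0;
        boolean doubleIt = false;
        for (int i = candidate.length() - 1; i >= 0; i--) {
            char c = candidate.charAt(i);
            if (c == '-' || c == ' ') continue;    // tolerate common separators
            if (c < '0' || c > '9') return false;  // anything else is not a card number
            int d = c - '0';
            if (doubleIt) {
                d *= 2;
                if (d > 9) d -= 9;
            }
            sum += d;
            doubleIt = !doubleIt;
            digits++;
        }
        return digits >= 12 && sum % 10 == 0;      // assumed minimum length
    }

    /** TC Kimlik check: 11 digits, no leading zero, two trailing check digits. */
    static boolean tcKimlikValid(String id) {
        if (!id.matches("[1-9]\\d{10}")) return false;
        int odd = 0, even = 0, total = 0;
        for (int i = 0; i < 10; i++) {
            int d = id.charAt(i) - '0';
            total += d;                            // sum of the first ten digits
            if (i < 9) {
                if (i % 2 == 0) odd += d; else even += d;
            }
        }
        int tenth = ((odd * 7 - even) % 10 + 10) % 10;
        return tenth == id.charAt(9) - '0' && total % 10 == id.charAt(10) - '0';
    }

    public static void main(String[] args) {
        System.out.println(luhnValid("4111-1111-1111-1111"));
        System.out.println(luhnValid("1234-5678-9012-3456"));
        System.out.println(tcKimlikValid("10000000146"));
    }
}
```

A validator like this would run only on strings the regex has already matched, so a 16-digit order ID that fails the checksum is left in place.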
Where redaction fires
| Sink | Decorator | What gets scrubbed |
|---|---|---|
| Conversation memory writes | RedactingMemoryStore | addMessage(role, content), addMessage(Map), search(query, …) |
| Agent event log | RedactingAgentEventPublisher | All 13 TnsAIEvent variants — every user-input-bearing field rebuilt with redacted values |
| Search queries against memory | RedactingMemoryStore | search(query, limit) — query scrubbed before backend dispatch |
Reads (getHistory, getRecentHistory) pass through unchanged because content stored via these decorators is already redacted at write time; double-scrubbing wastes cycles.
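The scrub-on-write, pass-through-on-read split can be illustrated with a stripped-down decorator. Everything here is a stand-in: `SimpleStore`, `ListStore`, and `ScrubbingStore` are hypothetical types far smaller than the framework's MemoryStore, and the redactor is modeled as a plain `UnaryOperator<String>` with a simplified email regex.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical sketch of the decorator idea; not RedactingMemoryStore itself.
final class WriteScrubSketch {

    interface SimpleStore {
        void addMessage(String role, String content);
        List<String> getHistory();
    }

    /** Plain in-memory backend with no redaction awareness. */
    static final class ListStore implements SimpleStore {
        private final List<String> messages = new ArrayList<>();
        public void addMessage(String role, String content) {
            messages.add(role + ": " + content);
        }
        public List<String> getHistory() {
            return List.copyOf(messages);
        }
    }

    /** Decorator: scrubs on the write path, delegates reads untouched. */
    static final class ScrubbingStore implements SimpleStore {
        private final SimpleStore delegate;
        private final UnaryOperator<String> redactor;
        ScrubbingStore(SimpleStore delegate, UnaryOperator<String> redactor) {
            this.delegate = delegate;
            this.redactor = redactor;
        }
        public void addMessage(String role, String content) {
            delegate.addMessage(role, redactor.apply(content));
        }
        public List<String> getHistory() {
            return delegate.getHistory();  // already scrubbed at write time
        }
    }

    static List<String> demo() {
        UnaryOperator<String> emailRedactor =
            s -> s.replaceAll("[\\w.+-]+@[\\w-]+(?:\\.[\\w-]+)+", "[REDACTED:email]");
        SimpleStore store = new ScrubbingStore(new ListStore(), emailRedactor);
        store.addMessage("user", "reach me at user@example.com");
        return store.getHistory();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Because the backend only ever receives scrubbed content, the read path needs no redaction logic of its own.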
Composition
Every redactor type below implements the same Redactor SPI, so they slot into each other freely.
Multiple redactor pipeline
CompositeRedactor chains redactors in order; the output text of one feeds the next. Useful for stacking a fast pattern matcher with a slower LLM-classifier or Presidio bridge.
Redactor pipeline = CompositeRedactor.of(
    PatternRedactor.withDefaults(),                   // fast, regex-based
    new LLMClassifierRedactor(haiku, "pii-policy"));  // slower, NER-based (when shipped)
Order matters: the first redactor's [REDACTED:...] placeholders pass through every later redactor unchanged, so a downstream classifier doesn't get to "explain" what was already scrubbed.
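To make the chaining behavior concrete, here is a minimal sketch that models each redactor as a `UnaryOperator<String>` and folds them left to right. `ChainSketch`, `chain`, and the two toy stages are hypothetical; the name-matching stage is only a stand-in for a slower classifier.

```java
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical sketch of in-order chaining; not CompositeRedactor itself.
final class ChainSketch {

    /** Feeds the output text of each stage into the next, in list order. */
    static UnaryOperator<String> chain(List<UnaryOperator<String>> stages) {
        return text -> {
            String current = text;
            for (UnaryOperator<String> stage : stages) {
                current = stage.apply(current);
            }
            return current;
        };
    }

    // Stage 1: fast regex pass over email addresses (simplified regex).
    static final UnaryOperator<String> EMAILS =
        s -> s.replaceAll("[\\w.+-]+@[\\w-]+(?:\\.[\\w-]+)+", "[REDACTED:email]");

    // Stage 2: stand-in for a slower name classifier; matches one fixed name.
    static final UnaryOperator<String> NAMES =
        s -> s.replaceAll("\\bAlice\\b", "[REDACTED:person_name]");

    static String demo() {
        return chain(List.of(EMAILS, NAMES)).apply("Alice wrote from alice@example.com");
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The second stage only ever sees `[REDACTED:email]`, never the address, which is the property the ordering rule relies on.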
Per-tenant policies
TenantPolicyRedactor dispatches each call to a per-tenant redactor based on the active RedactionContext. Resolution order:
1. RedactionContext.tenantPolicyId(): explicit override
2. EventContext.tenantId() from the framework's correlation context
3. Configured default redactor: fallback for unmatched / blank ids
Redactor everything = PatternRedactor.withDefaults();
Redactor emailOnly = new PatternRedactor(List.of(RedactionPatterns.EMAIL));
Redactor pciOnly = new PatternRedactor(List.of(RedactionPatterns.CREDIT_CARD,
                                               RedactionPatterns.US_SSN));
Redactor dispatcher = TenantPolicyRedactor.builder()
    .tenant("acme", emailOnly)     // acme: minimal redaction
    .tenant("globex", pciOnly)     // globex: PCI-only
    .defaultRedactor(everything)   // everyone else: full catalog
    .build();
The dispatch happens on every call (no per-thread cache) so a single agent group serving multiple tenants doesn't leak across tenant boundaries.
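The three-step resolution order can be sketched as a small lookup function. This is an illustration only: `TenantResolutionSketch` and `resolve` are hypothetical, policies are represented by plain values, and the sketch assumes that a blank or unmatched id at one step falls through to the next step, which the text above does not spell out.

```java
import java.util.Map;

// Hypothetical sketch of the resolution order; not TenantPolicyRedactor itself.
final class TenantResolutionSketch {

    /**
     * Picks a policy by: explicit policy id, then correlation tenant id,
     * then the configured fallback. Assumption: blank or unmatched ids
     * fall through to the next step.
     */
    static <T> T resolve(String policyId, String tenantId,
                         Map<String, T> byTenant, T fallback) {
        T explicit = lookup(policyId, byTenant);
        if (explicit != null) return explicit;
        T correlated = lookup(tenantId, byTenant);
        if (correlated != null) return correlated;
        return fallback;
    }

    private static <T> T lookup(String id, Map<String, T> byTenant) {
        if (id == null || id.isBlank()) return null;
        return byTenant.get(id);
    }

    public static void main(String[] args) {
        Map<String, String> policies = Map.of("acme", "emailOnly", "globex", "pciOnly");
        System.out.println(resolve("globex", "acme", policies, "everything"));
        System.out.println(resolve(null, "acme", policies, "everything"));
        System.out.println(resolve(" ", "initech", policies, "everything"));
    }
}
```

Resolving on every call, with no cached result, is what keeps one tenant's policy from being reused for the next tenant's request.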
Audit trail
AuditingRedactor decorates any redactor and emits a RedactionAuditEvent to a RedactionAuditListener whenever findings exist. Each event aggregates pattern counts, the highest severity, and the framework correlation context; the redacted content itself never appears in the audit event.
Redactor base = PatternRedactor.withDefaults();
Redactor audited = new AuditingRedactor(base, event -> {
    metrics.counter("redaction.applied",
            "scope", event.scope().name(),
            "severity", event.highestSeverity().name())
        .increment(event.totalFindings());
    if (event.isCritical()) {
        slack.alert("CRITICAL pii redaction in tenant " + event.eventContext().tenantId());
    }
});
AuditingRedactor IS a Redactor, so it stacks anywhere: inside TenantPolicyRedactor for per-tenant audit, or wrapping the dispatcher itself for cross-tenant audit.
A broken audit listener cannot break redaction itself: exceptions thrown inside onRedaction(...) are caught and logged at WARN.
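The isolation guarantee amounts to a try/catch around the listener call. The sketch below shows that shape under simplifying assumptions: `AuditIsolationSketch` and `audited` are hypothetical, "findings exist" is approximated by the output differing from the input, and a list of warning strings stands in for the WARN log.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.UnaryOperator;

// Hypothetical sketch of listener isolation; not AuditingRedactor itself.
final class AuditIsolationSketch {

    static UnaryOperator<String> audited(UnaryOperator<String> base,
                                         Consumer<String> listener,
                                         List<String> warnings) {
        return text -> {
            String scrubbed = base.apply(text);
            if (!scrubbed.equals(text)) {              // stand-in for "findings exist"
                try {
                    // The summary carries no scrubbed or original content.
                    listener.accept("redaction applied");
                } catch (RuntimeException e) {
                    // Stand-in for a WARN log line; redaction output is unaffected.
                    warnings.add("audit listener failed: " + e.getMessage());
                }
            }
            return scrubbed;
        };
    }

    static String demo(List<String> warnings) {
        UnaryOperator<String> base = s -> s.replace("123-45-6789", "[REDACTED:us_ssn]");
        Consumer<String> brokenListener = summary -> {
            throw new IllegalStateException("listener down");
        };
        return audited(base, brokenListener, warnings).apply("ssn 123-45-6789");
    }

    public static void main(String[] args) {
        List<String> warnings = new ArrayList<>();
        System.out.println(demo(warnings));
        System.out.println(warnings);
    }
}
```

The scrubbed text is computed before the listener runs and returned after the catch, so a failing listener can delay a call but cannot change its result.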
Custom patterns
Build your own RedactionPattern and pass it to a PatternRedactor:
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;
import com.tnsai.quality.redaction.*;
RedactionPattern medicalRecordId = RedactionPattern.of(
    "medical_record_id",
    "MRN-\\d{8}",                      // pattern (compiled to Pattern internally)
    Severity.CRITICAL);
RedactionPattern licensePlate = RedactionPattern.of(
    "license_plate_tr",
    "\\b\\d{2}\\s?[A-Z]{1,3}\\s?\\d{2,4}\\b",
    Severity.MEDIUM,
    plate -> plate.length() >= 7);     // optional validator filters false positives
// Start from the default catalog, then append the custom patterns.
List<RedactionPattern> myCatalog = new ArrayList<>(RedactionPatterns.defaults());
myCatalog.add(medicalRecordId);
myCatalog.add(licensePlate);
Redactor custom = new PatternRedactor(myCatalog);
The validator runs after the regex matches and lets you check shape rules (Luhn checksum, range, character class) that regex alone can't express. Patterns without a validator accept every regex match.
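The regex-then-validator flow can be sketched with `java.util.regex` alone. This is an assumption-laden illustration rather than PatternRedactor's internals: `ValidatorSketch` and `scrub` are hypothetical, and a `null` validator is used here to mean "accept every match".

```java
import java.util.function.Predicate;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of match-then-validate; not PatternRedactor itself.
final class ValidatorSketch {

    /** Replaces each regex match with a placeholder only if the validator accepts it. */
    static String scrub(String text, String name, Pattern regex, Predicate<String> validator) {
        Matcher m = regex.matcher(text);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            boolean accepted = validator == null || validator.test(m.group());
            String replacement = accepted ? "[REDACTED:" + name + "]" : m.group();
            // quoteReplacement keeps '$' and '\' in the match from being interpreted.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    static String demo() {
        Pattern plate = Pattern.compile("\\b\\d{2}\\s?[A-Z]{1,3}\\s?\\d{2,4}\\b");
        return scrub("plate 34 ABC 1234 and code 12A34",
                     "license_plate_tr", plate, p -> p.length() >= 7);
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Both `34 ABC 1234` and `12A34` match the regex, but only the first passes the length validator, so the short code is left untouched.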
SPI summary
| Type | Role |
|---|---|
| Redactor | The SPI. Three methods: scrubString, scrubValue, maybeContainsSensitive. |
| PatternRedactor | Default implementation: regex-driven, stateless, thread-safe. |
| RedactionPatterns | Constants for the 14 default patterns + defaults() list. |
| RedactionResult | Output of scrubString: text + findings. |
| RedactionFinding | One match: pattern name, offsets, placeholder, severity. |
| RedactionContext | Per-call context: EventContext, scope, tenant policy id. |
| RedactionScope | Enum of sinks: LOG_ATTR, MEMORY_WRITE, LLM_PROMPT, etc. |
| Severity | LOW / MEDIUM / HIGH / CRITICAL. |
| CompositeRedactor | Chain of redactors. |
| TenantPolicyRedactor | Per-tenant dispatcher. |
| AuditingRedactor | Decorator that fires RedactionAuditEvents. |
| RedactingMemoryStore | MemoryStore decorator. |
| RedactingAgentEventPublisher | AgentEventPublisher decorator. |
Invariants
The framework's property test suite (RedactorPropertyTest) verifies four invariants that every Redactor implementation must hold:
- Idempotency: redact(redact(x)).text == redact(x).text. A sink that double-scrubs (memory write then log emit) doesn't drift.
- Pattern survival: no input PII fragment appears in the redacted output.
- Placeholder safety: the literal placeholder [REDACTED:foo] does not match any default pattern.
- Length bounded: output length ≤ input length + (max placeholder length × findings count).
Custom redactors that ship into framework sinks should add coverage to the same suite.
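As a rough idea of what such property checks look like, the sketch below tests the four invariants against a toy single-pattern redactor using seeded random inputs. It does not reproduce RedactorPropertyTest: `InvariantSketch`, its `Result` type, the simplified email regex, and the input alphabet are all hypothetical.

```java
import java.util.Random;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of invariant checks; not the framework's test suite.
final class InvariantSketch {
    static final String PLACEHOLDER = "[REDACTED:email]";
    private static final Pattern EMAIL =
        Pattern.compile("[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\\.[A-Za-z0-9-]+)+");

    /** Scrubbed text plus the number of matches that were replaced. */
    static final class Result {
        final String text;
        final int findings;
        Result(String text, int findings) { this.text = text; this.findings = findings; }
    }

    static Result redact(String input) {
        Matcher m = EMAIL.matcher(input);
        StringBuilder out = new StringBuilder();
        int count = 0;
        while (m.find()) {
            count++;
            m.appendReplacement(out, Matcher.quoteReplacement(PLACEHOLDER));
        }
        m.appendTail(out);
        return new Result(out.toString(), count);
    }

    static boolean holdsFor(String input) {
        Result once = redact(input);
        boolean idempotent = redact(once.text).text.equals(once.text);
        boolean noSurvivor = !EMAIL.matcher(once.text).find();
        boolean placeholderSafe = redact(PLACEHOLDER).findings == 0;
        boolean bounded =
            once.text.length() <= input.length() + PLACEHOLDER.length() * once.findings;
        return idempotent && noSurvivor && placeholderSafe && bounded;
    }

    /** Runs the checks over short random strings drawn from a small alphabet. */
    static boolean holdsForRandomInputs(long seed, int trials) {
        Random rng = new Random(seed);
        String alphabet = "ab1.@ -[]:";
        for (int t = 0; t < trials; t++) {
            StringBuilder sb = new StringBuilder();
            int len = rng.nextInt(40);
            for (int i = 0; i < len; i++) {
                sb.append(alphabet.charAt(rng.nextInt(alphabet.length())));
            }
            if (!holdsFor(sb.toString())) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(holdsForRandomInputs(42L, 500));
    }
}
```

A custom redactor could be dropped into `redact` in place of the toy pattern to get the same four checks.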
Trade-offs
- False positives: the phone regex can match tracking IDs that look like phone numbers. Default placeholder preserves visual shape; tighten per-pattern with a validator.
- False negatives: rare PII formats (driver's licence, medical record IDs) are not in the default catalog. Plug in custom patterns or the (forthcoming) classifier redactor.
- Performance cost: on hot paths with huge payloads (1 MB+ LLM responses), redaction adds measurable latency. Big-payload paths should run through async / buffered publish.
- Data loss risk: aggressive redaction may scrub legitimate content (base64 images, opaque IDs that look like tokens). Tune per tenant.
Roadmap
The redaction SPI lives in tnsai-quality and is stable. Pending work tracked under issue #80:
- LLMClassifierRedactor: opt-in classifier using any LLMClient for non-pattern PII (names, addresses, health info).
- PresidioRedactor: bridge to Microsoft Presidio for state-of-the-art NER.
- YAML policy loader: tenant-redaction-policies.yaml parser to populate TenantPolicyRedactor from config.
- OnNotification hook integration: fire on critical-severity findings (depends on the hook system in #60).